webdev - freeCodeCamp.org

How to Build a Browser-Based PDF Image Extractor Using JavaScript

Bhavin Sheth — Mon, 13 Jul 2026 14:12:57 +0000

PDF files are widely used for sharing documents because they preserve formatting across different devices. Many PDFs contain valuable images such as logos, product photos, charts, diagrams, illustrations, and marketing graphics.

While these images are easy to view, extracting them individually isn't always simple. Many users rely on screenshots or manual cropping, which can reduce image quality and take unnecessary time.

In this tutorial, you'll build a browser-based PDF Image Extractor using JavaScript. The application lets users upload a PDF, preview its pages, extract embedded images, organize them by page, and download individual images or all extracted images at once.

Everything runs directly inside the browser, so uploaded documents never leave the user's device. This makes the tool fast, private, and easy to use without requiring a backend server.

By the end of this tutorial, you'll have a fully functional PDF Image Extractor capable of recovering embedded images while preserving their original quality.

Why Extract Images from PDFs?
How Images Are Stored Inside PDF Files
Understanding Embedded Images vs Rendered Pages
Project Setup
What Libraries Are We Using?
Creating the Upload Interface
Previewing Uploaded PDF Pages
Finding Embedded Images
Extracting Images from PDF Pages
Displaying Extracted Images
Downloading Individual Images
Downloading All Images from the PDF
Demo: How the PDF Image Extractor Works
Performance Optimization Tips
Important Notes from Real-World Use
Common Mistakes to Avoid
Conclusion

Why Extract Images from PDFs?

Although PDF documents are primarily designed for sharing text-based information, they often contain valuable visual assets. Product catalogs include product photographs, annual reports contain charts and graphs, presentations use icons and illustrations, brochures showcase marketing banners, and technical manuals include diagrams and engineering drawings.

Without a dedicated image extraction tool, users often take screenshots or manually crop pages to save these visuals. Unfortunately, screenshots usually reduce image quality, introduce unwanted page elements, and require significant manual effort when working with large documents.

A PDF Image Extractor automates this process by identifying every embedded image inside the document and separating it from the surrounding page content. Instead of copying an entire page, users receive individual image files that can be downloaded and reused immediately.

This capability is extremely useful across many industries.

Graphic designers frequently receive client brochures, advertisements, and presentation files that contain logos, icons, or promotional graphics. Instead of recreating those assets manually, they can extract the original images directly from the PDF and continue working with high-quality source files.

Marketing teams often work with catalogs, product flyers, promotional leaflets, and campaign reports. Extracting product photographs or promotional graphics saves considerable time when creating social media posts, advertisements, landing pages, or newsletters.

Publishers and content creators regularly receive PDF magazines, ebooks, newsletters, and educational material containing illustrations and infographics. Individual images can be extracted and reused without manually cropping every page.

Researchers frequently download scientific papers containing graphs, charts, microscopy images, satellite photographs, and experimental diagrams. Image extraction allows them to save those visuals separately for presentations, publications, or further analysis.

Educational institutions use image extraction when preparing teaching material. Teachers can reuse diagrams, mathematical figures, scientific illustrations, historical maps, or educational graphics from reference documents without recreating them from scratch.

Government departments often maintain scanned archives containing seals, stamps, signatures, photographs, maps, engineering plans, and official diagrams. Extracting these images individually simplifies document digitization and archival workflows.

E-commerce businesses can also benefit significantly from image extraction. For example, a seller may receive a supplier catalog in PDF format containing hundreds of product photographs. Instead of requesting every image separately, the seller can extract all embedded product images from the catalog within minutes and reuse them while preparing listings for platforms such as Amazon, Flipkart, Meesho, Shopify, or WooCommerce.

Businesses working with invoices and purchase documents can also recover company logos, QR codes, signatures, and barcode images for document verification or automation systems.

Because this application performs every operation locally inside the browser, sensitive business documents remain private while users quickly recover all embedded images from their PDFs.

How Images Are Stored Inside PDF Files

Many people assume that a PDF page is simply a picture of the document. In reality, PDF files are much more sophisticated.

Each page inside a PDF is built from multiple independent objects. Text is stored separately using fonts and character information. Lines and shapes are represented as vector drawing instructions. Images are embedded as independent image objects that are placed at specific positions on the page.

This separation is one of the reasons PDFs remain flexible and efficient. A document may contain dozens of pages while reusing the same company logo or icon multiple times without storing duplicate copies.

When an image is embedded inside a PDF, the file usually records its dimensions, color space, bits per component, and compression filter. Depending on how the document was created, embedded images may be stored as JPEG (the DCTDecode filter), JPEG 2000 (JPXDecode), CCITT fax or JBIG2 for black-and-white scans, Flate-compressed raw pixels (the same lossless compression PNG uses), or other PDF-supported encodings.

A PDF Image Extractor scans the internal structure of the document to locate these embedded image objects. Instead of capturing the entire page as a screenshot, it retrieves each image individually whenever possible.

This approach preserves quality because it recovers the original embedded image instead of recreating it from the rendered page.

Understanding how PDF files store images also explains why some documents contain dozens of extractable images while others contain none at all. If a PDF consists entirely of vector graphics or text, there may be no embedded raster images available for extraction.

Knowing the difference between these document structures helps developers build more accurate PDF processing tools while helping users understand the capabilities and limitations of image extraction.

Understanding Embedded Images vs Rendered Pages

One of the most common misconceptions about PDF image extraction is that every visible picture on a page can always be extracted as a separate image.

In reality, there is an important difference between embedded images and rendered page images.

An embedded image is an independent object stored inside the PDF document. These images usually retain their original quality and can often be extracted without any loss of resolution.

A rendered page, on the other hand, is simply a visual representation of everything that appears on the page. When PDF.js displays a page inside the browser, it combines text, vector graphics, images, backgrounds, and shapes into a single canvas. Although this rendered page looks identical to the original document, it's no longer separated into individual components.

For example, imagine a product catalog containing a company logo, five product photographs, several icons, and descriptive text.

The PDF page preview displays everything together as one complete page. However, the PDF Image Extractor scans the document internally and identifies the individual logo, each product photograph, and every embedded icon separately. This allows users to download each image individually instead of cropping screenshots from the page preview.

Another important point is that not every visible graphic is actually an image.

Some company logos are created entirely using vector drawing commands. Charts may also be generated using vector graphics rather than bitmap images. Since these objects are not stored as raster images, they can't always be extracted using an image extraction tool.

Understanding this distinction helps explain why image extraction results may differ between documents even when they appear visually similar.

For developers, learning how embedded resources differ from rendered pages provides a much deeper understanding of PDF internals and browser-based document processing.

Project Setup

We'll build the PDF Image Extractor using standard web technologies so that the entire application runs directly inside the browser without requiring a backend server.

The project consists of a simple HTML file for the interface, a CSS file for styling, and a JavaScript file that handles PDF loading, page rendering, image extraction, and downloading.

Create the following project structure:

pdf-image-extractor/
├── index.html
├── style.css
├── script.js
└── assets/

After creating the project, include the required JavaScript libraries (PDF.js and PDF-lib, described in the next section) in your index.html file.

Once these files are ready, the browser will have everything required to read PDF files, render document pages, inspect PDF objects, locate embedded images, and generate downloadable image files.

Keeping the project lightweight also makes it easier to understand each stage of the image extraction workflow.

What Libraries Are We Using?

Extracting images from PDF documents requires more than simply displaying PDF pages inside the browser. The application needs to load the document, inspect its internal structure, render preview pages, and recover embedded image objects.

To accomplish this, we'll use two JavaScript libraries.

The first library is PDF.js.

PDF.js is Mozilla's open-source PDF rendering engine. It allows browsers to load PDF documents without additional plugins and provides APIs for reading document pages, rendering previews, accessing page objects, and inspecting document resources.

In this project, PDF.js is responsible for loading the uploaded PDF and generating the page preview shown to the user before image extraction begins.

The second library is PDF-lib.

PDF-lib provides low-level access to PDF objects and document resources. Although it's widely used for editing PDF files, it's also useful when working with embedded objects and document manipulation. It complements PDF.js by giving developers additional flexibility when extending the application with future PDF editing features.

Together, these libraries allow us to create a browser-based image extraction workflow that is fast, secure, and completely client-side.

The following code initializes PDF.js:

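// PDF.js parses documents in a Web Worker; tell it where the worker script lives.
// PDF.js 4.x ships its builds as ES modules, so the worker file ends in .mjs.
pdfjsLib.GlobalWorkerOptions.workerSrc =
  "https://cdnjs.cloudflare.com/ajax/libs/pdf.js/4.4.168/pdf.worker.min.mjs";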

Running the parser in a separate worker thread keeps the page responsive while large documents are processed.

Creating the Upload Interface

Every document processing application begins with uploading a file.

The upload interface is responsible for accepting PDF documents, validating the selected file, and preparing it for processing. A clean upload experience is important because it becomes the entry point for the entire application.

In this project, users can either drag a PDF onto the upload area or browse for a document using the standard file picker.

Once a file has been selected, the application checks that it's a PDF before attempting to load it.

Supporting both drag-and-drop uploads and manual file selection provides a familiar experience across desktop and mobile devices.

The upload section also displays clear instructions so first-time users understand exactly how to begin extracting images.

Create the upload area in index.html. It displays a cloud icon (☁), the heading "Drag & Drop PDF Here", and the hint "Or click to browse file", alongside a file input for choosing the document.

Next, validate the uploaded document:

const file = pdfFile.files[0];

if (!file) {
  return;
}

if (file.type !== "application/pdf") {
  alert("Please upload a valid PDF.");
  return;
}

After validation succeeds, the browser reads the uploaded file:

// Read the whole file into memory as an ArrayBuffer
const buffer = await file.arrayBuffer();
loadPDF(buffer);

At this point, the PDF has been loaded into memory and is ready for preview generation.
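The file.type check above relies on the MIME type the browser infers, largely from the file extension, and it can be an empty string when the browser can't determine the type. A complementary check, sketched below as a hypothetical helper named hasPdfSignature, inspects the bytes you just read for the "%PDF-" header that PDF files begin with. It is deliberately strict: some readers also accept files with a few stray bytes before the header, so treat a failed check as a reason to show an error rather than as proof that no viewer could open the file.

```javascript
// PDF files begin with the ASCII header "%PDF-" (for example "%PDF-1.7").
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

/**
 * Returns true when the data starts with the PDF header.
 * Accepts an ArrayBuffer (as returned by file.arrayBuffer()) or a Uint8Array.
 */
function hasPdfSignature(buffer) {
  const bytes = buffer instanceof Uint8Array ? buffer : new Uint8Array(buffer);
  if (bytes.length < PDF_SIGNATURE.length) {
    return false; // too short to contain the header
  }
  return PDF_SIGNATURE.every((byte, i) => bytes[i] === byte);
}
```

In the upload handler, you could call it right after `await file.arrayBuffer()` and show the "Please upload a valid PDF." alert instead of calling loadPDF(buffer) when it returns false.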

Previewing Uploaded PDF Pages

Before extracting images, users should first verify that they uploaded the correct document.

Instead of immediately scanning the PDF, the application generates thumbnail previews for every page. These previews help users confirm page order, inspect the document contents, and estimate where embedded images are located.

Preview generation is particularly useful for large reports, product catalogs, presentations, magazines, brochures, ebooks, and technical documentation containing dozens of pages.

Each preview is rendered using PDF.js and displayed inside a responsive grid layout.

First, load the uploaded document:

const pdf = await pdfjsLib.getDocument({ data: buffer }).promise;
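The getDocument promise rejects when a file is encrypted or damaged, so it's worth turning those failures into readable messages. This is a sketch with a hypothetical helper named describeLoadError; it assumes PDF.js reports these cases through exceptions named PasswordException and InvalidPDFException, which holds for the 4.x line but is worth re-checking when you upgrade.

```javascript
/**
 * Turns a rejection from pdfjsLib.getDocument(...).promise into a message
 * for the user. Assumption: PDF.js exceptions expose these `name` values.
 */
function describeLoadError(error) {
  switch (error && error.name) {
    case "PasswordException":
      // Raised for encrypted files when no password is provided
      return "This PDF is password-protected. Remove the password and try again.";
    case "InvalidPDFException":
      // Raised when the bytes can't be parsed as a PDF
      return "This file is damaged or is not a valid PDF.";
    default:
      return "The PDF could not be opened.";
  }
}

// Usage inside loadPDF(buffer):
// let pdf;
// try {
//   pdf = await pdfjsLib.getDocument({ data: buffer }).promise;
// } catch (error) {
//   alert(describeLoadError(error));
//   return;
// }
```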

Next, loop through every page:

for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
  // Await each render so thumbnails are appended in page order
  await renderPage(pageNumber);
}

Inside renderPage, render the page to a canvas:

const page = await pdf.getPage(pageNumber);

// A small scale produces a thumbnail-sized canvas
const viewport = page.getViewport({ scale: 0.35 });

const canvas = document.createElement("canvas");
canvas.width = viewport.width;
canvas.height = viewport.height;

await page.render({
  canvasContext: canvas.getContext("2d"),
  viewport
}).promise;

Finally, add the preview to the page:

previewContainer.appendChild(canvas);

Once rendering is complete, every page becomes visible inside the preview area.

Users can scroll through the thumbnails and verify that the correct document has been uploaded before starting the extraction process.
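A fixed scale of 0.35 makes each thumbnail's width depend on the page size, and the canvas can look soft on high-density screens. One alternative, sketched here with a hypothetical helper named thumbnailLayout, derives the scale from a target CSS width and multiplies it by window.devicePixelRatio so the canvas has enough pixels. The 160 px width in the usage comments is an arbitrary example.

```javascript
/**
 * Computes the render scale and CSS size for a page thumbnail of a fixed
 * CSS width. pageWidth/pageHeight are the page size at scale 1, i.e.
 * page.getViewport({ scale: 1 }).width and .height.
 */
function thumbnailLayout(pageWidth, pageHeight, targetCssWidth, pixelRatio = 1) {
  const cssScale = targetCssWidth / pageWidth;
  return {
    renderScale: cssScale * pixelRatio,           // pass to page.getViewport()
    cssWidth: targetCssWidth,                     // canvas.style.width in px
    cssHeight: Math.round(pageHeight * cssScale), // canvas.style.height in px
  };
}

// Usage inside renderPage(pageNumber):
// const base = page.getViewport({ scale: 1 });
// const layout = thumbnailLayout(base.width, base.height, 160, window.devicePixelRatio || 1);
// const viewport = page.getViewport({ scale: layout.renderScale });
// canvas.width = Math.floor(viewport.width);
// canvas.height = Math.floor(viewport.height);
// canvas.style.width = `${layout.cssWidth}px`;
// canvas.style.height = `${layout.cssHeight}px`;
```

Every thumbnail then shares the same on-screen width, while its backing canvas holds devicePixelRatio times as many pixels in each direction.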

Although the preview displays complete page images, no extraction has occurred yet. The next stage scans the internal PDF structure to locate every embedded image separately.

Why Previewing the PDF Matters

Previewing the uploaded document may seem like a small feature, but it greatly improves the overall user experience.

Without a preview, users have no way to verify that they selected the correct file before extraction begins. This becomes especially important when working with multiple PDFs that have similar filenames.

Page previews also help users estimate where images appear throughout the document. For example, a product catalog may contain photographs only on certain pages, while a technical report might include diagrams only within specific chapters.

By reviewing the page thumbnails first, users gain confidence that the document is correct before the application begins scanning for embedded images.

This simple verification step reduces mistakes, avoids unnecessary processing, and makes the overall workflow feel much more intuitive.

Finding Embedded Images

Once the PDF pages have been rendered and displayed in the preview section, the application is ready to search for embedded images.

This stage is different from generating page previews. The preview simply renders each page as a complete visual image, while the extraction process examines the internal structure of the PDF to locate every embedded image object stored inside the document.

Each page is scanned individually. If embedded images are found, the application records their location, dimensions, image format, and page number before preparing them for extraction.

This page-by-page approach makes it easier to organize the extracted images later and allows users to understand exactly where each image originated within the document.

Scanning only begins after users click the Extract Images button, ensuring that unnecessary processing is avoided if they simply want to preview the document.

First, create the click event for the extraction button:

document
  .getElementById("extractBtn")
  .addEventListener("click", extractImages);

Next, loop through every page inside the uploaded PDF:

for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
  const page = await pdf.getPage(pageNumber);
  // The following steps run here, once per page
}

Inside the loop, read the page's operator list, the sequence of drawing operations PDF.js uses to render it:

const operatorList = await page.getOperatorList();

Now inspect every drawing operation to determine whether it contains an embedded image:

operatorList.fnArray.forEach(operation => {
  // paintImageXObject draws an image stored as a separate image object
  if (operation === pdfjsLib.OPS.paintImageXObject) {
    console.log("Image Found");
  }
});
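Logging "Image Found" confirms detection but doesn't say which image was drawn. Because a PDF can draw the same image object several times (a repeated logo, for instance), a small hypothetical helper can collect one entry per object ID. This sketch assumes that, for paintImageXObject, PDF.js passes the image's object ID followed by its width and height in argsArray; that matches recent releases but is an internal detail worth re-checking when you upgrade. Inline images (paintInlineImageXObject) and image masks use other operators and are not collected here.

```javascript
/**
 * Returns one entry per distinct image XObject drawn in a PDF.js operator
 * list ({ fnArray, argsArray }). Pass pdfjsLib.OPS as `ops`.
 * Assumption: paintImageXObject's args are [objectId, width, height].
 */
function findImageOps(operatorList, ops) {
  const images = new Map(); // objectId -> { objectId, width, height, drawCount }
  operatorList.fnArray.forEach((fn, i) => {
    if (fn !== ops.paintImageXObject) return;
    const [objectId, width, height] = operatorList.argsArray[i];
    const seen = images.get(objectId);
    if (seen) {
      seen.drawCount += 1; // same image drawn again on this page
    } else {
      images.set(objectId, { objectId, width, height, drawCount: 1 });
    }
  });
  return [...images.values()];
}

// Usage inside the page loop:
// const images = findImageOps(operatorList, pdfjsLib.OPS);
```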

After every page has been scanned, the application creates a collection containing all discovered images grouped by page.

This collection becomes the foundation for the extraction process that follows.

Extracting Images from PDF Pages

Once embedded image objects have been identified, the application begins extracting them from the PDF.

Unlike taking screenshots of an entire page, this method retrieves each embedded image individually whenever possible. As a result, the extracted images keep their native pixel dimensions rather than inheriting the resolution of the page preview.

Each image is assigned to the page where it was found. Organizing images this way makes it much easier for users to locate graphics inside large reports, magazines, brochures, presentations, catalogs, technical manuals, and research papers.

As each page finishes processing, its extracted images are stored inside a JavaScript array before being displayed inside the browser.
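To extract an image you need its decoded data, which PDF.js keeps in per-page and document-wide object stores. The hypothetical getImageObject helper below relies on PDF.js internals rather than documented API: it assumes that page.objs holds images used by a single page, that page.commonObjs holds images shared across pages (whose IDs start with "g_" in recent versions), and that get(id, callback) invokes the callback once the data has arrived from the worker.

```javascript
/**
 * Resolves the decoded image object behind an operator-list object ID.
 * Assumptions (PDF.js internals): page-local images live in page.objs,
 * shared images live in page.commonObjs with IDs starting "g_", and
 * get(id, callback) calls back once the data is available.
 */
function getImageObject(page, objectId) {
  const store = objectId.startsWith("g_") ? page.commonObjs : page.objs;
  return new Promise((resolve) => {
    store.get(objectId, resolve);
  });
}
```

You would call it with each object ID found on the page; the resolved object typically carries width and height plus either raw pixel data or a bitmap.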

Create an array for storing extracted images:

const extractedImages = [];

Save every discovered image:

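extractedImages.push({
  page: pageNumber,
  image: imageData,    // image URL (for example a data URL) for preview and download
  type: imageType,
  width: imageWidth,   // shown on the image card
  height: imageHeight,
  // Unique across the document because the array keeps growing
  fileName: `page-${pageNumber}-image-${extractedImages.length + 1}.${imageType.toLowerCase()}`
});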
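The imageData, imageType, and size values above are placeholders. One way to fill them, sketched here, draws the decoded image object onto a canvas at its native size and exports it as PNG. It assumes the object exposes width and height plus either an ImageBitmap in a bitmap property or raw RGB/RGBA bytes in a data property, which varies with PDF.js version and settings. This re-encodes the pixels: a JPEG stored in the PDF comes out as a lossless PNG of the same decoded pixels, not as the original JPEG bytes, so files can be larger than the embedded originals.

```javascript
/**
 * Expands raw pixels to the RGBA layout that canvas ImageData expects.
 * Handles 3-channel (RGB) and 4-channel (RGBA) buffers; other layouts,
 * such as 1-bit-per-pixel masks, throw so they can be handled separately.
 */
function toRgba(data, width, height) {
  const pixels = width * height;
  if (data.length === pixels * 4) {
    return new Uint8ClampedArray(data); // already RGBA: copy as-is
  }
  if (data.length !== pixels * 3) {
    throw new Error(`Unsupported pixel layout: ${data.length} bytes for ${width}x${height}`);
  }
  const rgba = new Uint8ClampedArray(pixels * 4);
  for (let src = 0, dst = 0; src < data.length; src += 3, dst += 4) {
    rgba[dst] = data[src];         // red
    rgba[dst + 1] = data[src + 1]; // green
    rgba[dst + 2] = data[src + 2]; // blue
    rgba[dst + 3] = 255;           // fully opaque
  }
  return rgba;
}

/**
 * Browser-only: draws an image object at its native size and returns a
 * PNG data URL. Assumes { width, height } plus `bitmap` or `data`.
 */
function imageObjectToDataURL(imageObject) {
  const { width, height } = imageObject;
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext("2d");
  if (imageObject.bitmap) {
    ctx.drawImage(imageObject.bitmap, 0, 0);
  } else {
    const pixels = toRgba(imageObject.data, width, height);
    ctx.putImageData(new ImageData(pixels, width, height), 0, 0);
  }
  return canvas.toDataURL("image/png");
}
```

With this approach, imageData becomes imageObjectToDataURL(imageObject), imageType becomes "PNG", and the width and height come from the image object itself.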

Each stored object contains useful information that will later be displayed to the user.

The extraction routine continues until every page inside the uploaded PDF has been inspected.

Once complete, the browser immediately displays the extracted images without requiring another processing step.

Displaying Extracted Images

After extraction finishes, the application presents every recovered image inside an organized gallery.

Instead of displaying one long collection of images, the results are grouped according to the page from which they were extracted. This organization makes it much easier to understand the original document structure.

For every extracted image, the application displays a preview together with useful technical information.

Users can immediately see the image dimensions, image format, and the page where the image originated. This additional information helps determine whether an image is suitable for reuse before downloading it.

For example, a high-resolution product photograph may be useful for marketing material, while a small company logo may only be appropriate for branding purposes.

Loop through every extracted image:

extractedImages.forEach(image => {
  renderImageCard(image);
});

Create the preview card:

const card = document.createElement("div");
card.className = "image-card";

Insert the preview:

const img = document.createElement("img");
img.src = image.image;

Display the image information:

details.innerHTML = `
  Dims: ${image.width} × ${image.height}<br>
  Type: ${image.type}
`;

Once every card has been created, the gallery displays all extracted images grouped beneath their respective pages.

This layout provides a clean overview of every image contained inside the uploaded PDF while making individual downloads straightforward.
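The card loop earlier in this section renders images in one flat sequence. To get the per-page grouping this section describes, a hypothetical groupByPage helper can bucket the stored records by their page field first; you would then render one heading per group followed by that group's cards.

```javascript
/**
 * Groups image records by their `page` field. Pages come out in ascending
 * order; images keep the order in which they were found on each page.
 */
function groupByPage(images) {
  const groups = new Map(); // page number -> array of image records
  for (const image of images) {
    if (!groups.has(image.page)) {
      groups.set(image.page, []);
    }
    groups.get(image.page).push(image);
  }
  return [...groups.entries()]
    .sort(([a], [b]) => a - b)
    .map(([page, items]) => ({ page, items }));
}

// Usage:
// for (const { page, items } of groupByPage(extractedImages)) {
//   // render a "Page N" heading, then items.forEach(renderImageCard)
// }
```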

Downloading Individual Images

Many users don't need every image contained inside a PDF.

For example, a designer may only want the company logo from a brochure, while a researcher may only need one graph from a scientific paper.

To support these workflows, every extracted image includes its own download button.

When clicked, the browser downloads only the selected image without affecting any of the remaining extracted images.

This allows users to quickly save exactly the graphics they need.

Create a download button:

const button = document.createElement("button");
button.innerText = "Download";

Attach the download event:

button.onclick = () => {
  downloadImage(image);
};

Generate the download:

// Inside downloadImage(image): a temporary link with a download attribute
const link = document.createElement("a");
link.href = image.image;
link.download = image.fileName;
link.click();

Providing separate download buttons makes the application much more flexible because users can save only the images they actually need instead of downloading every extracted asset.
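The download code reads image.fileName, so each stored record needs a sensible name. This hypothetical imageFileName helper derives the extension from a data URL's MIME type and falls back to .png when the type can't be read from the URL, as with blob: URLs. The page-N-image-M naming scheme is an arbitrary choice.

```javascript
// Maps MIME subtypes to common file extensions; anything else falls back to .png
const EXTENSIONS = { jpeg: "jpg", png: "png", webp: "webp", gif: "gif" };

/**
 * Builds a download name such as "page-3-image-2.png" from the page number,
 * the image's position on that page, and its URL.
 */
function imageFileName(pageNumber, indexOnPage, imageUrl) {
  const match = /^data:image\/([a-z0-9.+-]+)[;,]/i.exec(imageUrl);
  const subtype = match ? match[1].toLowerCase() : "png";
  const extension = EXTENSIONS[subtype] || "png";
  return `page-${pageNumber}-image-${indexOnPage}.${extension}`;
}
```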

Downloading All Images from the PDF

Large PDF documents often contain dozens or even hundreds of embedded images.

Downloading every image individually would be both slow and repetitive.

To simplify this workflow, the application also includes a Download All Images from PDF button.

After extraction has completed, clicking this button automatically downloads every extracted image from every page.

This feature is particularly useful when working with supplier catalogs, product brochures, magazines, annual reports, technical documentation, ebooks, educational material, and presentation files containing many graphics.

Loop through every extracted image:

function downloadAllImages() {
  extractedImages.forEach(image => {
    downloadImage(image);
  });
}

Attach the click event:

downloadAllButton.addEventListener("click", downloadAllImages);
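Calling downloadImage for every image in the same tick triggers many downloads at once, and browsers may block them or ask the user for permission to download multiple files. A common mitigation, sketched here as a hypothetical scheduleDownloads helper, spaces the clicks out. The 250 ms default is an arbitrary value, the permission prompt can still appear, and the schedule parameter exists only so the timing can be replaced in tests.

```javascript
/**
 * Starts one download every `delayMs` milliseconds instead of all at once.
 * `download` is the per-image function (for example downloadImage).
 * `schedule` defaults to setTimeout and can be swapped out in tests.
 * Returns the number of downloads scheduled.
 */
function scheduleDownloads(images, download, delayMs = 250, schedule = setTimeout) {
  images.forEach((image, index) => {
    schedule(() => download(image), index * delayMs);
  });
  return images.length;
}

// Usage inside downloadAllImages():
// scheduleDownloads(extractedImages, downloadImage);
```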

Finally, allow users to begin another extraction:

startOverButton.addEventListener("click", resetApplication);

After the downloads have completed, users can click Start Over to upload another PDF and repeat the extraction process without refreshing the browser.

Demo: How the PDF Image Extractor Works

Step 1: Upload Your PDF Document

The image extraction workflow begins by uploading a PDF document using either the drag-and-drop area or the file picker.

Once a document has been selected, the browser validates that the uploaded file is a PDF before reading it into memory. Since the application performs all processing locally, the uploaded document never leaves the user's computer, making the tool suitable for confidential business reports, catalogs, presentations, contracts, technical manuals, research papers, and other sensitive documents.

After the PDF has been loaded successfully, the application prepares every page for preview generation before image extraction begins.

Step 2: Preview Uploaded PDF Pages

After the upload is complete, the application renders thumbnail previews for every page inside the document.

Instead of immediately scanning the PDF for images, users first receive a visual overview of the entire document. This allows them to verify that they selected the correct PDF and quickly identify which pages contain photographs, diagrams, illustrations, charts, or other graphics.

Page previews are especially useful when working with large product catalogs, magazines, annual reports, brochures, technical documentation, educational books, and research papers containing dozens or even hundreds of pages.

This verification step helps prevent unnecessary processing and improves the overall user experience.

Step 3: Extract Embedded Images

Once the document has been verified, users click the Extract Images button to begin scanning the PDF.

The application examines every page individually, searching for embedded image objects stored inside the document. Unlike screenshots or page rendering, the extractor retrieves the original image resources whenever possible, preserving their quality and dimensions.

As each page is processed, every discovered image is grouped according to the page where it originally appeared. This organization makes it much easier to browse the extracted results later.

Depending on the size of the PDF and the number of embedded images, extraction may take a few seconds for larger documents.

Step 4: Review the Extracted Images

After extraction is complete, the browser displays every recovered image inside an organized gallery.

Instead of showing one large collection of images, the application groups the results page by page. This allows users to understand exactly where each image came from within the original document.

Every image card displays a preview together with useful information such as the page number, image dimensions, and image format. This helps users quickly identify the graphics they need before downloading anything.

For example, a brochure may contain company logos, banners, icons, and product photographs spread across several pages. Grouping images by page makes navigating these documents much easier.

Step 5: Download Individual Images or Pagewise

Each extracted image includes its own Download Image button.

This feature is useful when users only need one or two graphics from a large document. Instead of downloading every extracted image, they can save only the specific illustrations, charts, product photographs, or logos that are relevant to their work.

For example, a designer may only need a company logo, while a marketing team may only want product images from a supplier catalog. Individual downloads eliminate unnecessary files and simplify the workflow.

After clicking the download button, the browser immediately saves the selected image without requiring any additional processing.

Step 6: Download Every Image

For users who need all graphics contained inside the document, the application also provides a Download All Images from PDF button.

After extraction has finished, clicking this button automatically downloads every recovered image from every page. This saves considerable time compared to downloading each image individually.

This feature is particularly useful when processing large product catalogs, marketing brochures, presentation decks, educational books, technical manuals, magazines, supplier catalogs, or company reports containing dozens of embedded images.

Once the downloads are complete, users can click Start Over to clear the current session and upload another PDF without refreshing the page.

Step 7: Start a New Extraction

After downloading the required images, users can begin working with another PDF document.

Clicking Start Over clears the uploaded document, removes every generated preview, resets the extracted image gallery, and restores the application to its initial state.

This allows users to process multiple PDF files during the same session without reloading the browser or reopening the application.

The reset process is completed instantly, making the workflow smooth and efficient when working with many PDF documents throughout the day.

Performance Optimization Tips

Image extraction is generally faster than OCR because the application recovers existing image objects instead of recognizing characters. But large PDF documents containing hundreds of pages or high-resolution graphics can still require significant processing time.

One simple optimization is to process pages sequentially instead of attempting to analyze every page simultaneously.

for (let page = 1; page <= pdf.numPages; page++) {
  // Finish one page before loading the next
  await extractPageImages(page);
}

Loading only the required page into memory reduces browser memory usage and improves stability when processing large documents.

If the application supports page selection, allowing users to extract images from only specific pages can greatly reduce processing time for large catalogs or reports.

const startPage = 10;

const endPage = 25;
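Values like startPage and endPage usually come from user input, so they should be clamped to the document before looping. A minimal sketch with a hypothetical pageRange helper, assuming the 1-based page numbers that pdf.getPage uses:

```javascript
/**
 * Turns a requested range into valid 1-based page numbers, clamped to the
 * document's page count (pdf.numPages). Returns [] if nothing overlaps.
 */
function pageRange(startPage, endPage, numPages) {
  const first = Math.max(1, Math.floor(startPage));
  const last = Math.min(numPages, Math.floor(endPage));
  const pages = [];
  for (let page = first; page <= last; page++) {
    pages.push(page);
  }
  return pages;
}

// Usage:
// for (const page of pageRange(startPage, endPage, pdf.numPages)) {
//   await extractPageImages(page);
// }
```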

If you create object URLs with URL.createObjectURL (for example, for Blob-based downloads), release them once the images have been downloaded:

URL.revokeObjectURL(imageURL);

Finally, clear the extracted image collection before processing another PDF:

extractedImages.length = 0;
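URL.revokeObjectURL only applies to blob: URLs created with URL.createObjectURL; data URLs, which this tutorial's display and download code uses, hold their bytes inline and have nothing to revoke. The hypothetical releaseImages helper below combines both cleanup steps and skips data URLs. Its revoke parameter defaults to URL.revokeObjectURL and can be replaced in tests.

```javascript
/**
 * Revokes any blob: URLs held by the image records, then empties the array
 * in place so every reference to it sees the cleared collection.
 * Returns how many URLs were revoked.
 */
function releaseImages(images, revoke = (url) => URL.revokeObjectURL(url)) {
  let revoked = 0;
  for (const record of images) {
    // Data URLs carry their bytes inline, so there is nothing to release
    if (typeof record.image === "string" && record.image.startsWith("blob:")) {
      revoke(record.image);
      revoked += 1;
    }
  }
  images.length = 0;
  return revoked;
}

// Usage inside resetApplication():
// releaseImages(extractedImages);
```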

These small optimizations help the application remain responsive even when working with documents containing hundreds of embedded graphics.

Important Notes from Real-World Use

Not every PDF contains embedded images.

Some documents consist entirely of text and vector graphics, while others may contain scanned pages that appear as a single full-page image. Understanding how the original PDF was created helps set realistic expectations before extraction begins.

Always validate uploaded files before processing.

if (file.type !== "application/pdf") {
  alert("Please upload a valid PDF.");
  return;
}

Some PDF creators compress embedded images heavily to reduce file size. In those situations, the extracted images will match the quality stored inside the PDF, but they can't be improved beyond the original resolution.

Users should also verify extraction results before downloading every image, particularly when processing large reports or catalogs containing hundreds of graphics.

Because the application performs all operations locally inside the browser, confidential documents remain private throughout the extraction process. This makes browser-based image extraction suitable for business reports, engineering drawings, financial documents, legal records, educational resources, and other sensitive PDFs.

Common Mistakes to Avoid

One common mistake is assuming every visible object inside a PDF is an extractable image.

Many diagrams, logos, and charts are actually vector graphics rather than raster images. These elements are rendered by drawing commands and can't always be extracted as standalone image files.

Another mistake is relying on screenshots instead of extracting embedded images.

Screenshots capture only the rendered page displayed on the screen, which often reduces image quality and includes unnecessary page elements.

Always verify that embedded images have been detected before displaying the results.

if (extractedImages.length === 0) {
  alert("No embedded images found.");
}

Some users also forget to organize extracted images by page.

Grouping images according to their original page location makes it much easier to navigate large documents containing dozens or hundreds of graphics.

Finally, always review the extracted images before downloading them.

Checking the preview allows users to confirm image quality, dimensions, and page location before saving the files.

Conclusion

In this tutorial, you built a browser-based PDF Image Extractor using JavaScript.

You learned how to upload PDF files, preview document pages, locate embedded image objects, extract images while preserving their original quality, organize results page by page, download individual images, and download every extracted image directly from the browser.

More importantly, you learned the difference between embedded images and rendered page previews, giving you a better understanding of how PDF documents are structured internally.

Because the entire workflow runs locally inside the browser, users can safely recover graphics from confidential PDF documents without uploading them to external servers.

You can try the complete implementation here:

PDF Image Extractor: https://allinonetools.net/extract-images-from-pdf/

Once you understand this workflow, you can extend the project further by adding duplicate image detection, automatic image compression, AI-powered image tagging, background removal, image format conversion, OCR on extracted images, watermark detection, or bulk asset management features.

How to Build a Production-Safe Agent Loop: From Exit Conditions to Audit Trails

Daniel Nwaneri — Mon, 15 Jun 2026 23:18:49 +0000

In July 2025, a Claude Code recursion loop burned between 16,000 USD and 50,000 USD in five hours. There was no crash or error, just agents doing exactly what they were told, indefinitely, because nobody told them when to stop.

Four months later, a four-agent LangChain loop ran for eleven days and cost 47,000 USD. Nobody noticed until the invoice arrived. The pipeline worked correctly in testing, and the agents were doing exactly what they were told. Same pattern.

This tutorial is about that missing instruction.

You'll build five small Python primitives that catch most agent loop failures before they ship:

A spec writer that forces you to define done before the loop starts
A circuit breaker that kills the loop when it exceeds hard limits
A ledger that records every turn in an append-only SQLite audit trail
An agent loop that ties all three together
A review surface that forces human attestation before downstream systems receive anything

By the end you'll have a working repo you can drop into any agent project. The full code is at github.com/dannwaneri/production-safe-agent-loop.

Why This Keeps Happening
Prerequisites
Phase 1: Define Done Before You Build
Phase 2: Enforce Done at Runtime
Phase 3: Record Everything
Phase 4: The Loop That Respects Its Boundaries
Phase 5: The Review Surface
Phase 6: A Real Example, SEO Audit Agent
Pluggable LLM Client
Running the Tests
What You've Built
Next Steps

Why This Keeps Happening

The math that got companies into trouble was simple. A chatbot costs roughly 0.04 USD per interaction. An orchestrated multi-agent workflow costs 1.20 USD. That's a 30x multiplier — and production benchmarks show it can reach 70x on complex tasks.

The problem isn't that agents are expensive. The problem is that most teams budgeted for chatbot costs and deployed agent architectures. Gartner found the token consumption gap between pilot chatbots and production agent workflows sits at 5-30x. The FinOps Foundation's 2026 State of FinOps report found 73% of enterprises say AI costs exceeded original projections.
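The budgeting gap is easier to feel with numbers attached. This back-of-the-envelope sketch reuses the per-interaction figures above. The monthly volume of 10,000 interactions is a hypothetical assumption for illustration, not a figure from any of the reports cited.

```python
# Back-of-the-envelope comparison of chatbot vs agent pricing.
CHATBOT_COST_USD = 0.04          # per interaction, from the figures above
AGENT_WORKFLOW_COST_USD = 1.20   # per orchestrated workflow, from the figures above
MONTHLY_INTERACTIONS = 10_000    # hypothetical volume, for illustration only


def monthly_cost(cost_per_interaction: float, interactions: int) -> float:
    """Total monthly spend at a flat per-interaction price, rounded to cents."""
    return round(cost_per_interaction * interactions, 2)


multiplier = round(AGENT_WORKFLOW_COST_USD / CHATBOT_COST_USD)
budgeted = monthly_cost(CHATBOT_COST_USD, MONTHLY_INTERACTIONS)
actual = monthly_cost(AGENT_WORKFLOW_COST_USD, MONTHLY_INTERACTIONS)

print(f"Multiplier: {multiplier}x")
print(f"Budgeted at chatbot pricing: ${budgeted:,.2f}")
print(f"Actual at agent pricing:     ${actual:,.2f}")
```

Under these assumptions it prints a 30x multiplier, $400.00 budgeted, and $12,000.00 actual. Swap in your own volume and the ratio stays the same.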

The mechanism is straightforward once you see it. When an agent fails a task and retries, it doesn't start fresh. It re-reads the entire context window — every prior failed attempt — before trying again. Iteration one costs 100 tokens. Iteration two costs 200. By iteration ten, a single turn costs 1,000 tokens and the run has burned through thousands in total. You're paying for every failure, over and over, in milliseconds.
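Here's a minimal model of the retry growth described above. It assumes each attempt adds a fixed 100 tokens to the context and every retry re-reads everything before it. Real context growth is lumpier, and `retry_token_costs` is a hypothetical helper, not part of the tutorial's repo.

```python
def retry_token_costs(tokens_per_attempt: int, attempts: int) -> list[int]:
    """Per-iteration token cost when each retry re-reads every prior attempt.

    Assumes each attempt appends a fixed number of tokens to the context,
    so iteration n reads n * tokens_per_attempt tokens.
    """
    return [tokens_per_attempt * n for n in range(1, attempts + 1)]


costs = retry_token_costs(tokens_per_attempt=100, attempts=10)
print(costs[0], costs[1], costs[-1])  # per-iteration cost: 100 200 1000
print(sum(costs))                     # cumulative spend: 5500
```

Per-iteration cost grows linearly, so cumulative spend grows quadratically. Ten retries at this rate cost 5,500 tokens in total, not 1,000.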

# This is the entire problem in three lines
while True:
    result = agent.run(task)
    # done when...?

That question mark is where the money goes.
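For contrast, here's a sketch of the same loop with an answer to that question: an explicit done check plus a hard turn ceiling. It's an illustration, not the tutorial's `AgentLoop`. `bounded_loop`, `run_turn`, and `is_done` are hypothetical names standing in for your agent call and your definition of done.

```python
from typing import Callable


def bounded_loop(
    run_turn: Callable[[int], str],
    is_done: Callable[[str], bool],
    max_turns: int = 5,
) -> tuple[bool, int]:
    """Run turns until is_done() accepts a result or max_turns is reached.

    Returns (finished, turns_used). finished is False when the ceiling
    stopped the loop, which is the signal to hand off to a human.
    """
    for turn in range(1, max_turns + 1):
        result = run_turn(turn)
        if is_done(result):
            return True, turn  # exit condition met
    return False, max_turns  # ceiling hit before done


# Usage: a fake agent that finishes on turn 3, and one that never finishes.
print(bounded_loop(lambda t: "done" if t == 3 else "working", lambda r: r == "done"))  # (True, 3)
print(bounded_loop(lambda t: "working", lambda r: r == "done"))  # (False, 5)
```

Returning `(False, max_turns)` instead of looping forever is the point. The caller has to decide what happens next, and that decision is where a human checkpoint belongs.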

The other thing making it worse: agents don't fail loudly. Traditional code hits an undefined state and crashes. An LLM hits ambiguity and tries to be helpful. It retries. It reformats the tool call. It spins up a verification agent. The verification agent finds something. A correction agent fires. Nobody defined what "correct" means. The loop looks beautiful on every dashboard you have — activity, tool calls, completion rate — while quietly burning through your budget.

Gartner predicts that 40% of agentic projects will be scrapped by 2027 due to economic failure. Most of that failure is preventable. Not with better models, but with exit conditions.

Prerequisites

Python 3.10+
An Anthropic API key (or any provider — more on that later)
Basic familiarity with Python classes and SQLite

git clone https://github.com/dannwaneri/production-safe-agent-loop
cd production-safe-agent-loop
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-...

Phase 1: Define Done Before You Build

The most expensive mistake in agent development isn't a bad model choice or a missing retry limit. It's starting the build before you can answer one question in one sentence:

What does done look like?

Most teams can't answer it. Not because they're careless, but because nothing forces them to before they open the terminal. The spec writer is that forcing function.

# spec_writer.py
from spec_writer import SpecWriter

spec = SpecWriter(db_path="spec.db").run()

When you call .run(), it won't return until you've answered three questions:

What does this do?
What does this NOT do?
What does done look like in one sentence?
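The forcing function itself can be small. This sketch assumes a re-ask-until-non-blank rule and uses hypothetical names (`collect_spec`, `SketchSpec`). It illustrates the idea rather than reproducing the repo's `SpecWriter`. Taking `input_fn` as a parameter lets you drive it from a script instead of stdin.

```python
import uuid
from dataclasses import dataclass
from typing import Callable

QUESTIONS = (
    "What does this do?",
    "What does this NOT do?",
    "What does done look like in one sentence?",
)


@dataclass(frozen=True)
class SketchSpec:
    """Hypothetical stand-in for the tutorial's spec result."""
    what_it_does: str
    what_it_does_not: str
    done_looks_like: str
    session_id: str


def collect_spec(input_fn: Callable[[str], str] = input) -> SketchSpec:
    """Ask each question until it gets a non-blank answer; there's no skip."""
    answers = []
    for question in QUESTIONS:
        answer = ""
        while not answer.strip():
            answer = input_fn(question + " ")
        answers.append(answer.strip())
    return SketchSpec(*answers, session_id=uuid.uuid4().hex)


# Usage: scripted answers, including blanks that force a re-ask.
scripted = iter(["", "Audits title tags", "No fixes", "", "Stops after flagging tags"])
spec = collect_spec(lambda prompt: next(scripted))
print(spec.done_looks_like)  # Stops after flagging tags
```

Blank answers are rejected rather than defaulted, so no path through the function skips defining done.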

The third question is the one that matters. It's also the hardest. "The agent audits the site" is not an answer. "The agent crawls the target URL, extracts all <code>&lt;title&gt;</code> and <code>&lt;meta description&gt;</code> tags, flags any that are missing or over-length, and stops" is an answer. One of those gives the circuit breaker something to enforce.</p> <p>The spec is stored in SQLite, and <code>.run()</code> returns a <code>SpecResult</code> dataclass with a <code>session_id</code>. That ID becomes the thread connecting your spec, your ledger rows, and your loop result. One session, traceable end to end.</p> <pre><code class="language-python">from dataclasses import dataclass


@dataclass(frozen=True)
class SpecResult:
    what_it_does: str
    what_it_does_not: str
    done_looks_like: str
    session_id: str
</code></pre> <p><code>frozen=True</code> matters. The spec is a commitment, not a draft. Once it's written, the loop runs against it. No mid-run revisions.</p> <p>For testing, <code>SpecWriter</code> accepts injectable <code>input_fn</code> and <code>output_fn</code> callables. No stdin monkey-patching required. See <code>tests/test_spec_writer.py</code> for working examples — the suite uses a small <code>scripted_input</code> helper that returns answers from a generator, and writes to a per-test SQLite file via pytest's <code>tmp_path</code> fixture. SQLite's <code>:memory:</code> isn't safe here, because <code>SpecWriter</code> opens a fresh connection per method and each <code>:memory:</code> connection is its own isolated database.</p> <h2 id="heading-phase-2-enforce-done-at-runtime">Phase 2: Enforce Done at Runtime</h2> <p>Defining the exit condition upstream is discipline. The circuit breaker is enforcement.</p> <pre><code class="language-python"># circuit_breaker.py
from circuit_breaker import CircuitBreaker, CircuitBreakerError

breaker = CircuitBreaker(turn_limit=5, token_limit=15000)
breaker.check(turn_count, accumulated_tokens)  # raises on breach
</code></pre> <p>Two ceilings.
Both hard.</p> <p><code>turn_limit</code> caps how many times the loop can call the LLM. <code>token_limit</code> caps total token consumption across all turns. Either one tripping raises <code>CircuitBreakerError</code> immediately.</p> <p>The boundary is strict: <code>turn_count == turn_limit</code> is allowed. <code>turn_count == turn_limit + 1</code> trips. No grace periods or warnings. A hard stop forces a human checkpoint.</p> <pre><code class="language-python">from dataclasses import dataclass


@dataclass
class CircuitBreakerError(Exception):
    reason: str  # "turn_ceiling" or "token_ceiling"
    turn_count: int
    accumulated_tokens: int

    def __post_init__(self) -> None:
        super().__init__(
            f"circuit breaker tripped: {self.reason} "
            f"(turn={self.turn_count}, tokens={self.accumulated_tokens})"
        )


class CircuitBreaker:
    def __init__(self, turn_limit: int = 5, token_limit: int = 15000) -> None:
        self.turn_limit = turn_limit
        self.token_limit = token_limit

    def check(self, turn_count: int, accumulated_tokens: int) -> None:
        if turn_count > self.turn_limit:
            self._trip("turn_ceiling", turn_count, accumulated_tokens)
        if accumulated_tokens > self.token_limit:
            self._trip("token_ceiling", turn_count, accumulated_tokens)

    def _trip(self, reason: str, turn_count: int, accumulated_tokens: int) -> None:
        print(
            "\n=== CIRCUIT BREAKER CHECKPOINT ===\n"
            f"reason : {reason}\n"
            f"turn_count : {turn_count} / limit {self.turn_limit}\n"
            f"tokens_used : {accumulated_tokens} / limit {self.token_limit}\n"
            "action : halt loop, surface to human reviewer\n"
            "=================================="
        )
        raise CircuitBreakerError(
            reason=reason,
            turn_count=turn_count,
            accumulated_tokens=accumulated_tokens,
        )
</code></pre> <p><code>CircuitBreakerError</code> is an exception, not a return code. That's intentional. A return code can be ignored. An uncaught exception can't. Silent breach is impossible.
The human-readable checkpoint banner is printed to stdout by <code>_trip()</code> <em>before</em> the exception is raised, so even if a caller swallows the exception the operator still sees state.</p> <p>The critical rule: call <code>.check()</code> <strong>before</strong> every LLM call, not after. Post-flight checking means you've already burned the tokens before you knew the limit was exceeded.</p> <pre><code class="language-python"># Wrong — post-flight
result = client.messages.create(...)
breaker.check(turn_count, accumulated_tokens)  # too late

# Right — pre-flight
breaker.check(turn_count, accumulated_tokens)  # raises before any spend
result = client.messages.create(...)
</code></pre> <p>The defaults (5 turns, 15,000 tokens) match a tight tutorial demo. Your production budget is different. Tune at instantiation:</p> <pre><code class="language-python"># Production example: larger token budget, more turns
breaker = CircuitBreaker(turn_limit=10, token_limit=50000)
</code></pre> <h2 id="heading-phase-3-record-everything">Phase 3: Record Everything</h2> <p>The circuit breaker protects your bank account. The ledger protects your understanding of what happened.</p> <p>Most teams log for debugging — they want to know what went wrong after it went wrong. The ledger has a different purpose. It's governance. Every row is proof that the loop stayed within its boundaries, or didn't, and exactly when.</p> <pre><code class="language-python"># ledger.py
from ledger import Ledger

ledger = Ledger(db_path="ledger.db")
ledger.write(
    session_id=spec.session_id,
    turn_count=1,
    state_origin="llm",
    input_str=task,
    token_delta=523,
    execution_time_ms=1240,
    pass_fail=True,
)
</code></pre> <p>One row per turn. Append-only, no updates, and no deletes.
The immutability is the point: a ledger you can edit isn't a ledger, it's a notebook.</p> <p>The schema:</p> <pre><code class="language-sql">CREATE TABLE IF NOT EXISTS ledger (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    turn_count INTEGER NOT NULL,
    state_origin TEXT NOT NULL,
    input_hash TEXT NOT NULL,
    token_delta INTEGER NOT NULL,
    execution_time_ms INTEGER NOT NULL,
    pass_fail INTEGER NOT NULL,   -- 1=pass, 0=fail
    breach_reason TEXT,           -- NULL unless circuit breaker fired
    created_at TEXT NOT NULL      -- ISO 8601, UTC
);

CREATE INDEX IF NOT EXISTS idx_ledger_session ON ledger(session_id);
</code></pre> <p>The index lets <code>get_session(session_id)</code>, the primary read path, find a session's rows through a B-tree search instead of a full table scan, so lookups stay fast (logarithmic in table size) as the ledger grows.</p> <p>Three decisions worth explaining:</p> <ol> <li><p><code>input_hash</code> <strong>not</strong> <code>input_text</code><strong>.</strong> The raw input string never persists. Only its SHA-256 hash does. There are two benefits to this: identical inputs across runs are detectable, and PII never enters the audit trail.</p> </li> <li><p><code>pass_fail</code> <strong>as</strong> <code>INTEGER</code> <strong>not</strong> <code>BOOLEAN</code><strong>.</strong> SQLite has no boolean type. <code>1</code> and <code>0</code> are canonical. Clean Python ergonomics at the API edge, correct SQL types on disk.</p> </li> <li><p><code>created_at</code> <strong>as</strong> <code>datetime.now(timezone.utc).isoformat()</code><strong>.</strong> <code>datetime.utcnow()</code> was deprecated in Python 3.12.
Timezone-aware timestamps avoid the naive-datetime footgun in any system that crosses timezones.</p> </li> </ol> <p>Retrieve by session:</p> <pre><code class="language-python">rows = ledger.get_session(spec.session_id)
for row in rows:
    print(f"Turn {row.turn_count}: {'PASS' if row.pass_fail else 'FAIL'} "
          f"| {row.token_delta} tokens | {row.execution_time_ms}ms")
</code></pre> <h2 id="heading-phase-4-the-loop-that-respects-its-boundaries">Phase 4: The Loop That Respects Its Boundaries</h2> <p>The agent loop wires the three primitives together. It's the only component that calls the LLM. Everything else is local.</p> <pre><code class="language-python"># agent_loop.py
from agent_loop import AgentLoop

loop = AgentLoop(spec, breaker, ledger, client)
result = loop.run(task)  # LoopResult(success, turns, total_tokens, session_id, breach_reason)
</code></pre> <p>The anatomy of a turn, in order:</p> <ol> <li><p><code>circuit_breaker.check(turn_count, accumulated_tokens)</code> — raises if either ceiling is exceeded</p> </li> <li><p><code>client.messages.create(...)</code> — the actual LLM call</p> </li> <li><p><code>ledger.write(...)</code> — one row, append-only</p> </li> <li><p>If <code>stop_reason == "end_turn"</code>, return.
Otherwise loop.</p> </li> </ol> <p>Pre-flight checking before every LLM call, with no exceptions.</p> <pre><code class="language-python">def run(self, task: str) -> LoopResult:
    session_id = self.spec.session_id
    messages: list[dict] = [{"role": "user", "content": task}]
    turn = 0
    total_tokens = 0

    try:
        while True:
            turn += 1
            self.circuit_breaker.check(turn, total_tokens)

            started = time.perf_counter()
            response = self.client.messages.create(
                model=self.model,
                max_tokens=self.max_tokens,
                system=self._system_prompt(),
                messages=messages,
            )
            elapsed_ms = int((time.perf_counter() - started) * 1000)

            turn_tokens = (
                getattr(response.usage, "input_tokens", 0)
                + getattr(response.usage, "output_tokens", 0)
            )
            total_tokens += turn_tokens

            text = self._text_from(response)
            messages.append({"role": "assistant", "content": text})

            self.ledger.write(
                session_id=session_id,
                turn_count=turn,
                state_origin="llm",
                input_str=task,
                token_delta=turn_tokens,
                execution_time_ms=elapsed_ms,
                pass_fail=True,
            )

            if getattr(response, "stop_reason", "end_turn") == "end_turn":
                return LoopResult(
                    success=True,
                    turns=turn,
                    total_tokens=total_tokens,
                    session_id=session_id,
                )

            messages.append({"role": "user", "content": "continue"})

    except CircuitBreakerError as err:
        self.ledger.write(
            session_id=session_id,
            turn_count=turn,
            state_origin="circuit_breaker",
            input_str=task,
            token_delta=0,
            execution_time_ms=0,
            pass_fail=False,
            breach_reason=err.reason,
        )
        return LoopResult(
            success=False,
            turns=turn,
            total_tokens=total_tokens,
            session_id=session_id,
            breach_reason=err.reason,
        )

def _system_prompt(self) -> str:
    return (
        "You are an agent working on a tightly-scoped task.\n\n"
        f"What this does: {self.spec.what_it_does}\n"
        f"What this does NOT do: {self.spec.what_it_does_not}\n"
        f"Done looks like: {self.spec.done_looks_like}\n"
    )

@staticmethod
def _text_from(response) -> str:
    content = getattr(response, "content", None)
    if not content:
        return ""
    block = content[0]
    return getattr(block, "text", "") or ""
</code></pre> <p>A
few choices worth calling out in this body:</p> <ul> <li><p><strong>The whole</strong> <code>while True:</code> <strong>is wrapped in one</strong> <code>try/except CircuitBreakerError</code><strong>.</strong> The check happens at the top of every turn, so a breach is caught the same way whether it fires on turn 1 or turn 6.</p> </li> <li><p><code>input_str=task</code> on every ledger row — the original task, not the last assistant message. The <code>input_hash</code> column then groups rows that share the same starting input across the run.</p> </li> <li><p><code>pass_fail=True</code> <strong>for every LLM turn that returns</strong>, <code>False</code> only on breach. The pass/fail flag tracks whether the loop <em>reached</em> the row legitimately, not whether the model's output was good. Quality scoring is a separate concern.</p> </li> <li><p><code>_system_prompt()</code> <strong>uses all three spec fields</strong>, not just <code>done_looks_like</code>. The model needs the negative scope (<code>what_it_does_not</code>) at least as much as the positive scope.</p> </li> <li><p><code>time.perf_counter()</code> <strong>not</strong> <code>time.time()</code> — monotonic, immune to wall-clock adjustments mid-run.</p> </li> </ul> <p><code>LoopResult.session_id</code> is inherited from <code>spec.session_id</code>. The ledger rows tie back to the spec without a join. One session ID, one traceable run, start to finish.</p> <h2 id="heading-phase-5-the-review-surface">Phase 5: The Review Surface</h2> <p>The circuit breaker protects your bank account. The ledger records what happened. But neither tells you whether what happened matched what you promised.</p> <p>That gap is where bad loops get approved. Polished output, green dashboard, missed commitment. A reviewer sees the artifact, decides it looks acceptable, and signs off. Nobody asked whether the original promise was kept.</p> <p>The review surface closes that gap. 
It reads the session from SQLite, assembles the five-element frame, and forces a comparison before anything downstream receives the output.</p> <pre><code class="language-python">from review_surface import ReviewSurface

rs = ReviewSurface(spec_db_path="spec.db", ledger_db_path="ledger.db")
print(rs.render(session_id))
</code></pre> <p>Here's the five-element frame, in order:</p> <ol> <li><p><strong>Original promise</strong> — pulled from the spec table: what it does, what it doesn't do, what done looks like</p> </li> <li><p><strong>Acceptance criteria</strong> — the <code>done_looks_like</code> field rendered as the explicit benchmark</p> </li> <li><p><strong>Diff</strong> — first turn input vs final turn output, turns completed, total tokens, whether the loop breached</p> </li> <li><p><strong>Evidence</strong> — all ledger rows for the session: turn-by-turn pass/fail, token delta, execution time</p> </li> <li><p><strong>Unresolved assumptions</strong> — derived from breach rows and failed turns. Empty when clean.</p> </li> </ol> <p>When the reviewer is satisfied, they attest:</p> <pre><code class="language-python">attestation = rs.attest(
    session_id=result.session_id,
    reviewer="daniel",
    notes="Output matches spec. Approved."
)
print(attestation.frame_hash)
</code></pre> <p><code>.attest()</code> writes to the <code>attestations</code> table in <code>ledger.db</code>. The <code>frame_hash</code> is a SHA-256 of the canonical frame data — deterministic across reviewers attesting the same session. It's the audit receipt. It binds the attestation to the exact frame as rendered, not a summary or a paraphrase.</p> <p>Approval confirms the process ran. Attestation confirms the reviewer compared output to commitment.
When the loop touches something regulated, those are different legal documents.</p> <pre><code class="language-python">@dataclass(frozen=True)
class ReviewFrame:
    session_id: str
    original_promise: SpecResult
    acceptance_criteria: str
    diff: DiffResult
    evidence: tuple  # tuple[LedgerRow, ...]
    unresolved_assumptions: tuple  # tuple[str, ...]
    created_at: str
</code></pre> <p><code>ReviewFrame</code> is frozen for the same reason <code>SpecResult</code> is — the frame is evidence, not a draft. <code>evidence</code> and <code>unresolved_assumptions</code> are tuples rather than lists. A frozen dataclass generates <code>__hash__</code> from its fields, so a list field would make hashing the frame fail, and tuples keep the evidence itself immutable.</p> <p>The full end-to-end flow with the review surface lives in <code>examples/review_example.py</code> in the repo. Run it after any completed session: it renders the five-element frame, prompts for attestation, and writes the receipt if you approve.</p> <p>The loop runs to you. Downstream systems get nothing until someone signs.</p> <h2 id="heading-phase-6-a-real-example-seo-audit-agent">Phase 6: A Real Example — SEO Audit Agent</h2> <p>The pattern only makes sense against a real problem. This is the same agent architecture behind my <a href="https://github.com/dannwaneri/seo-agent">seo-agent</a> project.</p> <p>SEO audits have a natural cadence: crawl, surface what's broken, fix, wait for reindex. Running the agent continuously doesn't change that cadence. It just burns tokens in the empty space between the moments that matter.
A cron job wired to the loop is the honest architecture.</p> <pre><code class="language-python"># examples/seo_audit_example.py
import requests
from bs4 import BeautifulSoup
import anthropic

from spec_writer import SpecWriter
from circuit_breaker import CircuitBreaker
from ledger import Ledger
from agent_loop import AgentLoop


def crawl_url(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # don't audit an error page as if it were the site
    soup = BeautifulSoup(response.text, "html.parser")

    title = soup.find("title")
    # The tag may lack a content attribute, so read it with .get() below
    meta_desc = soup.find("meta", attrs={"name": "description"})
    h1_tags = soup.find_all("h1")

    return (
        f"URL: {url}\n"
        f"Title: {title.text if title else 'MISSING'}\n"
        f"Meta description: "
        f"{meta_desc.get('content', 'MISSING') if meta_desc else 'MISSING'}\n"
        f"H1 count: {len(h1_tags)}\n"
        f"H1 tags: {[h.text[:50] for h in h1_tags]}"
    )


def run_seo_audit(url: str) -> None:
    # Step 1: Define done before the loop starts
    spec = SpecWriter(db_path="spec.db").run()

    # Step 2: Initialise circuit breaker and ledger
    breaker = CircuitBreaker(turn_limit=5, token_limit=15000)
    ledger = Ledger(db_path="ledger.db")
    client = anthropic.Anthropic()

    # Step 3: Crawl the URL
    site_data = crawl_url(url)

    # Step 4: Run the loop
    # AgentLoop catches CircuitBreakerError internally and returns
    # LoopResult(success=False, breach_reason=...). Branch on the
    # result — do NOT wrap loop.run() in try/except CircuitBreakerError.
    loop = AgentLoop(spec, breaker, ledger, client)
    result = loop.run(
        f"Audit this page for SEO issues:\n\n{site_data}"
    )

    # Step 5: Print the ledger
    print(f"\nResult: {'SUCCESS' if result.success else 'BREACH'}")
    if not result.success:
        print(f"Breach reason: {result.breach_reason}")
    print(f"Turns: {result.turns} | Tokens: {result.total_tokens}")
    print("\nAudit trail:")
    for row in ledger.get_session(result.session_id):
        status = "PASS" if row.pass_fail else "FAIL"
        print(f" Turn {row.turn_count}: {status} | "
              f"{row.token_delta} tokens | {row.execution_time_ms}ms")


if __name__ == "__main__":
    import sys
    run_seo_audit(sys.argv[1] if len(sys.argv) > 1 else "https://example.com")
</code></pre> <p>Run it:</p> <pre><code class="language-bash">python examples/seo_audit_example.py https://yourdomain.com
</code></pre> <p>The spec writer prompts you. The loop runs, the circuit breaker fires if the limits are exceeded, and the ledger records every turn. The output lands in front of you, and you decide what to fix.</p> <p>The loop runs to you, not into a void.</p> <h2 id="heading-pluggable-llm-client">Pluggable LLM Client</h2> <p>The loop works with any client that satisfies the <code>LLMClient</code> protocol (Anthropic by default). Bring your own via a ~20-line adapter.</p> <pre><code class="language-python"># agent_loop.py
from typing import Protocol, runtime_checkable


@runtime_checkable
class MessagesEndpoint(Protocol):
    def create(self, *, model: str, max_tokens: int, system: str, messages: list) -> object: ...


@runtime_checkable
class LLMClient(Protocol):
    messages: MessagesEndpoint
</code></pre> <p><code>messages</code> is an instance attribute (not a nested class) because that's how the real Anthropic SDK exposes it — <code>anthropic.Anthropic().messages.create(...)</code>. Modeling it as a nested class would mean the real client wouldn't satisfy the Protocol.
The <code>@runtime_checkable</code> decorator lets you sanity-check conformance with <code>isinstance(client, LLMClient)</code>. The check only confirms that the required attributes exist, not their signatures. The repo's test suite uses exactly that assertion against the <code>FakeClient</code> test double.</p> <p>Here's an illustrative OpenAI adapter. A production adapter would also need to map streaming, tool-use, and error shapes:</p> <pre><code class="language-python"># openai_adapter.py — illustrative pseudocode, not production-ready.
from openai import OpenAI as _OpenAI


class _MessagesAdapter:
    def __init__(self, client):
        self._client = client

    def create(self, *, model, max_tokens, system, messages):
        completion = self._client.chat.completions.create(
            model=model,
            max_tokens=max_tokens,
            messages=[{"role": "system", "content": system}] + messages,
        )
        # Reshape OpenAI's response into the Anthropic-shaped surface
        # AgentLoop reads: response.usage.{input,output}_tokens,
        # response.content[0].text, response.stop_reason.
        # _adapt_response is left for you to write; it isn't defined here.
        return _adapt_response(completion)


class OpenAIAdapter:
    def __init__(self, api_key: str):
        self._client = _OpenAI(api_key=api_key)
        self.messages = _MessagesAdapter(self._client)  # instance attr, not a nested class
</code></pre> <p>The adapter pattern is worth teaching explicitly. Provider APIs don't share a shape. Anthropic puts <code>system</code> at the top level. OpenAI puts it inside the messages array. An adapter shim is ~20 lines and makes the loop provider-agnostic without rewriting anything.
Note that <code>self.messages</code> is assigned in <code>__init__</code>, so it's a real attribute on each adapter instance, the same shape as the actual SDK.</p> <h2 id="heading-running-the-tests">Running the Tests</h2> <pre><code class="language-bash">python -m pytest tests/
</code></pre> <p>With coverage:</p> <pre><code class="language-bash">python -m coverage run --source=circuit_breaker,ledger,spec_writer,agent_loop,review_surface -m pytest tests/
python -m coverage report -m
</code></pre> <p>80 tests, 100% coverage on all five core modules. The loop is exercised against a <code>FakeClient</code> test double defined inline in <code>tests/test_agent_loop.py</code>. It satisfies the <code>LLMClient</code> protocol via duck typing: <code>messages</code> is set to <code>self</code>, so <code>client.messages.create(...)</code> routes back to the same object, which returns scripted responses for each test scenario. Clone the repo and run <code>pytest</code> to see all 80 tests pass without touching the network or needing an API key.</p> <p>That coverage matters most for <code>circuit_breaker.py</code>, the financial safety component.
Every path through it is exercised.</p> <h2 id="heading-what-youve-built">What You've Built</h2> <p>In this tutorial, you've built five small primitives, each independently usable.</p> <table> <thead> <tr> <th>Module</th> <th>Role</th> <th>Lines of code</th> </tr> </thead> <tbody><tr> <td><code>spec_writer.py</code></td> <td>Forces three answers before the loop runs</td> <td>104</td> </tr> <tr> <td><code>circuit_breaker.py</code></td> <td>Hard ceilings on turns and tokens</td> <td>41</td> </tr> <tr> <td><code>ledger.py</code></td> <td>Append-only SQLite audit trail</td> <td>113</td> </tr> <tr> <td><code>agent_loop.py</code></td> <td>The loop that respects both the spec and the breaker</td> <td>128</td> </tr> <tr> <td><code>review_surface.py</code></td> <td>Assembles the five-element frame, records human attestation</td> <td>114</td> </tr> </tbody></table> <p>The pattern: upstream discipline defines the boundaries. Downstream enforcement breaks the circuit. Neither trusts the model to police itself.</p> <p>A loop that runs without an exit condition isn't autonomous. It's a billing event waiting to happen.</p> <p>Define what done looks like before you start. That's the job, and it always has been.</p> <h2 id="heading-next-steps">Next Steps</h2> <p>The repo is at <a href="https://github.com/dannwaneri/production-safe-agent-loop">github.com/dannwaneri/production-safe-agent-loop</a>.</p> <p>There are three natural extensions if you want to go further:</p> <h3 id="heading-1-graduation-to-distributed-systems">1. Graduation to Distributed Systems</h3> <p>The SQLite ledger works for isolated sequential loops. The moment you run multiple agents against shared state, you need serializable isolation — concurrent writes to flat JSON corrupt silently. The README documents the three tipping points where a flat ledger needs to graduate.</p> <h3 id="heading-2-cryptographic-signing">2.
Cryptographic Signing</h3> <p>For compliance-scale systems where the auditor wasn't present when the loop ran, SQLite rows aren't enough. A database admin can run an <code>UPDATE</code> query. Ed25519 signing wraps each ledger row in a receipt that proves the log wasn't altered after execution. But that's a different tutorial.</p> <h3 id="heading-wiring-a-cron-job">3. Wiring a Cron Job</h3> <p>The honest architecture for the SEO audit agent isn't 24/7 autonomous operation. It's a cron job that runs on schedule, surfaces what's broken, and stops. <code>0 3 * * 2 python examples/seo_audit_example.py https://yourdomain.com</code> (03:00 every Tuesday) is the whole thing, with one caveat. The script's <code>SpecWriter(...).run()</code> call prompts for answers, and cron has no terminal attached. A scheduled run therefore needs the spec answers supplied non-interactively, for example through <code>SpecWriter</code>'s injectable <code>input_fn</code>. The loop runs to you, not into a void.</p> <p>If you need this architecture built for your own stack (circuit breakers, audit trails, production-safe agent loops), I do freelance work. <a href="https://dannwaneri.com/ai-agents/">dannwaneri.com/ai-agents/</a></p> </article> <article> <h1> How to Build a PDF Page Numbering Tool in the Browser Using JavaScript </h1> <p>Bhavin Sheth — Fri, 29 May 2026 22:08:47 +0000</p> <p>When you're working with contracts, reports, invoices, manuals, or academic documents, page numbers make navigation much easier.</p> <p>Instead of manually editing every page, modern JavaScript libraries let you add page numbers directly inside the browser.</p> <p>In this tutorial, you'll build a browser-based PDF page numbering tool using JavaScript.</p> <p>Users will be able to upload a PDF, choose where page numbers appear, customize formatting options, preview the document, and download the updated PDF without uploading files to a server.</p> <p>Everything runs locally inside the browser for better privacy and faster processing.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-how-pdf-page-numbering-works">How PDF Page Numbering Works</a></p> </li> <li><p><a href="#heading-project-setup">Project Setup</a></p> </li> <li><p><a
href="#heading-what-library-are-we-using">What Library Are We Using?</a></p> </li> <li><p><a href="#heading-creating-the-upload-interface">Creating the Upload Interface</a></p> </li> <li><p><a href="#heading-reading-pdf-pages">Reading PDF Pages</a></p> </li> <li><p><a href="#heading-previewing-uploaded-pages">Previewing Uploaded Pages</a></p> </li> <li><p><a href="#heading-selecting-page-number-position">Selecting Page Number Position</a></p> </li> <li><p><a href="#heading-choosing-pages-to-number">Choosing Pages to Number</a></p> </li> <li><p><a href="#heading-configuring-number-format-and-style">Configuring Number Format and Style</a></p> </li> <li><p><a href="#heading-generating-the-updated-pdf">Generating the Updated PDF</a></p> </li> <li><p><a href="#heading-previewing-and-downloading-the-final-pdf">Previewing and Downloading the Final PDF</a></p> </li> <li><p><a href="#heading-how-pdf-page-numbers-help-in-real-world-documents">How PDF Page Numbers Help in Real-World Documents</a></p> </li> <li><p><a href="#heading-demo-how-the-pdf-page-number-tool-works">Demo: How the PDF Page Number Tool Works</a></p> </li> <li><p><a href="#heading-important-notes-from-real-world-use">Important Notes from Real-World Use</a></p> </li> <li><p><a href="#heading-common-mistakes-to-avoid">Common Mistakes to Avoid</a></p> </li> <li><p><a href="#heading-conclusion">Conclusion</a></p> </li> </ol> <h2 id="heading-how-pdf-page-numbering-works">How PDF Page Numbering Works</h2> <p>A PDF page numbering tool loads an existing PDF document, modifies selected pages, and inserts page numbers before generating a new downloadable file.</p> <p>Page numbering is commonly used in reports, contracts, invoices, legal documents, eBooks, manuals, and academic papers where readers need an easy way to navigate through multiple pages.</p> <p>Without page numbers, it can be difficult to reference specific sections or locate information inside larger documents.</p> <p>The browser reads the uploaded PDF, 
processes each page, applies numbering rules, and exports the updated document.</p> <p>Everything happens locally inside the browser.</p> <p>This means documents never leave the user's device, improving privacy and security.</p> <p>In this tutorial, we'll build a tool that allows users to upload a PDF, choose where page numbers appear, customize formatting options, preview the result, and download the updated document directly from the browser.</p> <h2 id="heading-project-setup">Project Setup</h2> <p>This project is intentionally simple.</p> <p>You only need an HTML file, a JavaScript file, and a PDF processing library.</p> <p>No backend server or database is required.</p> <h2 id="heading-what-library-are-we-using">What Library Are We Using?</h2> <p>We'll use PDF-lib because it allows us to load, modify, and export PDF documents directly inside JavaScript.</p> <p>Add it using a CDN:</p> <pre><code class="language-html">&lt;script src="https://unpkg.com/pdf-lib"&gt;&lt;/script&gt;
</code></pre> <p>Once loaded, we can read PDF pages and add numbering information directly inside the browser.</p> <h2 id="heading-creating-the-upload-interface">Creating the Upload Interface</h2> <p>Users first need a way to upload PDF files.</p> <p>A simple file input works:</p> <pre><code class="language-html">&lt;input type="file" id="pdfFile" accept=".pdf"&gt;
</code></pre> <p>After selecting a file, JavaScript can process the PDF and display a preview.</p> <h2 id="heading-reading-pdf-pages">Reading PDF Pages</h2> <p>After the file is uploaded, the PDF must be loaded into memory.</p> <p>For example:</p> <pre><code class="language-javascript">const file = document.getElementById("pdfFile").files[0];
const bytes = await file.arrayBuffer();
const pdfDoc = await PDFLib.PDFDocument.load(bytes);
const pages = pdfDoc.getPages();
</code></pre> <p>This gives us access to every page inside the document.</p> <h2 id="heading-previewing-uploaded-pages">Previewing Uploaded Pages</h2> <p>Before applying page numbers, users can preview document pages directly inside the
browser.</p> <p>Showing page previews helps users verify the document before making changes.</p> <p>The preview section updates automatically after the PDF is uploaded. PDF-lib edits documents but doesn't render them, so the preview itself is usually drawn by a separate renderer such as PDF.js or by the browser's built-in PDF viewer.</p> <h2 id="heading-selecting-page-number-position">Selecting Page Number Position</h2> <p>Different documents require different page number placements.</p> <p>Some users prefer numbers at the bottom center, while others may use corners or top positions.</p> <p>The tool provides multiple positioning options.</p> <p>For example:</p> <pre><code class="language-javascript">page.drawText(pageNumber, { x: 250, y: 20 }); </code></pre> <p>PDF-lib measures <code>x</code> and <code>y</code> in points from the bottom-left corner of the page, so <code>y: 20</code> places the number near the bottom edge. Changing these coordinates moves the number to other positions.</p> <h2 id="heading-choosing-pages-to-number">Choosing Pages to Number</h2> <p>Not every page needs numbering.</p> <p>Some users may want numbering applied to all pages. Others may choose a custom range or skip the first page.</p> <p>The tool supports all of these options.</p> <h2 id="heading-configuring-number-format-and-style">Configuring Number Format and Style</h2> <p>Users can customize how page numbers appear inside the document.</p> <p>The numbering format can use standard numbers, lowercase letters, or uppercase letters.</p> <p>For example:</p> <pre><code class="language-javascript">// index is the zero-based page index, so the first page becomes "1".
// drawText() expects a string, hence the template literal.
const pageNumber = `${index + 1}`;
</code></pre> <p>Letter-based styles can be derived from the same index in a similar way.</p> <p>Users can also pick the font, text size, and color, and choose a numbering pattern.</p> <p>For example:</p> <ul> <li><p>Page 1</p> </li> <li><p>Page 1 of 20</p> </li> <li><p>Custom patterns</p> </li> </ul> <p>Margin settings control spacing between the page number and document edges.</p> <h2 id="heading-generating-the-updated-pdf">Generating the Updated PDF</h2> <p>Once configuration is complete, users can generate the updated document.</p> <p>For example:</p> <pre><code class="language-javascript">const pdfBytes = await 
pdfDoc.save(); </code></pre> <p>The page numbers are drawn onto each page with <code>drawText()</code> before this call. <code>save()</code> then serializes the modified document into a <code>Uint8Array</code> of PDF bytes that can be previewed or downloaded.</p> <h2 id="heading-previewing-and-downloading-the-final-pdf">Previewing and Downloading the Final PDF</h2> <p>After processing, the updated PDF is displayed inside a preview area.</p> <p>Users can review the results before downloading.</p> <p>The interface also shows document details such as total pages and file size.</p> <p>Navigation buttons allow users to browse through pages directly inside the browser.</p> <p>Finally, the completed PDF can be downloaded.</p> <h2 id="heading-how-pdf-page-numbers-help-in-real-world-documents">How PDF Page Numbers Help in Real-World Documents</h2> <p>Page numbers may seem like a small detail, but they become extremely important as documents grow larger.</p> <p>In business reports, page numbers help readers quickly locate specific sections during meetings, reviews, or presentations. Instead of scrolling through dozens of pages, someone can simply jump to the referenced page number.</p> <p>Contracts and legal documents also rely heavily on page numbering. When discussing terms or clauses, it's common to reference a specific page to avoid confusion and ensure everyone is looking at the same information.</p> <p>Academic papers, research documents, and project reports often need page numbers for citations and references, and to meet formatting guidelines. Many institutions consider page numbering a standard requirement for professional submissions.</p> <p>Page numbers are also useful for manuals, eBooks, user guides, and training materials. Readers can easily return to a previous section or follow instructions that reference another page within the document.</p> <p>For example, a company handbook might contain 50 or more pages. Without page numbers, employees would need to manually search for information. 
With numbering applied, sections can simply reference pages such as "See page 24 for leave policy details."</p> <p>Similarly, invoices, proposals, and financial reports often use formats like "Page 3 of 12" so readers immediately understand how many pages are included in the document.</p> <p>Adding page numbers improves navigation, organization, professionalism, and overall readability, making documents easier to use for both creators and readers.</p> <h2 id="heading-demo-how-the-pdf-page-number-tool-works">Demo: How the PDF Page Number Tool Works</h2> <h3 id="heading-step-1-upload-a-pdf">Step 1: Upload a PDF</h3> <p>Users upload a PDF document into the browser.</p> <h3 id="heading-step-2-review-page-previews">Step 2: Review Page Previews</h3> <p>The uploaded document pages appear inside the preview section.</p> <h3 id="heading-step-3-configure-page-number-settings">Step 3: Configure Page Number Settings</h3> <p>Users choose position, page range, numbering style, font appearance, transparency, and formatting options.</p> <h3 id="heading-step-4-generate-the-pdf">Step 4: Generate the PDF</h3> <p>After configuration is complete, users click the generate button.</p> <h3 id="heading-step-5-review-and-download">Step 5: Review and Download</h3> <p>The finished PDF appears in the preview area.</p> <p>Users can browse pages, review the numbering, rename the file, and download the updated document.</p> <h2 id="heading-important-notes-from-real-world-use">Important Notes from Real-World Use</h2> <p>When working with large PDF files, performance and memory usage become important considerations.</p> <p>Documents containing hundreds of pages may take longer to process inside the browser.</p> <p>A simple validation check can help prevent unsupported files from being processed:</p> <pre><code class="language-javascript">if (!file || file.type !== "application/pdf") {
  alert("Please upload a valid PDF file");
  return;
}
</code></pre> <p>This ensures users upload a PDF before processing 
begins.</p> <p>Another useful optimization is limiting very large files before loading them:</p> <pre><code class="language-javascript">const MAX_SIZE = 20 * 1024 * 1024; // 20 MB
if (file.size > MAX_SIZE) {
  alert("PDF file is too large");
  return;
}
</code></pre> <p>This prevents excessive memory usage and improves browser performance.</p> <p>When generating page numbers, it's also helpful to process pages only once:</p> <pre><code class="language-javascript">const pages = pdfDoc.getPages();
pages.forEach((page, index) => {
  // Without x and y, drawText() uses the page's default position (bottom-left
  // corner), so pass one: here, roughly centered near the bottom edge
  page.drawText(`${index + 1}`, { x: page.getWidth() / 2, y: 20, size: 12 });
});
</code></pre> <p>This keeps the numbering process efficient even for larger documents.</p> <p>Before downloading the final file, always preview the generated document.</p> <p>Reviewing the output helps verify that page numbers appear in the correct position, use the expected format, and don't overlap important document content.</p> <h2 id="heading-common-mistakes-to-avoid">Common Mistakes to Avoid</h2> <p>One common mistake is hardcoding page number positions.</p> <p>Different PDF documents can have different page sizes, so fixed coordinates may place page numbers in the wrong location.</p> <p>For example:</p> <pre><code class="language-javascript">page.drawText(pageNumber, { x: 250, y: 20 }); </code></pre> <p>Instead, it's usually better to calculate positions dynamically from each page's dimensions, which <code>page.getSize()</code> returns as <code>width</code> and <code>height</code> in points.</p> <p>Another mistake is applying numbering to every page when only a subset of pages should be updated.</p> <p>For example, users may want to skip the cover page or number only specific page ranges.</p> <p>Always verify page selection settings before generating the final file.</p> <p>It's also important to preview the output before downloading.</p> <p>For example:</p> <pre><code class="language-javascript">const previewPage = pdfDoc.getPage(0);
// renderPreview() stands in for app-specific code. PDF-lib can't draw pages on
// screen, so a real preview would render the saved bytes with a viewer like PDF.js.
renderPreview(previewPage);
</code></pre> <p>This helps ensure page numbers appear exactly where expected.</p> <p>Another common issue is failing to validate uploaded files 
before processing:</p> <pre><code class="language-javascript">if (!file || file.type !== "application/pdf") {
  alert("Please upload a valid PDF file");
  return;
}
</code></pre> <p>Adding basic validation helps prevent errors and improves the overall user experience.</p> <h2 id="heading-conclusion">Conclusion</h2> <p>In this tutorial, you built a browser-based PDF page numbering tool using JavaScript.</p> <p>You learned how to upload PDF files, preview pages, choose numbering positions, customize formatting options, and generate downloadable PDFs directly inside the browser.</p> <p>More importantly, you saw how modern browsers can handle document editing tasks locally without relying on a backend server.</p> <p>This approach keeps the tool fast, private, and easy to use.</p> <p>If you'd like to try a production-ready version, you can use the <a href="https://allinonetools.net/add-page-numbers/">AllInOneTools - PDF Page Number Tool</a>.</p> <p>Once you understand this workflow, you can extend it further with features like headers, footers, watermarks, PDF stamps, document annotations, or advanced page management.</p> <p>And that's where things start getting really interesting.</p> </article> <article> <h1> How to Build a Browser-Based PDF Rotator Using JavaScript </h1> <p>Bhavin Sheth — Wed, 27 May 2026 15:02:55 +0000</p> <p>Sometimes PDF pages appear upside down, sideways, or in the wrong orientation after scanning or exporting documents.</p> <p>Instead of re-creating the document manually, users usually just need a quick way to rotate pages and save the corrected version.</p> <p>Modern browsers make this possible directly with JavaScript.</p> <p>In this tutorial, you’ll build a browser-based PDF rotator using JavaScript.</p> <p>The tool will allow users to upload PDF files, preview pages, rotate selected pages, change orientation, generate an updated PDF, preview the final result, rename the file, and download everything directly from the browser.</p> <p>Everything 
works entirely client-side without a backend server.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-how-pdf-rotation-works">How PDF Rotation Works</a></p> </li> <li><p><a href="#heading-project-setup">Project Setup</a></p> </li> <li><p><a href="#heading-what-library-are-we-using">What Library Are We Using?</a></p> </li> <li><p><a href="#heading-creating-the-upload-interface">Creating the Upload Interface</a></p> </li> <li><p><a href="#heading-previewing-uploaded-pdf-pages">Previewing Uploaded PDF Pages</a></p> </li> <li><p><a href="#heading-selecting-pages-to-rotate">Selecting Pages to Rotate</a></p> </li> <li><p><a href="#heading-applying-rotation-options">Applying Rotation Options</a></p> </li> <li><p><a href="#heading-generating-the-rotated-pdf">Generating the Rotated PDF</a></p> </li> <li><p><a href="#heading-previewing-and-downloading-the-final-pdf">Previewing and Downloading the Final PDF</a></p> </li> <li><p><a href="#heading-why-pdf-rotation-is-useful-in-real-world-documents">Why PDF Rotation Is Useful in Real-World Documents</a></p> </li> <li><p><a href="#heading-demo-how-the-pdf-rotator-tool-works">Demo: How the PDF Rotator Tool Works</a></p> </li> <li><p><a href="#heading-important-notes-from-real-world-use">Important Notes from Real-World Use</a></p> </li> <li><p><a href="#heading-common-mistakes-to-avoid">Common Mistakes to Avoid</a></p> </li> <li><p><a href="#heading-conclusion">Conclusion</a></p> </li> </ol> <h2 id="heading-how-pdf-rotation-works">How PDF Rotation Works</h2> <p>PDF rotation works by updating each page’s <code>/Rotate</code> value, a multiple of 90 degrees that tells PDF viewers how to display the page.</p> <p>Because the page content itself isn’t redrawn, JavaScript libraries can rotate pages programmatically without any loss of quality and export an updated version of the document.</p> <p>The browser loads the PDF file, reads page information, applies rotation values such as 90°, 180°, or 270° (for example, to switch a page between portrait and landscape), and then generates a new downloadable PDF.</p> <p>Everything happens directly 
inside the browser.</p> <p>This keeps the process fast, private, and easy to use without uploading files to external servers.</p> <h2 id="heading-project-setup">Project Setup</h2> <p>This project is intentionally simple.</p> <p>You only need an HTML file, a JavaScript file, and a PDF processing library.</p> <p>Everything runs entirely inside the browser using JavaScript. No backend server or database is required.</p> <h2 id="heading-what-library-are-we-using">What Library Are We Using?</h2> <p>We’ll use the PDF-lib library for editing PDF files directly in the browser.</p> <p>Add it using a CDN:</p> <pre><code class="language-html"><script src="https://unpkg.com/pdf-lib/dist/pdf-lib.min.js"></script> </code></pre> <p>This library allows us to:</p> <ul> <li><p>load PDF documents</p> </li> <li><p>rotate pages</p> </li> <li><p>modify orientation</p> </li> <li><p>export updated PDFs</p> </li> </ul> <h2 id="heading-creating-the-upload-interface">Creating the Upload Interface</h2> <p>Start with a basic upload input:</p> <pre><code class="language-html"><input type="file" id="pdfUpload" accept="application/pdf">
<button onclick="rotatePDF()">Rotate PDF</button>
</code></pre> <p>This allows users to upload PDF files directly from the browser. The button calls a <code>rotatePDF()</code> function that you define in your JavaScript file.</p> <h2 id="heading-previewing-uploaded-pdf-pages">Previewing Uploaded PDF Pages</h2> <p>After uploading a PDF file, users can preview pages directly inside the browser before applying rotations.</p> <p>The preview section also includes rotation controls so users can rotate pages individually as needed before generating the final PDF.</p> <p>To render previews, we first load the uploaded PDF document:</p> <pre><code class="language-javascript">// arrayBuffer holds the uploaded file's bytes, e.g. from await file.arrayBuffer()
const pdfDoc = await PDFLib.PDFDocument.load(arrayBuffer);
const totalPages = pdfDoc.getPageCount();
</code></pre> <p>Next, we loop over the pages to build the previews:</p> <pre><code class="language-javascript">// Placeholder loop: PDF-lib cannot draw pages, so the actual preview
// images would come from a renderer such as PDF.js
for (let i = 0; i < 
totalPages; i++) {
  const page = pdfDoc.getPage(i);
  console.log("Rendering page:", i + 1);
}
</code></pre> <p>Users can then move between pages using left and right navigation buttons.</p> <p>Rotation buttons can also be attached to each preview card:</p> <pre><code class="language-javascript">// rotateLeftBtn, rotateRightBtn, rotatePage(), and currentPage are app-specific:
// two buttons, a helper that updates one page's rotation, and the previewed page index
rotateLeftBtn.addEventListener("click", () => {
  rotatePage(currentPage, -90);
});
rotateRightBtn.addEventListener("click", () => {
  rotatePage(currentPage, 90);
});
</code></pre> <p>This makes it easier to verify page orientation before generating the updated PDF.</p> <h2 id="heading-selecting-pages-to-rotate">Selecting Pages to Rotate</h2> <p>Not every document needs all pages rotated.</p> <p>Some users may only want to rotate even-numbered pages, odd-numbered pages, or specific pages within the document.</p> <p>The tool allows users to select which pages should receive the rotation changes before generating the final PDF.</p> <p>For example, users can choose the rotation scope like this:</p> <pre><code class="language-javascript">// Value of the checked radio button in the "pageMode" group
const selectedMode = document.querySelector(
  'input[name="pageMode"]:checked'
).value;
</code></pre> <p>Specific page ranges can also be supported:</p> <pre><code class="language-javascript">// Raw text such as "1-3,5" that still has to be parsed into page indexes
const customPages = document
  .getElementById("customPages")
  .value;
</code></pre> <p>This gives users more control over which document pages are modified.</p> <h2 id="heading-applying-rotation-options">Applying Rotation Options</h2> <p>Once the pages are selected, users can apply different rotation actions directly inside the browser.</p> <p>Pages can be rotated left by 90 degrees, rotated right by 90 degrees, flipped by 180 degrees, or converted into portrait or landscape orientation.</p> <p>Here’s a simple example using PDF-lib:</p> <pre><code class="language-javascript">// pageIndex is zero-based: the first page is 0
const page = pdfDoc.getPage(pageIndex);
page.setRotation( 
PDFLib.degrees((page.getRotation().angle + 90) % 360)
);
</code></pre> <p><code>setRotation()</code> replaces a page’s rotation instead of adding to it, so the example adds 90 degrees to the current angle returned by <code>getRotation()</code>. Without that, a page that is already rotated would jump straight to 90°.</p> <p>To rotate pages left:</p> <pre><code class="language-javascript">// Adding 270 is the same as subtracting 90, but keeps the angle positive
page.setRotation(
  PDFLib.degrees((page.getRotation().angle + 270) % 360)
);
</code></pre> <p>You can also apply orientation presets dynamically:</p> <pre><code class="language-javascript">const { width, height } = page.getSize(); // MediaBox size, ignores /Rotate
const turned = page.getRotation().angle % 180 !== 0;
const isLandscape = turned ? height > width : width > height;
if (orientation === "landscape" && !isLandscape) {
  page.setRotation(PDFLib.degrees((page.getRotation().angle + 90) % 360));
}
</code></pre> <p>Checking the page’s current shape first keeps the preset from turning pages that are already landscape.</p> <p>These controls allow users to fix scanned documents and incorrect page layouts directly inside the browser.</p> <h2 id="heading-generating-the-rotated-pdf">Generating the Rotated PDF</h2> <p>After the rotation settings are configured, users can generate the updated PDF directly inside the browser.</p> <p>The tool processes selected pages, applies rotation changes, and exports a new downloadable PDF file instantly.</p> <p>For example:</p> <pre><code class="language-javascript">const pdfBytes = await pdfDoc.save(); </code></pre> <p>Next, create a downloadable file:</p> <pre><code class="language-javascript">const blob = new Blob(
  [pdfBytes],
  { type: "application/pdf" }
);
const url = URL.createObjectURL(blob);
</code></pre> <p>Finally, trigger the download:</p> <pre><code class="language-javascript">const link = document.createElement("a");
link.href = url;
link.download = "rotated-document.pdf";
link.click();
// Release the object URL after a short delay so the download can start
setTimeout(() => URL.revokeObjectURL(url), 1000);
</code></pre> <p>This entire workflow runs locally inside the browser without requiring a backend server.</p> <h2 id="heading-previewing-and-downloading-the-final-pdf">Previewing and Downloading the Final PDF</h2> <p>Once processing is complete, the tool displays a live preview of the rotated document.</p> <p>Users can review updated pages before downloading the final file.</p> <p>The interface also shows additional document details such as total pages and file size.</p> <p>A rename option is available before downloading the generated PDF.</p> <p>For example, users can rename the file 
like this:</p> <pre><code class="language-javascript">const fileName = prompt(
  "Enter PDF name:",
  "rotated-document"
);
// prompt() returns null if the user cancels; set the name before link.click()
link.download = `${fileName || "rotated-document"}.pdf`;
</code></pre> <p>The preview section also includes left and right navigation controls so users can browse through rotated pages directly inside the browser.</p> <p>Document details can also be displayed dynamically:</p> <pre><code class="language-javascript">// fileSizeElement, pageCountElement, and formatFileSize() are app-specific:
// two elements in the output panel and a helper that formats byte counts
fileSizeElement.textContent = formatFileSize(blob.size);
pageCountElement.textContent = pdfDoc.getPageCount();
</code></pre> <p>This improves usability and helps users verify the final output before downloading.</p> <h2 id="heading-why-pdf-rotation-is-useful-in-real-world-documents">Why PDF Rotation Is Useful in Real-World Documents</h2> <p>PDF rotation may seem like a small feature, but it solves a very common problem in everyday document handling.</p> <p>Many scanned documents, mobile scans, invoices, certificates, and office files are saved with incorrect orientation. 
Some pages appear sideways, upside down, or mixed between portrait and landscape layouts.</p> <p>Instead of rescanning or re-exporting those files, users can quickly fix page orientation directly inside the browser.</p> <p>For example, PDF rotation is commonly used for:</p> <ul> <li><p>scanned agreements</p> </li> <li><p>invoices and bills</p> </li> <li><p>government forms</p> </li> <li><p>academic documents</p> </li> <li><p>construction drawings</p> </li> <li><p>landscape reports</p> </li> <li><p>mobile camera scans</p> </li> </ul> <p>This becomes especially useful when working with multi-page PDFs where only certain pages need correction.</p> <p>Some users may only want to rotate:</p> <ul> <li><p>even-numbered pages</p> </li> <li><p>odd-numbered pages</p> </li> <li><p>specific pages</p> </li> <li><p>landscape pages only</p> </li> </ul> <p>That’s why page-based rotation controls are important in modern PDF tools.</p> <p>Browser-based PDF rotation also improves privacy because uploaded documents stay on the user’s device instead of being sent to external servers.</p> <h2 id="heading-demo-how-the-pdf-rotator-tool-works">Demo: How the PDF Rotator Tool Works</h2> <h3 id="heading-step-1-upload-the-pdf">Step 1: Upload the PDF</h3> <p>Users first upload a PDF document directly into the browser-based tool.</p> <p>The upload section supports drag-and-drop along with manual file selection.</p> <h3 id="heading-step-2-preview-pdf-pages">Step 2: Preview PDF Pages</h3> <p>After uploading the document, the tool generates page previews automatically.</p> <p>The preview section also includes rotation options so users can rotate document pages as needed.</p> <h3 id="heading-step-3-configure-rotation-settings">Step 3: Configure Rotation Settings</h3> <p>Users can now choose how the PDF pages should rotate.</p> <p>The tool supports:</p> <ul> <li><p>rotate left</p> </li> 
<li><p>rotate right</p> </li> <li><p>flip 180 degrees</p> </li> <li><p>portrait orientation</p> </li> <li><p>landscape orientation</p> </li> </ul> <p>Users can also choose whether rotations apply to all pages or just certain pages.</p> <h3 id="heading-step-4-generate-the-rotated-pdf">Step 4: Generate the Rotated PDF</h3> <p>Once everything is configured, users click the generate button to apply the rotations.</p> <p>The browser processes the document locally and creates the updated PDF instantly.</p> <h3 id="heading-step-5-preview-the-final-output">Step 5: Preview the Final Output</h3> <p>After processing is complete, the tool displays the rotated PDF preview directly inside the browser.</p> <p>Users can navigate page-by-page using the left and right controls to verify the final output.</p> <p>The interface also shows:</p> <ul> <li><p>total pages</p> </li> <li><p>file size</p> </li> <li><p>output filename</p> </li> </ul> <h3 id="heading-step-6-rename-and-download-the-pdf">Step 6: Rename and Download the PDF</h3> <p>Before downloading, users can rename the generated PDF file directly inside the browser.</p> <p>Once renamed, the updated document can be downloaded instantly.</p> <h2 id="heading-important-notes-from-real-world-use">Important Notes from Real-World Use</h2> <p>When working with scanned PDFs, page orientation issues are very common.</p> <p>Some documents may contain mixed orientations where certain pages are portrait while others are landscape.</p> <p>Applying rotation changes page-by-page usually gives better results than rotating the entire document blindly.</p> <p>Large PDF files can also increase processing time inside the browser, so it helps to warn users before processing them.</p> <p>For example:</p> <pre><code class="language-javascript">// Warn (without blocking) when the file is larger than 50 MB
if (file.size > 50 * 1024 * 1024) {
  alert("Large PDF files may 
process slowly.");
}
</code></pre> <p>Another useful practice is previewing pages before applying permanent changes.</p> <p>This helps users verify page orientation and reduces mistakes before downloading the updated document.</p> <p>Since everything runs locally in the browser, uploaded documents never leave the user’s device, which improves privacy and security.</p> <h2 id="heading-common-mistakes-to-avoid">Common Mistakes to Avoid</h2> <p>One common mistake is rotating pages multiple times accidentally.</p> <p>Because rotations add up, two consecutive 90-degree rotations turn a page a full 180 degrees, which can leave it upside down when the user only meant to turn it sideways.</p> <p>Another issue is ignoring page selection before applying rotations.</p> <p>Users may accidentally rotate all pages instead of specific sections of the document.</p> <p>Large scanned PDFs can also slow down rendering and preview generation.</p> <p>Validating uploaded files before processing helps avoid broken workflows:</p> <pre><code class="language-javascript">if (!file || file.type !== "application/pdf") {
  alert("Please upload a valid PDF file.");
  return;
}
</code></pre> <p>Incorrect preview synchronization is another common issue.</p> <p>If page previews aren't refreshed after rotation, users may think the rotation failed even though the exported PDF is correct.</p> <p>Updating previews dynamically after each rotation improves the overall experience.</p> <h2 id="heading-conclusion">Conclusion</h2> <p>In this tutorial, you built a browser-based PDF rotator using JavaScript.</p> <p>You learned how to upload PDF files, preview document pages, rotate selected pages, change page orientation, generate updated PDFs, and download the final document directly inside the browser.</p> <p>More importantly, you saw how modern browsers can handle practical PDF editing tasks locally without relying on a backend server.</p> <p>This approach keeps the tool fast, private, and easy to use.</p> <p>You can also try the live tool here: <a 
href="https://allinonetools.net/rotate-pdf/">AllInOneTools - PDF Rotator Tool</a>.</p> <p>Once you understand this workflow, you can extend it further with features like PDF page extraction, annotations, document organization, digital signatures, or advanced editing tools.</p> <p>And that’s where things start getting really interesting.</p> </article> <article> <h1> How to Build a Browser-Based PDF Watermark Tool Using JavaScript </h1> <p>Bhavin Sheth — Tue, 19 May 2026 15:50:51 +0000</p> <p>PDF watermarks are commonly used for branding, document protection, approvals, confidential files, and internal document tracking.</p> <p>Whether it’s adding a company logo, a “CONFIDENTIAL” label, or a draft watermark, users often need a quick way to modify PDFs without uploading files to external servers.</p> <p>Modern browsers make this much easier than before. Instead of sending documents to a backend, we can process PDF files directly inside the browser using JavaScript. This keeps documents private while making the tool fast and easy to use.</p> <p>In this tutorial, you’ll build a browser-based PDF watermark tool using JavaScript.</p> <p>The tool will support both text and image watermarks, adjustable opacity, rotation, page selection, positioning controls, and downloadable PDF output directly from the browser.</p> <p>Everything works entirely client-side without any backend.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-how-pdf-watermarking-works">How PDF Watermarking Works</a></p> </li> <li><p><a href="#heading-project-setup">Project Setup</a></p> </li> <li><p><a href="#heading-what-library-are-we-using">What Library Are We Using?</a></p> </li> <li><p><a href="#heading-creating-the-upload-interface">Creating the Upload Interface</a></p> </li> <li><p><a href="#heading-adding-text-watermarks">Adding Text Watermarks</a></p> </li> <li><p><a href="#heading-adding-image-watermarks">Adding Image Watermarks</a></p> </li> <li><p><a 
href="#heading-positioning-and-opacity-controls">Positioning and Opacity Controls</a></p> </li> <li><p><a href="#heading-selecting-pages-to-apply">Selecting Pages to Apply</a></p> </li> <li><p><a href="#heading-generating-and-downloading-the-final-pdf">Generating and Downloading the Final PDF</a></p> </li> <li><p><a href="#heading-demo-how-the-pdf-watermark-tool-works">Demo: How the PDF Watermark Tool Works</a></p> </li> <li><p><a href="#heading-important-notes-from-real-world-use">Important Notes from Real-World Use</a></p> </li> <li><p><a href="#heading-common-mistakes-to-avoid">Common Mistakes to Avoid</a></p> </li> <li><p><a href="#heading-conclusion">Conclusion</a></p> </li> </ol> <h2 id="heading-how-pdf-watermarking-works">How PDF Watermarking Works</h2> <p>A PDF watermark is simply additional text or an image layered on top of an existing PDF page.</p> <p>In the browser, JavaScript libraries can load PDF pages, modify them visually, and export a new downloadable version.</p> <p>The process starts when the user uploads a PDF file into the tool. JavaScript then reads the document, loads each page, and applies watermark elements like text or logos on top of the existing content. After positioning and opacity settings are applied, the updated PDF is generated and downloaded directly from the browser.</p> <p>Everything happens locally inside the browser. This means uploaded documents never leave the user’s device, which improves privacy and security.</p> <h2 id="heading-project-setup">Project Setup</h2> <p>This project is intentionally simple. 
Everything runs directly inside the browser using JavaScript, so no backend server is required.</p> <p>You only need:</p> <ul> <li><p>an HTML file</p> </li> <li><p>a JavaScript file</p> </li> <li><p>a PDF processing library</p> </li> </ul> <h2 id="heading-what-library-are-we-using">What Library Are We Using?</h2> <p>We’ll use the PDF-lib library for editing existing PDF documents inside the browser.</p> <p>Add it using a CDN:</p> <pre><code class="language-html"><script src="https://unpkg.com/pdf-lib/dist/pdf-lib.min.js"></script> </code></pre> <p>This library allows us to load PDF files directly in the browser, modify existing pages, insert custom text or image watermarks, and finally export the updated document as a new downloadable PDF.</p> <p>Because everything runs client-side with JavaScript, users can edit PDFs without uploading files to a server.</p> <h2 id="heading-how-to-create-the-upload-interface">How to Create the Upload Interface</h2> <p>Start with a basic upload input:</p> <pre><code class="language-html"><input type="file" id="pdfUpload" accept="application/pdf"> <button onclick="addWatermark()"> Apply Watermark </button> </code></pre> <p>This allows users to upload PDF files directly from the browser.</p> <p>The tool also includes watermark settings like text input, image upload, opacity controls, positioning, and page selection.</p> <h2 id="heading-how-to-add-text-watermarks">How to Add Text Watermarks</h2> <p>Text watermarks are commonly used for labels like “CONFIDENTIAL”, “DRAFT”, or “APPROVED”.</p> <p>For example:</p> <pre><code class="language-javascript">page.drawText("CONFIDENTIAL", { x: 200, y: 300, size: 48, opacity: 0.5 }); </code></pre> <p>This inserts watermark text directly onto the PDF page. 
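</p> <p>Fixed coordinates like <code>x: 200, y: 300</code> only center the label on one page size. As a minimal sketch, assuming the CDN build’s global <code>PDFLib</code> object, the code below measures the text with an embedded font and centers a rotated watermark on every page. <code>addCenteredTextWatermark()</code> and <code>centeredTextOrigin()</code> are hypothetical helpers, not part of PDF-lib, and the centering math assumes <code>drawText()</code> rotates text counterclockwise around its <code>x</code>/<code>y</code> origin:</p> <pre><code class="language-javascript">// Hypothetical helper: returns the drawText() origin that places the middle of
// a textWidth x textHeight box at the page center after rotating it by
// angleDeg. Assumes the text turns counterclockwise around its (x, y) origin.
function centeredTextOrigin(pageWidth, pageHeight, textWidth, textHeight, angleDeg = 0) {
  const t = (angleDeg * Math.PI) / 180;
  const halfW = textWidth / 2;
  const halfH = textHeight / 2;
  return {
    x: pageWidth / 2 - (halfW * Math.cos(t) - halfH * Math.sin(t)),
    y: pageHeight / 2 - (halfW * Math.sin(t) + halfH * Math.cos(t)),
  };
}

// Hypothetical helper: stamps the same semi-transparent label on every page.
// Relies on the global PDFLib object from the CDN build.
async function addCenteredTextWatermark(pdfDoc, text, options = {}) {
  const { size = 48, angle = 45, opacity = 0.3 } = options;
  const { StandardFonts, degrees, rgb } = PDFLib;
  const font = await pdfDoc.embedFont(StandardFonts.HelveticaBold);
  const textWidth = font.widthOfTextAtSize(text, size);
  const textHeight = font.heightAtSize(size); // approximate text height

  for (const page of pdfDoc.getPages()) {
    const { width, height } = page.getSize();
    const { x, y } = centeredTextOrigin(width, height, textWidth, textHeight, angle);
    page.drawText(text, {
      x,
      y,
      size,
      font,
      color: rgb(0.6, 0.6, 0.6),
      opacity,
      rotate: degrees(angle),
    });
  }
}
</code></pre> <p>For example, calling <code>await addCenteredTextWatermark(pdfDoc, "DRAFT")</code> before <code>pdfDoc.save()</code> would stamp a diagonal “DRAFT” label on every page. Because the position is computed from <code>page.getSize()</code>, the same call works for letter, A4, and landscape pages without hardcoded coordinates.</p> <p>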
Users can also customize the appearance of the watermark directly inside the tool.</p> <p>For text watermarks, users can adjust the font size, change the text color, apply bold or italic styling, control opacity levels, and rotate the watermark at different angles for better visibility and protection.</p> <h2 id="heading-how-to-add-image-watermarks">How to Add Image Watermarks</h2> <p>Some users may want to apply logos or branded graphics instead of plain text.</p> <p>For example:</p> <pre><code class="language-javascript">// imageBytes holds the uploaded logo, e.g. from await imageFile.arrayBuffer()
const image = await pdfDoc.embedPng(imageBytes);
page.drawImage(image, { x: 180, y: 250, width: 120, height: 120, opacity: 0.5 });
</code></pre> <p>This inserts an image watermark onto the PDF page. Note that <code>embedPng()</code> only accepts PNG data; for JPEG logos, use <code>pdfDoc.embedJpg()</code> instead.</p> <p>The tool also supports image scaling controls so users can resize uploaded logos before applying them. PDF-lib’s <code>image.scale(factor)</code> returns the scaled <code>width</code> and <code>height</code>, which can be passed straight to <code>drawImage()</code>.</p> <h2 id="heading-positioning-and-opacity-controls">Positioning and Opacity Controls</h2> <p>Watermark placement is important for readability and document appearance.</p> <p>Users may want centered watermarks, corner positioning, or diagonal overlays depending on the document type.</p> <p>For example:</p> <pre><code class="language-javascript">// With the CDN build, degrees() is available on the global PDFLib object
page.drawText("CONFIDENTIAL", { x: 220, y: 250, rotate: PDFLib.degrees(45), opacity: 0.5 });
</code></pre> <p>This creates a rotated semi-transparent watermark.</p> <p>The tool also allows users to adjust watermark positioning and appearance directly inside the browser.</p> <p>Users can control the X and Y position, change opacity levels, rotate the watermark at different angles, and quickly move the watermark using directional placement controls.</p> <p>This makes it easier to place watermarks correctly without manually editing the PDF in external software.</p> <h2 
id="heading-how-to-select-pages-to-apply">How to Select Pages to Apply</h2> <p>Not every watermark needs to appear on every page. Some users may only want watermarks on specific pages.</p> <p>For example:</p> <pre><code class="language-javascript">const selectedPages = [1, 3, 5]; // page numbers as users see them (1-based)
const pages = selectedPages.map((n) => pdfDoc.getPage(n - 1)); // getPage() takes a 0-based index
</code></pre> <p>The tool allows users to control exactly where the watermark should appear.</p> <p>For example, a watermark can be applied to every page in the document, only even-numbered pages, only odd-numbered pages, or specific custom page ranges like 1-3,5.</p> <p>This makes the tool more flexible for real-world use cases such as contracts, invoices, reports, certificates, and branded documents.</p> <p>Here’s an example of page selection options inside the tool:</p> <h2 id="heading-how-to-generate-and-download-the-final-pdf">How to Generate and Download the Final PDF</h2> <p>Once the watermark settings are configured, the tool generates the updated PDF directly inside the browser.</p> <p>For example:</p> <pre><code class="language-javascript">const pdfBytes = await pdfDoc.save(); // the edited PDF as a Uint8Array
</code></pre> <p>Then the updated file becomes downloadable:</p> <pre><code class="language-javascript">// download() is not a browser built-in; this assumes a helper such as the downloadjs package
download(pdfBytes, "watermarked.pdf", "application/pdf");
</code></pre> <p>This process happens locally without uploading files to external servers.</p> <h2 id="heading-demo-how-the-pdf-watermark-tool-works">Demo: How the PDF Watermark Tool Works</h2> <p>For this example, we’ll apply a custom watermark directly inside the browser.</p> <h3 id="heading-step-1-upload-the-pdf">Step 1: Upload the PDF</h3> <p>Users upload a PDF document into the watermark tool.</p> <h3 id="heading-step-2-preview-the-uploaded-pdf">Step 2: Preview the Uploaded PDF</h3> <p>After uploading the PDF, the tool generates a live preview directly inside the browser.</p> <p>Users can navigate through pages using the left and right arrow buttons to review the document before applying the watermark.</p> <p>This page-by-page preview helps users verify the 
correct file, check page content, and decide where the watermark should appear.</p> <p>Here’s how the PDF preview section looks inside the tool:</p> <h3 id="heading-step-3-configure-watermark-settings">Step 3: Configure Watermark Settings</h3> <p>Users can choose between text and image watermark modes.</p> <p>For text watermarks, users can customize font size, color, opacity, and rotation.</p> <p>For image watermarks, users can upload a logo and adjust image scale before applying it.</p> <h3 id="heading-step-4-position-and-apply-the-watermark">Step 4: Position and Apply the Watermark</h3> <p>Users can reposition the watermark visually before generating the final file.</p> <p>The tool also allows users to control where the watermark should be applied within the document. For example, the watermark can appear on all pages, only even-numbered pages, only odd-numbered pages, or specific custom page ranges.</p> <p>Opacity and rotation controls help keep the watermark visible without hiding important document content.</p> <p>This gives users more flexibility when watermarking contracts, invoices, reports, certificates, or branded PDFs.</p> <h3 id="heading-step-5-generate-the-watermarked-pdf">Step 5: Generate the Watermarked PDF</h3> <p>Once the watermark settings are configured, users can click the generate button to process the document directly inside the browser.</p> <p>The tool applies the watermark to the selected pages and prepares the updated PDF.</p> <p>Here’s how the generate PDF button looks inside the tool:</p> <h3 id="heading-step-6-preview-and-download-the-updated-pdf">Step 6: Preview and Download the Updated PDF</h3> <p>After processing is complete, the tool displays a live preview of the final watermarked PDF.</p> <p>Users can review the updated document before downloading it. 
The interface also shows useful file details such as total pages and final file size.</p> <p>A rename option is available before downloading the generated PDF.</p> <p>Here’s an example of the final output preview section:</p> <h2 id="heading-important-notes-from-real-world-use">Important Notes from Real-World Use</h2> <p>When working with large PDF documents, performance and rendering speed become important.</p> <p>Applying watermarks page-by-page is usually more stable than modifying everything simultaneously.</p> <p>For example:</p> <pre><code class="language-javascript">for (const page of pdfDoc.getPages()) {
  // apply the watermark to this page
}
</code></pre> <p>Another useful optimization is downscaling large logos before embedding them, which reduces output file size and speeds up processing. Embed the logo once and reuse the returned image on every page, rather than calling <code>embedPng()</code> again inside the loop.</p> <p>Opacity is also important. Very dark watermarks can make documents difficult to read, especially on printed pages. Keeping watermark opacity between <code>0.3</code> and <code>0.5</code> usually works well in real-world situations.</p> <p>Since everything runs locally inside the browser, uploaded documents remain private and never leave the user’s device.</p> <h2 id="heading-common-mistakes-to-avoid">Common Mistakes to Avoid</h2> <p>One common mistake is applying watermarks at full opacity. This can make the document difficult to read.</p> <p>For example:</p> <pre><code class="language-javascript">opacity: 1
</code></pre> <p>Instead, use lower opacity values:</p> <pre><code class="language-javascript">opacity: 0.4
</code></pre> <p>Another issue is incorrect watermark positioning. Coordinates hardcoded for one page size can place the watermark partly or fully outside the visible area on another.</p> <p>Calculating positions from each page’s dimensions, for example with <code>page.getSize()</code>, works better across different page sizes. Large image watermarks can also increase PDF file size significantly. 
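</p> <p>Drawing a logo at a smaller width and height doesn’t shrink the image data stored in the PDF, so the pixels themselves need to be reduced first. As a minimal sketch, the hypothetical <code>capDimensions</code> helper below (not part of pdf-lib) works out smaller dimensions that keep the logo’s aspect ratio. It assumes you then redraw the logo on a canvas at that size and embed the result instead of the original file.</p> <pre><code class="language-javascript">// Hypothetical helper: shrink (never enlarge) a width/height pair so the
// longer side fits within maxSide, keeping the aspect ratio.
function capDimensions(width, height, maxSide) {
  const longest = Math.max(width, height);
  if (longest <= maxSide) {
    return { width, height }; // already small enough
  }
  const ratio = maxSide / longest;
  return {
    width: Math.round(width * ratio),
    height: Math.round(height * ratio),
  };
}

// A 2000 × 1000 px logo capped at 500 px
console.log(capDimensions(2000, 1000, 500)); // { width: 500, height: 250 }
</code></pre> <p>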
Resizing images before embedding them helps improve performance.</p> <p>Another common mistake is forgetting to validate uploaded files:</p> <pre><code class="language-javascript">const file = document.getElementById("pdfUpload").files[0];

if (!file || file.type !== "application/pdf") {
  alert("Please upload a valid PDF file.");
  return; // stop the handler before pdf-lib tries to parse the file
}
</code></pre> <p>This stops the tool before it tries to load a missing or non-PDF file.</p> <h2 id="heading-conclusion">Conclusion</h2> <p>In this tutorial, you built a browser-based PDF watermark tool using JavaScript.</p> <p>You learned how to upload PDF files, apply text or image watermarks, control positioning and opacity, and generate downloadable PDFs directly inside the browser.</p> <p>More importantly, you saw how modern browsers can handle document editing tasks locally without relying on a backend server.</p> <p>This approach keeps the tool fast, private, and easy to use.</p> <p>You can also try the live tool here: <a href="https://allinonetools.net/add-watermark-pdf/">All In One Tools PDF Watermark Tool</a></p> <p>Once you understand this workflow, you can extend it further with features like digital signatures, PDF annotations, stamping tools, password protection, or advanced document editing.</p> <p>And that’s where things start getting really interesting.</p> </article> <article> <h1> How to Build a Browser-Based PDF to Image Converter Using JavaScript </h1> <p>Bhavin Sheth — Mon, 11 May 2026 21:35:03 +0000</p> <p>Whether it’s invoices, scanned documents, reports, certificates, or receipts, users often need to convert PDF pages into image files quickly.</p> <p>Modern browsers make this much easier than before.</p> <p>Instead of uploading documents to a server, we can process PDF files directly inside the browser using JavaScript. 
This keeps the tool fast, private, and easy to use.</p> <p>In this tutorial, you’ll build a browser-based PDF to image converter using JavaScript.</p> <p>The tool will support uploading PDF files, previewing pages, selecting image formats like JPG or PNG, adjusting image quality, and downloading converted images directly from the browser.</p> <p>Everything runs entirely client-side without any backend.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-how-pdf-to-image-conversion-works">How PDF to Image Conversion Works</a></p> </li> <li><p><a href="#heading-project-setup">Project Setup</a></p> </li> <li><p><a href="#heading-what-library-are-we-using">What Library Are We Using?</a></p> </li> <li><p><a href="#heading-creating-the-upload-interface">Creating the Upload Interface</a></p> </li> <li><p><a href="#heading-reading-the-pdf-file">Reading the PDF File</a></p> </li> <li><p><a href="#heading-rendering-pdf-pages-as-images">Rendering PDF Pages as Images</a></p> </li> <li><p><a href="#heading-selecting-image-format-and-quality">Selecting Image Format and Quality</a></p> </li> <li><p><a href="#heading-generating-and-downloading-images">Generating and Downloading Images</a></p> </li> <li><p><a href="#heading-demo-how-the-pdf-to-image-tool-works">Demo: How the PDF to Image Tool Works</a></p> </li> <li><p><a href="#heading-important-notes-from-real-world-use">Important Notes from Real-World Use</a></p> </li> <li><p><a href="#heading-common-mistakes-to-avoid">Common Mistakes to Avoid</a></p> </li> <li><p><a href="#heading-conclusion">Conclusion</a></p> </li> </ol> <h2 id="heading-how-pdf-to-image-conversion-works">How PDF to Image Conversion Works</h2> <p>A browser can't directly convert PDF files into images on its own.</p> <p>Instead, JavaScript libraries render PDF pages onto an HTML canvas, which can then be exported as image files like JPG or PNG.</p> <p>The process starts when users upload a PDF document into the browser. 
JavaScript then reads the file, renders each PDF page visually onto a canvas, converts those rendered pages into image files, and finally makes them available for download.</p> <p>Everything happens locally inside the browser.</p> <p>This means users don't need to upload private documents to external servers, making the process faster and more privacy-friendly.</p> <h2 id="heading-project-setup">Project Setup</h2> <p>This project is intentionally simple. Everything runs directly inside the browser using JavaScript, so no backend or server setup is required.</p> <p>You only need:</p> <ul> <li><p>an HTML file</p> </li> <li><p>a JavaScript file</p> </li> <li><p>the PDF.js library</p> </li> </ul> <h2 id="heading-what-library-are-we-using">What Library Are We Using?</h2> <p>We’ll use Mozilla’s PDF.js library to render PDF pages inside the browser.</p> <p>Add it using a CDN, and point PDF.js at the matching worker script so it can parse documents off the main thread:</p> <pre><code class="language-html"><script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.min.js"></script>
<script>pdfjsLib.GlobalWorkerOptions.workerSrc = "https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.worker.min.js";</script>
</code></pre> <p>Once loaded, the library is available as the global <code>pdfjsLib</code>, and the browser can read and render PDF pages directly using JavaScript. Keep the worker version identical to the library version.</p> <h2 id="heading-creating-the-upload-interface">Creating the Upload Interface</h2> <p>Start with a simple upload area:</p> <pre><code class="language-html"><input type="file" id="pdfUpload" accept="application/pdf">
<select id="format">
  <option value="image/jpeg">JPG</option>
  <option value="image/png">PNG</option>
  <option value="image/webp">WEBP</option>
</select>
<input type="range" id="quality" min="10" max="100" value="90">
<button onclick="convertPDF()">Convert to Images</button>
</code></pre> <p>This gives users a file picker, an output format menu whose values are the MIME types <code>canvas.toDataURL()</code> expects, a quality slider, and a convert button.</p> <p>Here’s what the upload section looks like inside the tool:</p> <h2 id="heading-reading-the-pdf-file">Reading the PDF File</h2> <p>After the file is uploaded, we need to read it using JavaScript.</p> <p>For example:</p> <pre><code class="language-javascript">const file = document.getElementById("pdfUpload").files[0];
const reader = new FileReader();

reader.onload = async function () {
  // reader.result is an ArrayBuffer; PDF.js accepts the bytes as a typed array
  const typedArray = new Uint8Array(reader.result);
  const pdf = await pdfjsLib.getDocument(typedArray).promise;
  console.log(pdf.numPages);
};

reader.readAsArrayBuffer(file);
</code></pre> <p>This loads the PDF document directly inside the browser.</p> <p>You can then access each page individually.</p> <h2 id="heading-rendering-pdf-pages-as-images">Rendering PDF Pages as Images</h2> <p>Once the PDF is loaded, pages can be rendered onto a canvas.</p> <p>For example:</p> <pre><code class="language-javascript">const page = await pdf.getPage(1); // page numbers start at 1
const viewport = page.getViewport({ scale: 2 });

const canvas = document.createElement("canvas");
const context = canvas.getContext("2d");
canvas.width = viewport.width;
canvas.height = viewport.height;

await page.render({ canvasContext: context, viewport: viewport }).promise;
</code></pre> <p>This renders the selected PDF page onto an off-screen canvas.</p> <p>After rendering, the canvas can be converted into an image.</p> <p>For example:</p> <pre><code class="language-javascript">const imageData = canvas.toDataURL("image/jpeg", 0.9);
</code></pre> <p>This returns the page as a JPEG data URL, which can be used directly as a download link.</p> <h2 id="heading-selecting-image-format-and-quality">Selecting Image Format and Quality</h2> <p>Before generating the final images, users may want to customize output settings.</p> <p>Different image formats work better for different situations.</p> <p>For example:</p> <ul> <li><p>JPG works well for smaller file sizes</p> </li> <li><p>PNG is lossless, so text and line art stay sharp</p> </li> <li><p>WEBP offers modern compression, though not every browser can export it from a canvas (unsupported types fall back to PNG)</p> </li> </ul> <p>Users can also control image quality using a slider. Its 10 to 100 range needs converting to the 0 to 1 scale that <code>toDataURL()</code> expects.</p> <p>For example:</p> <pre><code class="language-javascript">const quality = Number(document.getElementById("quality").value) / 100; // e.g. 80 becomes 0.8
canvas.toDataURL("image/jpeg", quality);
</code></pre> <p>The quality value, between 0 and 1, controls compression. It only affects lossy formats such as JPEG and WEBP; PNG output ignores it.</p> <p>Here’s an example of image format and quality settings inside the tool:</p> <h2 id="heading-generating-and-downloading-images">Generating and 
Downloading Images</h2> <p>Once pages are rendered, images can be downloaded directly from the browser.</p> <p>For example:</p> <pre><code class="language-javascript">const link = document.createElement("a");
link.href = imageData; // the data URL from canvas.toDataURL()
link.download = `page-${pageNumber}.jpg`; // pageNumber: the page being exported
link.click();
</code></pre> <p>Clicking the temporary link starts the download immediately.</p> <p>When working with multi-page PDFs, the same process can run for every page automatically.</p> <p>This allows users to export complete PDF documents as separate image files. Note that some browsers ask the user for permission before a page triggers several downloads in a row.</p> <h2 id="heading-demo-how-the-pdf-to-image-tool-works">Demo: How the PDF to Image Tool Works</h2> <p>For this example, we’ll convert PDF pages into downloadable image files directly inside the browser.</p> <h3 id="heading-step-1-upload-pdf-files">Step 1: Upload PDF Files</h3> <p>Users upload one or more PDF files into the converter.</p> <h3 id="heading-step-2-preview-uploaded-pages">Step 2: Preview Uploaded Pages</h3> <p>The tool generates page previews before conversion.</p> <p>This helps users verify the uploaded document visually.</p> <h3 id="heading-step-3-configure-output-settings">Step 3: Configure Output Settings</h3> <p>Users can choose image format and quality settings before generating images.</p> <p>This allows better control over output size and image clarity.</p> <h3 id="heading-step-4-convert-pdf-pages-into-images">Step 4: Convert PDF Pages into Images</h3> <p>Once settings are configured, users click the convert button.</p> <p>The browser processes the PDF locally and generates image files instantly.</p> <h3 id="heading-step-5-download-generated-images">Step 5: Download Generated Images</h3> <p>After conversion, every PDF page becomes a downloadable image.</p> <h2 id="heading-important-notes-from-real-world-use">Important Notes from Real-World Use</h2> <p>When working with large PDFs, performance and memory usage become important.</p> <p>Documents with many pages can slow down rendering if everything is processed at once.</p> 
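<p>To put numbers on that: a canvas holds 4 bytes (RGBA) per pixel, and the rendering code above makes the canvas as large as the viewport, which is the page size in PDF points multiplied by the scale. The helpers below are a rough sketch with hypothetical names (they are not PDF.js APIs); they ignore <code>devicePixelRatio</code> and any memory PDF.js uses internally.</p> <pre><code class="language-javascript">// Rough estimate only (hypothetical helpers, not PDF.js APIs).
// Canvas dimensions are whole pixels, so fractional sizes are floored.
function canvasSize(pageWidthPt, pageHeightPt, scale) {
  return {
    width: Math.floor(pageWidthPt * scale),
    height: Math.floor(pageHeightPt * scale),
  };
}

// Raw memory for one rendered page: 4 bytes (RGBA) per pixel.
function estimateCanvasBytes(pageWidthPt, pageHeightPt, scale) {
  const { width, height } = canvasSize(pageWidthPt, pageHeightPt, scale);
  return width * height * 4;
}

// Largest scale, capped at preferredScale, whose canvas stays within maxPixels.
function scaleForPixelBudget(pageWidthPt, pageHeightPt, maxPixels, preferredScale) {
  const fittingScale = Math.sqrt(maxPixels / (pageWidthPt * pageHeightPt));
  return Math.min(preferredScale, fittingScale);
}

// A4 page (595 × 842 pt) at scale 2, as in the rendering example
console.log(canvasSize(595, 842, 2)); // { width: 1190, height: 1684 }
console.log(estimateCanvasBytes(595, 842, 2)); // 8015840
</code></pre> <p>So a single A4 page at scale 2 needs roughly 8 MB of raw pixel data. Calling <code>scaleForPixelBudget()</code> before <code>page.getViewport()</code> gives you one place to trade sharpness for memory on oversized pages; browsers also cap the maximum canvas size, and the limits vary by browser and device.</p> 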
<p>One practical optimization is processing pages step-by-step instead of rendering the entire document immediately.</p> <p>For example:</p> <pre><code class="language-javascript">for (let i = 1; i <= pdf.numPages; i++) {
  const page = await pdf.getPage(i);
  // render, export, and download this page before moving on to the next one
}
</code></pre> <p>Because each page is awaited before the next one starts, only one page is rendered at a time, which keeps browser memory usage more stable.</p> <p>Another useful optimization is reducing render scale for large documents.</p> <p>For example:</p> <pre><code class="language-javascript">const viewport = page.getViewport({ scale: 1.5 });
</code></pre> <p>Lower scale values generate smaller image files and improve performance.</p> <p>Also size the canvas from the viewport instead of giving it a fixed, larger size.</p> <p>For example:</p> <pre><code class="language-javascript">canvas.width = viewport.width;
canvas.height = viewport.height;
</code></pre> <p>This way the exported image has exactly the pixels the chosen scale needs, which avoids unnecessary file size growth.</p> <p>Since everything runs locally inside the browser, uploaded PDF files never leave the user’s device, which improves privacy and security.</p> <h2 id="heading-common-mistakes-to-avoid">Common Mistakes to Avoid</h2> <p>One common mistake is not validating uploaded files before processing them.</p> <p>For example:</p> <pre><code class="language-javascript">if (!file || file.type !== "application/pdf") {
  alert("Please upload a valid PDF file.");
  return;
}
</code></pre> <p>This prevents unsupported files from breaking the tool.</p> <p>Another issue is rendering extremely large pages at very high scale values.</p> <p>Large canvas rendering can consume a lot of memory and slow down conversion significantly.</p> <p>Using smaller scale values usually improves performance.</p> <p>Another common mistake is forgetting to wait for page rendering before exporting the image.</p> <p>For example:</p> <pre><code class="language-javascript">await page.render({ canvasContext: context, viewport: viewport }).promise;
</code></pre> <p>Without <code>await</code>, <code>toDataURL()</code> can run before rendering finishes and export a blank or partially drawn page.</p> <p>Incorrect 
file naming can also confuse users when multiple pages are generated.</p> <p>Adding page numbers to filenames improves organization:</p> <pre><code class="language-javascript">link.download = `page-${pageNumber}.jpg`; </code></pre> <h2 id="heading-conclusion">Conclusion</h2> <p>In this tutorial, you built a browser-based PDF to image converter using JavaScript.</p> <p>You learned how to upload PDF files, render pages inside the browser, generate images, and download them directly without using a backend server.</p> <p>More importantly, you saw how modern browsers can handle document processing tasks locally while keeping user files private.</p> <p>This approach keeps the tool fast, lightweight, and easy to use.</p> <p>Once you understand this workflow, you can extend it further with features like ZIP downloads, batch exports, page selection, watermarking, or image compression.</p> <p>You can also try a real working version here:</p> <p><a href="https://allinonetools.net/pdf-to-image-converter/">https://allinonetools.net/pdf-to-image-converter/</a></p> <p>And that’s where things start getting really interesting.</p> </article> <article> <h1> How to Convert Images to PDF in the Browser Using JavaScript – A Step-by-Step Guide </h1> <p>Bhavin Sheth — Fri, 08 May 2026 17:18:29 +0000</p> <p>Whether it’s scanned documents, screenshots, receipts, notes, certificates, or multiple photos, users often need a quick way to combine images into a downloadable PDF.</p> <p>Modern browsers make this much easier than before.</p> <p>Instead of uploading files to a server, we can now process images directly in the browser using JavaScript. This keeps the tool fast, private, and easy to use.</p> <p>In this tutorial, you’ll build a browser-based Image to PDF converter using JavaScript.</p> <p>The tool will support uploading multiple images, sorting files, choosing orientation and page size, configuring margins, and merging images into either a single PDF or separate PDF files. 
Users will also be able to preview and download the generated document directly in the browser.</p> <p>Everything runs entirely client-side without any backend.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-how-image-to-pdf-conversion-works">How Image to PDF Conversion Works</a></p> </li> <li><p><a href="#heading-project-setup">Project Setup</a></p> </li> <li><p><a href="#heading-what-library-are-we-using">What Library Are We Using?</a></p> </li> <li><p><a href="#heading-creating-the-upload-interface">Creating the Upload Interface</a></p> </li> <li><p><a href="#heading-reading-uploaded-images">Reading Uploaded Images</a></p> </li> <li><p><a href="#heading-generating-the-pdf">Generating the PDF</a></p> </li> <li><p><a href="#heading-handling-multiple-images">Handling Multiple Images</a></p> </li> <li><p><a href="#heading-configuring-pdf-settings">Configuring PDF Settings</a></p> </li> <li><p><a href="#heading-renaming-and-downloading-the-pdf">Renaming and Downloading the PDF</a></p> </li> <li><p><a href="#heading-demo-how-the-image-to-pdf-tool-works">Demo: How the Image to PDF Tool Works</a></p> </li> <li><p><a href="#heading-important-notes-from-real-world-use">Important Notes from Real-World Use</a></p> </li> <li><p><a href="#heading-common-mistakes-to-avoid">Common Mistakes to Avoid</a></p> </li> <li><p><a href="#heading-conclusion">Conclusion</a></p> </li> </ol> <h2 id="heading-how-image-to-pdf-conversion-works">How Image to PDF Conversion Works</h2> <p>The browser can't directly combine images into a PDF by itself.</p> <p>Instead, we'll use a JavaScript PDF library that creates pages, inserts images, and exports everything as a downloadable PDF document.</p> <p>The process starts when users upload one or more images into the browser. JavaScript then reads the image data and prepares it for PDF generation. 
After that, the tool creates PDF pages, inserts the uploaded images into those pages, and finally exports everything as a downloadable PDF document.</p> <p>Everything happens locally inside the browser.</p> <p>This means users don’t need to upload private files to a server, which makes the process faster and more privacy-friendly.</p> <h2 id="heading-project-setup">Project Setup</h2> <p>This project is intentionally simple.</p> <p>You only need:</p> <ul> <li><p>an HTML file</p> </li> <li><p>a JavaScript file</p> </li> <li><p>a PDF library</p> </li> </ul> <p>No backend or database is required.</p> <h2 id="heading-what-library-are-we-using">What Library Are We Using?</h2> <p>We’ll use the jsPDF library. It allows us to generate PDF files directly in JavaScript.</p> <p>Add it using a CDN:</p> <pre><code class="language-html"><script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
</code></pre> <p>Once loaded, we can create and export PDF files directly from the browser.</p> <h2 id="heading-creating-the-upload-interface">Creating the Upload Interface</h2> <p>Start with a basic upload area:</p> <pre><code class="language-html"><input type="file" id="upload" multiple accept="image/*">
<button onclick="convertToPDF()">Convert to PDF</button>
</code></pre> <p>This allows users to upload multiple image files and generate the PDF.</p> <p>Here’s what the upload section looks like inside the tool:</p> <p>You can also expand the interface with additional controls for sorting, page settings, margins, and merge modes.</p> <h2 id="heading-reading-uploaded-images">Reading Uploaded Images</h2> <p>After users select files, we need to read them in JavaScript.</p> <p>We can use <code>FileReader</code> for this:</p> <pre><code class="language-javascript">const fileInput = document.getElementById("upload");
const files = fileInput.files;

for (const file of files) {
  const reader = new FileReader();
  // onload fires asynchronously, once this file has been read
  reader.onload = function (e) {
    const imageData = e.target.result; // a data URL like "data:image/png;base64,..."
    console.log(imageData);
  };
  reader.readAsDataURL(file);
}
</code></pre> <p>This reads each image as a Base64-encoded data URL, which can later be inserted into the PDF. Because the reads finish asynchronously, and not necessarily in upload order, wait until every image has loaded before building the PDF.</p> <h2 id="heading-generating-the-pdf">Generating the PDF</h2> <p>Now we can create the PDF document.</p> <pre><code class="language-javascript">const { jsPDF } = window.jspdf; // the UMD build exposes jsPDF under window.jspdf
const pdf = new jsPDF();
</code></pre> <p>Once the PDF is created, images can be inserted into pages:</p> <pre><code class="language-javascript">// arguments: image data, format ("JPEG", "PNG", ...), x, y, width, height
pdf.addImage(imageData, "JPEG", 10, 10, 180, 120);
</code></pre> <p>This inserts the uploaded image into the PDF page at a specific position and size. By default, jsPDF measures positions and sizes in millimeters.</p> <p>Finally, export the document:</p> <pre><code class="language-javascript">pdf.save("images.pdf");
</code></pre> <p>This downloads the generated PDF instantly.</p> <h2 id="heading-handling-multiple-images">Handling Multiple Images</h2> <p>If users upload multiple files, each image can be added to its own PDF page automatically.</p> <p>For example:</p> <pre><code class="language-javascript">// FileList has no forEach(), so convert it to an array first
Array.from(files).forEach((file, index) => {
  if (index !== 0) {
    pdf.addPage(); // every image after the first starts on a new page
  }
  // then add this file's image with pdf.addImage()
});
</code></pre> <p>This creates a new page before inserting the next image into the document.</p> <p>In some situations, users may also want multiple images on the same page instead of one image per page.</p> <p>For example:</p> <pre><code class="language-javascript">pdf.addImage(img1, "JPEG", 10, 20, 80, 80);
pdf.addImage(img2, "JPEG", 110, 20, 80, 80);
</code></pre> <p>This allows more flexible layouts for galleries, reports, or grouped documents.</p> <h2 id="heading-configuring-pdf-settings">Configuring PDF Settings</h2> <p>Before generating the final PDF, users can customize several layout and output settings.</p> <p>These settings improve document quality and give users more control over the generated file.</p> <p>Here’s what the configuration panel looks like inside the tool:</p> <h3 id="heading-sorting-images">Sorting Images</h3> <p>When multiple 
images are uploaded, organizing them properly becomes important before generating the PDF.</p> <p>Users may want to sort images alphabetically, reverse the order, or arrange them based on file size.</p> <p>Because a <code>FileList</code> has no <code>sort()</code> method, copy it into an array first. For example, images can be sorted alphabetically like this:</p> <pre><code class="language-javascript">// numeric: true sorts "scan2.jpg" before "scan10.jpg"
const byName = Array.from(files).sort((a, b) => a.name.localeCompare(b.name, undefined, { numeric: true }));
</code></pre> <p>You can also sort files by size:</p> <pre><code class="language-javascript">const bySize = Array.from(files).sort((a, b) => a.size - b.size);
</code></pre> <p>Here’s an example of sorting options inside the tool:</p> <p>This helps users organize documents more efficiently before converting them into a PDF.</p> <h3 id="heading-choosing-orientation">Choosing Orientation</h3> <p>Different images work better in different page orientations.</p> <p>Portrait orientation works well for vertical images, while landscape orientation is better for wider images.</p> <p>For example:</p> <pre><code class="language-javascript">const pdf = new jsPDF({ orientation: "portrait" });
</code></pre> <p>You can also switch to <code>"landscape"</code> when needed.</p> <p>Here’s an example of orientation options inside the tool:</p> <h3 id="heading-selecting-page-size">Selecting Page Size</h3> <p>PDF page size controls the dimensions of the generated document.</p> <p>For example:</p> <pre><code class="language-javascript">const pdf = new jsPDF({ unit: "mm", format: "a4" });
</code></pre> <p>This creates an A4-sized PDF document using millimeter units.</p> <p>jsPDF also accepts named formats such as <code>"letter"</code> and <code>"legal"</code>, or a custom <code>[width, height]</code> array in the chosen unit.</p> <p>Here’s an example of page size options inside the tool:</p> <h3 id="heading-adding-margins">Adding Margins</h3> <p>Margins create spacing between the image and the edges of the page.</p> <p>Without margins, images may touch the borders and appear cramped.</p> <p>For example:</p> <pre><code class="language-javascript">const margin = 10; // in the document's unit (mm by default)
pdf.addImage(imageData, "JPEG", margin, margin, 180, 120);
</code></pre> <p>Here’s an example of margin options inside the tool:</p> <p>This creates cleaner spacing around the inserted image.</p> <h3 id="heading-automatic-image-fitting">Automatic Image Fitting</h3> <p>One common issue when generating PDFs from images is incorrect sizing.</p> <p>If images are inserted with fixed dimensions, they may stretch, overflow outside the page, or appear distorted.</p> <p>Instead, it’s better to calculate image dimensions dynamically.</p> <p>For example:</p> <pre><code class="language-javascript">// image: a loaded HTMLImageElement, used here for its pixel width and height
const pageWidth = pdf.internal.pageSize.getWidth();
const imgWidth = pageWidth - 20; // leaves a 10-unit margin on each side
const imgHeight = (image.height * imgWidth) / image.width; // keeps the aspect ratio

pdf.addImage(imageData, "JPEG", 10, 10, imgWidth, imgHeight);
</code></pre> <p>This scales each image proportionally to the page width while keeping a 10-unit margin on both sides. It only constrains the width, though: a tall, narrow image can still run past the bottom of the page, so also compare <code>imgHeight</code> with <code>pdf.internal.pageSize.getHeight()</code> and scale down further when it doesn't fit.</p> <h3 id="heading-merge-options">Merge Options</h3> <p>One useful feature is allowing different output modes.</p> <p>For example, users may want to merge all uploaded images into a single PDF document when creating reports, notes, or combined files.</p> <p>In some cases, users may prefer generating separate PDFs for each image instead of combining everything together. 
This can be useful when exporting individual documents or scanned pages.</p> <p>Custom grouping is another helpful option because it allows users to combine selected images into multiple PDFs based on their own arrangement or categories.</p> <p>These different output modes make the tool much more flexible for different real-world use cases.</p> <p>A simple selection dropdown works well:</p> <pre><code class="language-html"><select id="mergeMode">
  <option value="single">Merge all into Single PDF</option>
  <option value="separate">Create Separate PDFs</option>
  <option value="custom">Custom Grouping</option>
</select>
</code></pre> <p>Once a mode is selected, JavaScript can read <code>document.getElementById("mergeMode").value</code> and apply different generation logic based on it.</p> <p>Here’s an example of merge mode options inside the tool:</p> <p>This makes the tool more flexible for handling different document workflows.</p> <h2 id="heading-renaming-and-downloading-the-pdf">Renaming and Downloading the PDF</h2> <p>After generating the document, users may want to rename the file before downloading.</p> <p>You can prompt for a filename like this:</p> <pre><code class="language-javascript">const fileName = prompt("Enter PDF name:", "images");
// prompt() returns null if the user cancels, so fall back to the default name
pdf.save(`${fileName || "images"}.pdf`);
</code></pre> <p>This gives users more control over the exported file.</p> <p>Here’s an example of the rename popup inside the tool:</p> <h2 id="heading-demo-how-the-image-to-pdf-tool-works">Demo: How the Image to PDF Tool Works</h2> <h3 id="heading-step-1-upload-images">Step 1: Upload Images</h3> <p>Users upload one or more image files into the browser-based tool.</p> <p>The tool supports common formats like JPG, PNG, and WEBP.</p> <h3 id="heading-step-2-configure-pdf-settings">Step 2: Configure PDF Settings</h3> <p>Users can customize layout settings before generating the PDF.</p> <p>This includes:</p> <ul> <li><p>sorting images</p> </li> <li><p>orientation</p> </li> <li><p>page size</p> </li> <li><p>margins</p> </li> <li><p>merge mode</p> </li> </ul> <p>These settings help create cleaner PDF 
output.</p> <h3 id="heading-step-3-generate-the-pdf">Step 3: Generate the PDF</h3> <p>Once settings are configured, users click the convert button.</p> <p>The browser processes all uploaded images locally and generates the PDF instantly.</p> <h3 id="heading-step-4-rename-the-generated-file">Step 4: Rename the Generated File</h3> <p>Before downloading, users can rename the generated PDF.</p> <p>This improves organization when exporting multiple documents.</p> <h3 id="heading-step-5-download-the-pdf">Step 5: Download the PDF</h3> <p>Finally, the generated PDF becomes available for download directly in the browser.</p> <p>The entire process works without uploading files to any server.</p> <h2 id="heading-important-notes-from-real-world-use">Important Notes from Real-World Use</h2> <p>When working with large images, performance and memory usage become important.</p> <p>Large images can slow down PDF generation and create unnecessarily large output files.</p> <p>For example, you can limit upload size before processing:</p> <pre><code class="language-javascript">const MAX_SIZE = 10 * 1024 * 1024; // 10 MB

if (file.size > MAX_SIZE) {
  alert("Image is too large.");
  return;
}
</code></pre> <p>Another useful optimization is resizing images before inserting them into the PDF.</p> <p>For example:</p> <pre><code class="language-javascript">// image: a loaded HTMLImageElement
const canvas = document.createElement("canvas");
const ctx = canvas.getContext("2d");
canvas.width = image.width * 0.5;
canvas.height = image.height * 0.5;

// JPEG has no transparency: transparent pixels come out black unless you fill the canvas first
ctx.drawImage(image, 0, 0, canvas.width, canvas.height);
const resizedImage = canvas.toDataURL("image/jpeg", 0.7);
</code></pre> <p>This halves the width and height, which leaves a quarter of the original pixels, and re-encodes the image as JPEG at quality 0.7 before generating the PDF.</p> <p>It also helps reduce memory usage and improves PDF generation speed for large files.</p> <p>Since everything runs directly inside the browser, uploaded images never leave the user’s device, which improves privacy.</p> <h2 id="heading-common-mistakes-to-avoid">Common Mistakes to 
Avoid</h2> <p>One common mistake is not validating uploaded files before processing them.</p> <p>For example, users may upload unsupported formats or attempt to generate a PDF without selecting images.</p> <p>Always validate input before processing:</p> <pre><code class="language-javascript">if (!fileInput.files.length) { alert("Please upload images first."); return; } </code></pre> <p>You can reject unsupported formats the same way by checking each file's <code>type</code> against the formats you accept, such as <code>image/jpeg</code>, <code>image/png</code>, and <code>image/webp</code>.</p> <p>Another issue is inserting very large images without resizing them first.</p> <p>Large images can create oversized PDFs and reduce performance significantly.</p> <p>Incorrect image positioning is also common.</p> <p>If image dimensions are hardcoded instead of calculated from each image's size, images may overflow the page or become distorted.</p> <p>Scaling each image to fit the page size minus its margins, while preserving its aspect ratio, prevents these layout issues.</p> <h2 id="heading-conclusion">Conclusion</h2> <p>In this tutorial, you built a browser-based image to PDF converter using JavaScript.</p> <p>You learned how to upload images, generate PDF documents, configure layout settings, and export files directly inside the browser.</p> <p>More importantly, you saw how modern browsers can handle document generation locally without relying on a backend server.</p> <p>This approach keeps the tool fast, private, and easy to use.</p> <p>Once you understand this workflow, you can extend it further with features like compression, drag-and-drop sorting, watermarking, batch exports, or advanced PDF editing tools.</p> <p>You can also try a full working version here:</p> <p><a href="https://allinonetools.net/image-to-pdf-converter/">https://allinonetools.net/image-to-pdf-converter/</a></p> <p>And that’s where things start getting really interesting.</p> </article> <article> <h1> How to Build a Self-Learning RAG System with Knowledge Reflection </h1> <p>Daniel Nwaneri — Fri, 24 Apr 2026 20:52:49 +0000</p> <p>Every RAG system I've seen — including the one I wrote a handbook about on this site — has the same fundamental problem.</p> <p>It doesn't learn.</p> <p>You
ingest 500 documents. You ask a question. The system retrieves the three most similar chunks and hands them to the LLM. Repeat for the next query.</p> <p>The system knows exactly as much as it did on day one. It's a library that never builds a card catalog, never cross-references its own shelves, never notices that three of its books are saying contradictory things.</p> <p>That's what I set out to fix with a knowledge reflection layer. After every ingest, the system finds semantically related documents already in the index and asks an LLM to synthesise what's new, how it connects, and what gap remains. That synthesis gets embedded, stored, and boosted in search results.</p> <p>The knowledge base gets smarter as you add more documents — not just bigger.</p> <p>This tutorial shows you exactly how to build it.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-what-you-will-build">What You Will Build</a></p> </li> <li><p><a href="#heading-prerequisites">Prerequisites</a></p> </li> <li><p><a href="#heading-how-to-set-up-the-base-system">How to Set Up the Base System</a></p> </li> <li><p><a href="#heading-why-standard-rag-has-a-memory-problem">Why Standard RAG Has a Memory Problem</a></p> </li> <li><p><a href="#heading-step-1-schema-update">Step 1: Schema Update</a></p> </li> <li><p><a href="#heading-step-2-the-reflection-engine">Step 2: The Reflection Engine</a></p> </li> <li><p><a href="#heading-step-3-consolidation">Step 3: Consolidation</a></p> </li> <li><p><a href="#heading-step-4-wire-it-into-your-ingest-handler">Step 4: Wire It Into Your Ingest Handler</a></p> </li> <li><p><a href="#heading-step-5-boost-reflections-in-search">Step 5: Boost Reflections in Search</a></p> </li> <li><p><a href="#heading-step-6-filtering-by-doc_type">Step 6: Filtering by doc_type</a></p> </li> <li><p><a href="#heading-what-changes-after-you-build-this">What Changes After You Build This</a></p> </li> <li><p><a 
href="#heading-deploying">Deploying</a></p> </li> <li><p><a href="#heading-what-to-build-next">What to Build Next</a></p> </li> </ol> <h2 id="heading-what-you-will-build">What You Will Build</h2> <p>In this tutorial, you'll build a post-ingest reflection pipeline that:</p> <ol> <li><p>Fires automatically after every document ingest</p> </li> <li><p>Finds the most semantically related documents already in the index</p> </li> <li><p>Asks Kimi K2.5 to synthesise a three-sentence insight linking the new document to existing knowledge</p> </li> <li><p>Stores that reflection with <code>doc_type=reflection</code> and a 1.5× ranking boost in search results</p> </li> <li><p>Periodically consolidates recent reflections into higher-level summaries</p> </li> </ol> <p>By the end, searching your knowledge base will surface both raw document chunks and reflection artifacts the system wrote on ingest.</p> <h2 id="heading-prerequisites">Prerequisites</h2> <p>You will need:</p> <ul> <li><p>A Cloudflare account — free tier works</p> </li> <li><p>Node.js v18+ and Wrangler CLI installed (<code>npm install -g wrangler</code>)</p> </li> <li><p>Basic TypeScript familiarity</p> </li> </ul> <p>No external API keys.
Everything runs on Cloudflare's infrastructure.</p> <h2 id="heading-how-to-set-up-the-base-system">How to Set Up the Base System</h2> <p>If you have already built the RAG system from my <a href="https://www.freecodecamp.org/news/build-a-production-rag-system-with-cloudflare-workers-handbook">freeCodeCamp handbook</a>, skip this section — your system is ready for the reflection layer.</p> <p>If you're starting fresh, this section gets you to a working base in about 15 minutes.</p> <h3 id="heading-scaffold-the-project">Scaffold the Project</h3> <pre><code class="language-bash">npm create cloudflare@latest rag-reflection-system cd rag-reflection-system </code></pre> <p>Choose: Hello World example → TypeScript → No deploy yet.</p> <h3 id="heading-create-the-vectorize-index-and-d1-database">Create the Vectorize Index and D1 Database</h3> <pre><code class="language-bash">npx wrangler vectorize create rag-index --dimensions=384 --metric=cosine npx wrangler d1 create rag-db </code></pre> <h3 id="heading-configure-wranglertoml">Configure wrangler.toml</h3> <pre><code class="language-toml">name = "rag-reflection-system" main = "src/index.ts" compatibility_date = "2026-01-01" [[vectorize]] binding = "VECTORIZE" index_name = "rag-index" [[d1_databases]] binding = "DB" database_name = "rag-db" database_id = "YOUR_DB_ID" [ai] binding = "AI" </code></pre> <h3 id="heading-create-the-documents-table">Create the <code>documents</code> Table</h3> <pre><code class="language-sql">-- migrations/001_init.sql CREATE TABLE IF NOT EXISTS documents ( id TEXT PRIMARY KEY, content TEXT NOT NULL, source TEXT, date_created TEXT DEFAULT (datetime('now')) ); </code></pre> <pre><code class="language-bash">npx wrangler d1 execute rag-db --remote --file=./migrations/001_init.sql </code></pre> <h3 id="heading-add-the-ingest-and-search-endpoints">Add the <code>ingest</code> and <code>search</code> endpoints</h3> <p>Replace <code>src/index.ts</code> with this minimal working system:</p> <pre><code 
class="language-typescript">export interface Env { VECTORIZE: VectorizeIndex; DB: D1Database; AI: Ai; } export default { async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> { const url = new URL(request.url); if (url.pathname === '/ingest' && request.method === 'POST') { const { id, content, source } = await request.json() as any; const embResult = await env.AI.run('@cf/baai/bge-small-en-v1.5', { text: [content.slice(0, 512)], }) as any; const vector = embResult.data[0]; await env.VECTORIZE.upsert([{ id, values: vector, metadata: { content: content.slice(0, 1000), source, doc_type: 'raw' }, }]); await env.DB.prepare( 'INSERT OR REPLACE INTO documents (id, content, source) VALUES (?, ?, ?)' ).bind(id, content, source ?? '').run(); return Response.json({ success: true, id }); } if (url.pathname === '/search' && request.method === 'POST') { const { query } = await request.json() as any; const embResult = await env.AI.run('@cf/baai/bge-small-en-v1.5', { text: [query], }) as any; const vector = embResult.data[0]; const results = await env.VECTORIZE.query(vector, { topK: 5, returnMetadata: 'all', }); const context = results.matches .map(m => m.metadata?.content as string) .filter(Boolean) .join('\n\n'); const answer = await env.AI.run('@cf/moonshotai/kimi-k2.5', { messages: [ { role: 'system', content: 'Answer using only the context provided.' 
}, { role: 'user', content: `Context:\n${context}\n\nQuestion: ${query}` }, ], max_tokens: 256, }) as any; return Response.json({ answer: answer.response, sources: results.matches.map(m => m.id) }); } return new Response('RAG system running', { status: 200 }); }, }; </code></pre> <h3 id="heading-deploy-and-verify">Deploy and Verify</h3> <pre><code class="language-bash">npx wrangler deploy </code></pre> <p>Test it:</p> <pre><code class="language-bash"># Ingest a document curl -X POST https://your-worker.workers.dev/ingest \ -H "Content-Type: application/json" \ -d '{"id": "doc-001", "content": "Cursor pagination beats offset pagination for live-updating datasets because offset becomes unreliable when rows are inserted or deleted during pagination."}' # Search curl -X POST https://your-worker.workers.dev/search \ -H "Content-Type: application/json" \ -d '{"query": "what pagination approach should I use?"}' </code></pre> <p>If you get a grounded answer back, the base system is working. The next sections add the reflection layer on top of this foundation.</p> <h2 id="heading-why-standard-rag-has-a-memory-problem">Why Standard RAG Has a Memory Problem</h2> <p>Standard RAG retrieval is stateless. Every query goes in cold. The system has no memory of what it found before, no synthesis of what it learned across documents, and no growing understanding of what questions remain unanswered.</p> <p>Imagine you've ingested 200 documents about your product. Twelve of them touch on a pricing decision made last year. No single one has the full picture — it's distributed across quarterly reports, meeting notes, an internal Slack export, a few Notion pages.</p> <p>A user asks: "Why did we change our pricing structure?"</p> <p>Standard RAG retrieves the three most similar chunks. If those three chunks collectively have the answer, great. If they don't — if the real answer requires synthesising across those twelve documents — the system has no mechanism for that. It returns fragments. 
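The LLM makes its best guess.</p> <p>The reflection layer addresses this directly. When the twelfth pricing document gets ingested, the system finds the eleven related documents, synthesises what connects them, and stores that synthesis as a retrievable artifact. The answer to "why did we change our pricing structure" exists in the index before anyone asks the question.</p> <p>Not smarter retrieval — smarter indexing.</p> <h2 id="heading-step-1-schema-update">Step 1: Schema Update</h2> <p>The reflection layer needs three new columns in your D1 <code>documents</code> table. Run this migration:</p> <pre><code class="language-sql">-- migrations/003_add_reflection_fields.sql ALTER TABLE documents ADD COLUMN doc_type TEXT DEFAULT 'raw'; ALTER TABLE documents ADD COLUMN reflection_score REAL DEFAULT 0; ALTER TABLE documents ADD COLUMN parent_id TEXT; </code></pre> <p>Apply it (replace <code>rag-db</code> with your database name if you built the system from the handbook):</p> <pre><code class="language-bash">npx wrangler d1 execute rag-db --remote --file=./migrations/003_add_reflection_fields.sql </code></pre> <p><code>doc_type</code> distinguishes raw documents (<code>raw</code>), single-document reflections (<code>reflection</code>), and consolidated multi-reflection summaries (<code>summary</code>). You'll use this field to filter — exposing only reflections to users who want the distilled view, or excluding them for users who want raw source chunks.</p> <p><code>parent_id</code> records which document a reflection was written about. The reflection engine in the next step writes to this column.</p> <p>The reflection engine also filters Vectorize queries on <code>doc_type</code>, and Vectorize only filters on metadata properties that have a metadata index. Create one with <code>npx wrangler vectorize create-metadata-index rag-index --property-name=doc_type --type=string</code>, using your own index name if it differs. Vectors upserted before the metadata index existed aren't covered by it, so re-ingest any documents you added earlier if you want them to match <code>doc_type</code> filters.</p> <h2 id="heading-step-2-the-reflection-engine">Step 2: The Reflection Engine</h2> <p>Create <code>src/engines/reflection.ts</code>. This is the core of the layer.</p> <p>The imports follow the project layout of the full repo: <code>Env</code> lives in <code>src/types/env.ts</code>, and <code>resolveEmbeddingModel</code> and <code>resolveReflectionModel</code> in <code>src/config/models.ts</code> take the optional <code>EMBEDDING_MODEL</code> and <code>REFLECTION_MODEL</code> settings and, as used here, return an object whose <code>id</code> is the Workers AI model to call. If you started from the minimal base system, move the <code>Env</code> interface into its own file and use the base system's model IDs (<code>'@cf/baai/bge-small-en-v1.5'</code> and <code>'@cf/moonshotai/kimi-k2.5'</code>) wherever this code uses <code>embModel.id</code> and <code>reflModel.id</code>.</p> <pre><code class="language-typescript">import { Env } from '../types/env'; import { resolveEmbeddingModel, resolveReflectionModel } from '../config/models'; const REFLECTION_BOOST = 1.5; const CONSOLIDATION_THRESHOLD = 3; // consolidate every N new reflections export async function reflect( newDocId: string, newDocContent: string, env: Env ): Promise<void> { // 1.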
Find semantically related documents already in the index const embModel = resolveEmbeddingModel(env.EMBEDDING_MODEL); const embResult = await env.AI.run(embModel.id as any, { text: [newDocContent.slice(0, 512)], }); const queryVector = (embResult as any).data?.[0]; if (!queryVector) return; const related = await env.VECTORIZE.query(queryVector, { topK: 5, filter: { doc_type: { $eq: 'raw' } }, returnMetadata: 'all', }); const relatedDocs = (related.matches ?? []).filter( m => m.id !== newDocId && (m.score ?? 0) > 0.65 ); if (relatedDocs.length === 0) return; // nothing related yet — skip // 2. Build synthesis prompt const relatedSummaries = relatedDocs .slice(0, 3) .map((m, i) => `Document ${i + 1}: ${String(m.metadata?.content ?? '').slice(0, 300)}`) .join('\n\n'); const prompt = `You are synthesising knowledge across documents in a knowledge base. New document: ${newDocContent.slice(0, 600)} Related existing documents: ${relatedSummaries} Write exactly three sentences: 1. What the new document adds that the existing documents don't already cover 2. How the new document connects to or extends the existing documents 3. What gap or question remains unanswered across all these documents Be specific. Reference actual content. Do not summarise — synthesise.`; // 3. Call the reflection model const reflModel = resolveReflectionModel(env.REFLECTION_MODEL); const llmResp = await env.AI.run(reflModel.id as any, { messages: [{ role: 'user', content: prompt }], max_tokens: 180, }); const reflectionText = (llmResp as any)?.response?.trim(); if (!reflectionText || reflectionText.length < 40) return; // 4. 
Embed and store the reflection const reflEmbResult = await env.AI.run(embModel.id as any, { text: [reflectionText], }); const reflVector = (reflEmbResult as any).data?.[0]; if (!reflVector) return; const reflectionId = `refl_${newDocId}_${Date.now()}`; await env.VECTORIZE.upsert([ { id: reflectionId, values: reflVector, metadata: { content: reflectionText, doc_type: 'reflection', parent_id: newDocId, reflection_score: REFLECTION_BOOST, source_doc_ids: relatedDocs.map(m => m.id).join(','), date_created: new Date().toISOString(), }, }, ]); await env.DB.prepare( `INSERT INTO documents (id, content, doc_type, reflection_score, parent_id, date_created) VALUES (?, ?, 'reflection', ?, ?, ?)` ) .bind(reflectionId, reflectionText, REFLECTION_BOOST, newDocId, new Date().toISOString()) .run(); // 5. Check if consolidation is due const recentCount = await env.DB .prepare(`SELECT COUNT(*) as cnt FROM documents WHERE doc_type = 'reflection' AND julianday(date_created) > julianday('now', '-1 hour')`) .first<{ cnt: number }>(); if ((recentCount?.cnt ?? 0) >= CONSOLIDATION_THRESHOLD) { await consolidate(env); } } </code></pre> <p>Three things worth noting here.</p> <p>First, the semantic threshold (<code>score > 0.65</code>) matters. Too low and you're synthesising unrelated documents. Too high and you're rarely finding connections. 0.65 works well with <code>bge-small</code>. You can bump it to 0.72 with <code>qwen3-0.6b</code> (1024d) where scores cluster higher.</p> <p>Second, the consolidation check compares timestamps with <code>julianday()</code> rather than as raw strings. Reflections store ISO 8601 timestamps such as <code>2026-04-24T20:52:49.000Z</code>, while SQLite's <code>datetime()</code> returns values such as <code>2026-04-24 19:52:49</code>, so a plain string comparison would count reflections from hours earlier the same day as recent.</p> <p>Third, the prompt structure is deliberate. Three sentences, each doing a specific job: what's new, how it connects, what remains. This keeps reflections useful for retrieval. A freeform synthesis prompt produces beautiful prose that doesn't retrieve well.
This structure produces retrievable artifacts.</p> <h2 id="heading-step-3-consolidation">Step 3: Consolidation</h2> <p>As reflections accumulate, they need their own synthesis layer — otherwise you're adding noise at a higher abstraction level.</p> <p>Add this to <code>src/engines/reflection.ts</code>:</p> <pre><code class="language-typescript">export async function consolidate(env: Env): Promise<void> { // Fetch reflections written after the most recent summary, i.e. those not yet consolidated const recent = await env.DB .prepare( `SELECT id, content FROM documents WHERE doc_type = 'reflection' AND julianday(date_created) > COALESCE( (SELECT MAX(julianday(date_created)) FROM documents WHERE doc_type = 'summary'), 0 ) ORDER BY date_created DESC LIMIT 6` ) .all<{ id: string; content: string }>(); if (!recent.results || recent.results.length < CONSOLIDATION_THRESHOLD) return; const reflectionTexts = recent.results.map((r, i) => `Reflection ${i + 1}: ${r.content}`).join('\n\n'); const prompt = `You are consolidating multiple knowledge reflections into a single compressed insight. ${reflectionTexts} Write two to three sentences that capture the most important cross-cutting pattern or tension across these reflections. What does the knowledge base now understand that it didn't before these documents were added? What's the most important open question? Be precise.
No preamble.`; const reflModel = resolveReflectionModel(env.REFLECTION_MODEL); const llmResp = await env.AI.run(reflModel.id as any, { messages: [{ role: 'user', content: prompt }], max_tokens: 320, }); const summaryText = (llmResp as any)?.response?.trim(); if (!summaryText || summaryText.length < 40) return; const embModel = resolveEmbeddingModel(env.EMBEDDING_MODEL); const embResult = await env.AI.run(embModel.id as any, { text: [summaryText] }); const summaryVector = (embResult as any).data?.[0]; if (!summaryVector) return; const summaryId = `summary_${Date.now()}`; await env.VECTORIZE.upsert([ { id: summaryId, values: summaryVector, metadata: { content: summaryText, doc_type: 'summary', reflection_score: REFLECTION_BOOST * 1.2, source_reflection_ids: recent.results.map(r => r.id).join(','), date_created: new Date().toISOString(), }, }, ]); await env.DB.prepare( `INSERT INTO documents (id, content, doc_type, reflection_score, date_created) VALUES (?, ?, 'summary', ?, ?)` ) .bind(summaryId, summaryText, REFLECTION_BOOST * 1.2, new Date().toISOString()) .run(); } </code></pre> <p>Summaries get a 1.2× multiplier on top of the base reflection boost, for a total of 1.8×. In search results, a summary synthesising twelve related documents should rank above any single document chunk on broad conceptual queries. On specific factual queries, the raw chunks will score higher. The ranking sorts itself.</p> <h2 id="heading-step-4-wire-it-into-your-ingest-handler">Step 4: Wire It Into Your Ingest Handler</h2> <p>The reflection runs as a background job. It doesn't block the ingest response — that would add 2–3 seconds to every ingest call.</p> <p>In your <code>src/handlers/ingest.ts</code>, after you've stored the document (if you started from the minimal base system, make the same call at the end of the <code>/ingest</code> branch in <code>src/index.ts</code>, importing from <code>./engines/reflection</code> and passing <code>id</code> as the document ID):</p> <pre><code class="language-typescript">import { reflect } from '../engines/reflection'; // ... existing ingest logic ...
// After VECTORIZE.upsert() and DB insert succeed: ctx.waitUntil( reflect(documentId, content, env).catch(err => { console.warn('[reflection] failed for', documentId, err.message); }) ); return new Response(JSON.stringify({ success: true, documentId, chunks: chunkCount, // ... rest of response }), { headers: { 'Content-Type': 'application/json' } }); </code></pre> <p><code>ctx.waitUntil()</code> is the Cloudflare Workers primitive for background work. The response returns immediately. The reflection runs after. The ingest API stays fast.</p> <p>The <code>.catch()</code> is important. A failed reflection should never fail an ingest. Raw documents are the source of truth. Reflections are derived value — useful, but not critical path.</p> <h2 id="heading-step-5-boost-reflections-in-search">Step 5: Boost Reflections in Search</h2> <p>Add the reflection boost to your ranking logic in <code>src/engines/hybrid.ts</code>. If you started from the minimal base system, which has no hybrid search or RRF step, apply the same multiplier to <code>results.matches</code> in the <code>/search</code> handler instead, reading <code>doc_type</code> and <code>reflection_score</code> from each match's <code>metadata</code>, then re-sort before building the context.</p> <p>In the hybrid pipeline, apply it after RRF fusion and before returning results:</p> <pre><code class="language-typescript">// Apply reflection boost const boosted = results.map(r => ({ ...r, score: r.doc_type === 'reflection' || r.doc_type === 'summary' ? r.score * (r.reflection_score ?? 1.5) : r.score, })); return boosted.sort((a, b) => b.score - a.score); </code></pre> <p>This is a post-fusion boost, not a pre-fusion rerank. The reasoning: apply RRF across all results first, so reflections earn their place on raw relevance before getting boosted.
A reflection that would not rank in the top 20 on raw similarity shouldn't appear just because it has a boost multiplier.</p> <h2 id="heading-step-6-filtering-by-doctype">Step 6: Filtering by <code>doc_type</code></h2> <p>Your search endpoint should accept a <code>doc_type</code> filter so callers can control what they see:</p> <pre><code class="language-typescript">// In your search request handler: const docTypeFilter = body.filters?.doc_type; const vectorFilter: Record<string, unknown> = {}; if (docTypeFilter) { vectorFilter.doc_type = docTypeFilter; } // Pass to the Vectorize query (vector is the embedded query): const results = await env.VECTORIZE.query(vector, { topK: 5, returnMetadata: 'all', ...(docTypeFilter ? { filter: vectorFilter as any } : {}), }); </code></pre> <p>This gives callers three modes:</p> <pre><code class="language-plaintext"># Only reflections and summaries POST /search { "query": "pricing decisions", "filters": { "doc_type": { "$in": ["reflection", "summary"] } } } # Only source documents POST /search { "query": "pricing decisions", "filters": { "doc_type": { "$eq": "raw" } } } # Default: all types, reflections boosted POST /search { "query": "pricing decisions" } </code></pre> <p>The default (no filter) is the most useful. Let the boost do its job. Restrict to raw when you need citations. Restrict to reflections when you want the synthesised view.</p> <h2 id="heading-what-changes-after-you-build-this">What Changes After You Build This</h2> <p>At 200 documents, the difference becomes noticeable. Queries that previously returned five fragmented chunks now surface a reflection that already synthesised those chunks. Broad conceptual queries — "what do we know about X?" — start returning genuinely useful summaries instead of just the most-similar individual paragraph.</p> <p>At 2,000 documents, the reflection layer is the most valuable part of the system. The raw chunks answer specific factual questions. The reflections and summaries answer conceptual questions that could not be answered from any single document.
The system has learned something no individual document contains.</p> <p>One failure mode worth knowing: if your embedding model has poor semantic clustering — old <code>bge-small</code> at 384d with mixed-domain documents — the related-documents retrieval step will surface weak connections and produce shallow reflections. The 0.65 threshold filters most of this out, but if you're seeing reflections that seem off-topic, your embeddings are the first thing to check.</p> <h2 id="heading-deploying">Deploying</h2> <pre><code class="language-bash">npx wrangler d1 execute rag-db --remote --file=./migrations/003_add_reflection_fields.sql npx wrangler deploy </code></pre> <p>The first command is the Step 1 migration (use your own database name if it differs). Skip it if you already ran it: SQLite rejects <code>ADD COLUMN</code> for a column that already exists, so a second run fails.</p> <p>Then ingest a few documents and watch what happens. These examples send an <code>Authorization</code> header; drop it if your Worker doesn't check one, as in the minimal base system:</p> <pre><code class="language-bash"># Ingest document 1 curl -X POST https://your-worker.workers.dev/ingest \ -H "Authorization: Bearer YOUR_KEY" \ -H "Content-Type: application/json" \ -d '{"id": "doc-001", "content": "Your document text here..."}' # After a few seconds, check if a reflection was created curl "https://your-worker.workers.dev/search" \ -H "Authorization: Bearer YOUR_KEY" \ -H "Content-Type: application/json" \ -d '{"query": "your topic", "filters": {"doc_type": {"$eq": "reflection"}}}' </code></pre> <p>Reflections won't appear until there are related documents to synthesise. Ingest at least three documents on similar topics before expecting to see them.</p> <h2 id="heading-what-to-build-next">What to Build Next</h2> <p>The reflection layer as described here fires after every ingest. That's expensive at high ingest volume: if you're batch-importing 10,000 documents, you don't want 10,000 individual reflection calls.</p> <p>For bulk ingestion, gate it: call <code>reflect()</code> only when a document's similarity search returns a match above 0.8, or batch-run reflection after the bulk import completes.
The <code>POST /ingest/batch</code> endpoint in the <a href="https://github.com/dannwaneri/vectorize-mcp-worker">full repo</a> does this.</p> <p>The second thing worth building: surfacing reflections in your UI with a visual distinction. A search result that's a reflection should look different from a raw chunk. In the dashboard included in the repo, reflections render with a <code>💡</code> badge and a "synthesised from N documents" note.</p> <p>Full source at <a href="https://github.com/dannwaneri/vectorize-mcp-worker">github.com/dannwaneri/vectorize-mcp-worker</a> — reflection engine, consolidation, batch ingest, dashboard, OpenAPI spec.</p> <p>The codebase is TypeScript, deploys with a single <code>wrangler deploy</code>, runs for roughly $1–5/month at 10,000 queries/day.</p> <p>Standard RAG retrieves. This learns.</p> </article> <article> <h1> How to Merge PDF Files in the Browser Using JavaScript (Step-by-Step) </h1> <p>Bhavin Sheth — Wed, 22 Apr 2026 16:36:06 +0000</p> <p>Working with PDFs is something almost every developer needs to know how to do.</p> <p>Sometimes you need to combine reports or invoices, or simply merge multiple documents into a single clean file.</p> <p>Most tools that handle this either require installing software or uploading files to a server, which can be slow and not always ideal – especially when dealing with private documents.</p> <p>But what if you could merge PDFs directly in the browser, without any backend?</p> <p>That’s exactly what we’ll build in this tutorial.</p> <p>By the end, you’ll have a fully working browser-based PDF merger. 
It will allow users to upload files, preview them, reorder documents using drag-and-drop, select specific pages, and download the final merged PDF instantly.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-how-pdf-merging-works-in-the-browser">How PDF Merging Works in the Browser</a></p> </li> <li><p><a href="#heading-project-setup">Project Setup</a></p> </li> <li><p><a href="#heading-what-library-are-we-using">What Library Are We Using?</a></p> </li> <li><p><a href="#heading-creating-the-upload-interface">Creating the Upload Interface</a></p> </li> <li><p><a href="#heading-rendering-pdf-previews">Rendering PDF Previews</a></p> </li> <li><p><a href="#heading-reordering-files-drag-and-drop">Reordering Files (Drag and Drop)</a></p> </li> <li><p><a href="#heading-sorting-and-reordering-pdfs-important">Sorting and Reordering PDFs (Important)</a></p> </li> <li><p><a href="#heading-merging-pdfs-using-javascript">Merging PDFs Using JavaScript</a></p> </li> <li><p><a href="#heading-improving-user-experience">Improving User Experience</a></p> </li> <li><p><a href="#heading-demo-how-the-pdf-merger-works">Demo: How the PDF Merger Works</a></p> </li> <li><p><a href="#heading-important-notes-from-real-world-use">Important Notes from Real-World Use</a></p> </li> <li><p><a href="#heading-common-mistakes-to-avoid">Common Mistakes to Avoid</a></p> </li> <li><p><a href="#heading-conclusion">Conclusion</a></p> </li> </ol> <h2 id="heading-how-pdf-merging-works-in-the-browser">How PDF Merging Works in the Browser</h2> <p>At a high level, merging PDFs means loading multiple PDF files, extracting pages from each, and combining them into a single document.</p> <p>Traditionally, this process happens on a server. Files are uploaded, processed, and then returned to the user.</p> <p>But modern JavaScript libraries make it possible to do all of this directly in the browser.
Instead of sending files anywhere, the entire process runs locally on the user’s device.</p> <p>This approach has a few practical advantages. It makes the process faster because there’s no upload time involved. It also improves privacy, since files never leave the user’s system. And from a development perspective, it removes the need for backend processing altogether.</p> <h2 id="heading-project-setup">Project Setup</h2> <p>We’ll keep this project simple.</p> <p>You only need:</p> <ul> <li><p>an HTML file</p> </li> <li><p>JavaScript</p> </li> <li><p>a few libraries</p> </li> </ul> <p>No backend required.</p> <h2 id="heading-what-library-are-we-using">What Library Are We Using?</h2> <p>We’ll use two important libraries:</p> <pre><code class="language-html"><script src="https://unpkg.com/pdf-lib@1.17.1/dist/pdf-lib.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.min.js"></script> </code></pre> <ul> <li><p>We'll use <strong>pdf-lib</strong> to merge and modify PDFs</p> </li> <li><p>We'll use <strong>pdf.js</strong> to render previews in the browser</p> </li> </ul> <p>This combination is very powerful and commonly used in real projects.</p> <p>pdf.js parses documents in a web worker, so it's good practice to set <code>pdfjsLib.GlobalWorkerOptions.workerSrc</code> explicitly to the worker script from the same release (here, <code>https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.worker.min.js</code>) rather than relying on pdf.js to locate it.</p> <h2 id="heading-creating-the-upload-interface">Creating the Upload Interface</h2> <p>Start with a simple drag-and-drop area:</p> <pre><code class="language-html"><div id="upload-area"> <input type="file" id="file-input" multiple accept="application/pdf"> </div> </code></pre> <p>Users can either drag files or click to select.</p> <p>Once files are selected, we read them using:</p> <pre><code class="language-javascript">const arrayBuffer = await file.arrayBuffer(); </code></pre> <p>This allows us to pass the file into our PDF libraries.</p> <h2 id="heading-rendering-pdf-previews">Rendering PDF Previews</h2> <p>To improve usability, we'll show a preview of each uploaded PDF.</p> <p>Using <strong>pdf.js</strong>, we can render pages like this:</p> <pre><code class="language-js">const pdf = await
pdfjsLib.getDocument(arrayBuffer).promise; const page = await pdf.getPage(1); const viewport = page.getViewport({ scale: 1.5 }); const canvas = document.createElement('canvas'); const context = canvas.getContext('2d'); canvas.height = viewport.height; canvas.width = viewport.width; await page.render({ canvasContext: context, viewport: viewport }).promise; /* then append canvas to the file's preview card */ </code></pre> <p>This gives users visual feedback before merging.</p> <h2 id="heading-reordering-files-drag-and-drop">Reordering Files (Drag and Drop)</h2> <p>Order matters when merging PDFs.</p> <p>Instead of forcing users to upload in sequence, we'll allow reordering.</p> <p>We can use a library like <strong>Sortable.js</strong> for this:</p> <pre><code class="language-js">new Sortable(document.getElementById('pdf-grid'), { animation: 150 }); </code></pre> <p>This enables drag-and-drop sorting and instant visual updates. Note that Sortable.js needs its own script tag, and it only reorders the DOM elements. Keep your array of files in the same order, for example by moving the entry at <code>evt.oldIndex</code> to <code>evt.newIndex</code> in Sortable's <code>onEnd</code> callback, so the merge follows the order the user sees.</p> <h2 id="heading-sorting-and-reordering-pdfs-important">Sorting and Reordering PDFs (Important)</h2> <p>This is where the tool becomes more practical in real-world use.</p> <p>Instead of forcing users to upload files in a specific order, the tool allows them to rearrange PDFs before merging.</p> <p>Users can manually drag and drop files to adjust the sequence, or use built-in sorting options such as arranging files alphabetically or by file size. This makes it easy to quickly organize multiple documents without re-uploading them.</p> <p>This flexibility ensures that the final merged document follows the exact order the user needs.
In real-world scenarios, this is especially useful when combining reports, invoices, or other documents where sequence is important.</p> <p>Here’s a simple example of how you might sort uploaded files:</p> <pre><code class="language-javascript">function sortFiles(files, type) { /* copy first: works for a FileList and leaves the original order untouched */ return [...files].sort((a, b) => { if (type === "name-asc") { return a.name.localeCompare(b.name); } if (type === "name-desc") { return b.name.localeCompare(a.name); } if (type === "size-asc") { return a.size - b.size; } if (type === "size-desc") { return b.size - a.size; } return 0; }); } </code></pre> <p>This gives users precise control over the order in which files are merged.</p> <h2 id="heading-merging-pdfs-using-javascript">Merging PDFs Using JavaScript</h2> <p>Now comes the core logic. We'll use <strong>pdf-lib</strong> to combine pages:</p> <pre><code class="language-js">const { PDFDocument } = PDFLib; const mergedPdf = await PDFDocument.create(); for (const file of files) { const pdf = await PDFDocument.load(await file.arrayBuffer()); /* zero-based page indices; pass the user's selected pages here instead to copy a subset */ const pages = await mergedPdf.copyPages(pdf, pdf.getPageIndices()); pages.forEach(page => mergedPdf.addPage(page)); } const pdfBytes = await mergedPdf.save(); </code></pre> <p>Finally, we'll create a downloadable file:</p> <pre><code class="language-js">const blob = new Blob([pdfBytes], { type: 'application/pdf' }); </code></pre> <p>To trigger the download, create a URL for the blob with <code>URL.createObjectURL(blob)</code>, assign it to a link element's <code>href</code>, set the link's <code>download</code> attribute to the file name, and call <code>click()</code> on it.</p> <h2 id="heading-improving-user-experience">Improving User Experience</h2> <p>A simple merge tool works, but a good tool feels smooth.</p> <p>Small improvements make a big difference.</p> <p>For example:</p> <ul> <li><p>showing previews before merging</p> </li> <li><p>allowing users to remove files</p> </li> <li><p>enabling page navigation</p> </li> <li><p>providing instant feedback</p> </li> </ul> <p>These details turn a basic feature into a real product.</p> <h2 id="heading-demo-how-the-pdf-merger-works">Demo: How the PDF Merger Works</h2> <p>Here’s how the full flow looks in practice:</p> <h3 id="heading-step-1-upload-pdfs">Step 1: Upload PDFs</h3> <p>Users can drag
and drop PDF files into the upload area or select them manually.</p> <h3 id="heading-step-2-preview-files">Step 2: Preview Files</h3> <p>Each uploaded file is displayed with a preview along with its details (name, size, number of pages, and so on), so users can verify the content before merging.</p> <h3 id="heading-step-3-reorder-files">Step 3: Reorder Files</h3> <p>Users can arrange the order of PDFs manually with drag-and-drop or automatically with the sorting options. This ensures the final merged document follows the correct sequence.</p> <h3 id="heading-step-4-merge-pdfs">Step 4: Merge PDFs</h3> <p>Once everything is arranged, users can click the merge button to combine all selected PDFs into a single file.</p> <h3 id="heading-step-5-download-the-final-pdf">Step 5: Download the Final PDF</h3> <p>The merged PDF is generated instantly in the browser, and users can preview, rename, and download it without any server interaction.</p> <h2 id="heading-important-notes-from-real-world-use">Important Notes from Real-World Use</h2> <p>When building tools like a PDF merger, handling large files efficiently becomes important.</p> <p>If multiple large PDFs are loaded at once, it can slow down the browser or consume too much memory.
Instead of processing everything at once, it’s better to handle files step by step.</p> <p>For example, instead of loading all PDFs together, you can process them one by one:</p> <pre><code class="language-javascript">const { PDFDocument } = PDFLib;

const mergedPdf = await PDFDocument.create();

// Load, copy, and release one source file at a time
for (const file of files) {
  const arrayBuffer = await file.arrayBuffer();
  const pdf = await PDFDocument.load(arrayBuffer);
  const pages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
  pages.forEach(page => mergedPdf.addPage(page));
}
</code></pre> <p>This approach keeps peak memory usage lower, because only one source document needs to be in memory at a time, and it reduces the risk of the browser freezing when working with larger files.</p> <p>You can also improve performance by limiting file size or the number of files users can upload at once. This helps keep the tool responsive even on lower-powered devices.</p> <p>Another important aspect is privacy. Since everything runs directly in the browser, files are never uploaded to a server. This means sensitive documents stay on the user’s device.</p> <p>But it’s still important to be transparent about this. In real-world tools, you should clearly mention that all processing happens locally and no files are stored or transmitted.</p> <p>This client-side approach improves both performance and user trust, especially when working with private or confidential documents.</p> <h2 id="heading-common-mistakes-to-avoid">Common Mistakes to Avoid</h2> <p>A common mistake is skipping validation. If users upload invalid files or empty inputs, the merge process can fail.</p> <p>Another issue is mishandling page ranges. If the page-range input is parsed incorrectly, users may end up with missing or unexpected pages.</p> <p>Also, relying on fixed layouts or assumptions can break the experience across different files.
Testing with different PDF types is important.</p> <h2 id="heading-conclusion">Conclusion</h2> <p>In this tutorial, you built a browser-based PDF merger using JavaScript.</p> <p>More importantly, you learned how to process files locally in the browser, render previews for better usability, handle user input safely, and manage dynamic document structures when working with PDFs.</p> <p>This approach removes the need for a backend and keeps everything fast, private, and efficient.</p> <p>Once you understand this pattern, you can extend it to build more advanced tools. For example, you could create features like PDF splitting, compression, editing, or other document-based utilities using the same core ideas.</p> <p>And that’s where things start getting really interesting.</p> </article> <article> <h1> How to Build a Fashion App That Helps You Organize Your Wardrobe </h1> <p>Mokshita V P — Tue, 14 Apr 2026 16:26:39 +0000</p> <p>I used to spend too long deciding what to wear, even when my closet was full.</p> <p>That frustration made the problem feel very clear to me: it was not about having fewer clothes. 
It was about having better organization, better visibility, and better guidance when making outfit decisions.</p> <p>So I built a fashion web app that helps users organize their wardrobe, get outfit suggestions, evaluate shopping decisions, and improve recommendations over time using feedback.</p> <p>In this article, I’ll walk through what the app does, how I built it, the decisions I made along the way, and the challenges that shaped the final result.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ul> <li><p><a href="#heading-table-of-contents">Table of Contents</a></p> </li> <li><p><a href="#heading-what-the-app-does">What the App Does</a></p> </li> <li><p><a href="#heading-why-i-built-it">Why I Built It</a></p> </li> <li><p><a href="#heading-tech-stack">Tech Stack</a></p> </li> <li><p><a href="#heading-product-walkthrough-what-users-see">Product Walkthrough (What Users See)</a></p> </li> <li><p><a href="#heading-how-i-built-it">How I Built It</a></p> </li> <li><p><a href="#heading-challenges-i-faced">Challenges I Faced</a></p> </li> <li><p><a href="#heading-what-i-learned">What I Learned</a></p> </li> <li><p><a href="#heading-what-i-want-to-improve-next">What I Want to Improve Next</a></p> </li> <li><p><a href="#heading-future-improvements">Future Improvements</a></p> </li> <li><p><a href="#heading-conclusion">Conclusion</a></p> </li> </ul> <h2 id="heading-what-the-app-does">What the App Does</h2> <p>At a high level, the app combines six core capabilities:</p> <ol> <li><p>Wardrobe management</p> </li> <li><p>Outfit recommendations</p> </li> <li><p>Shopping suggestions</p> </li> <li><p>Discard recommendations</p> </li> <li><p>Feedback and usage tracking</p> </li> <li><p>Secure multi-user accounts</p> </li> </ol> <p>Users can upload clothing items, explore suggested outfits, and mark recommendations as helpful or not helpful. 
They can also rate outfits and track whether items are worn, kept, or discarded.</p> <p>That feedback becomes structured data for improving future recommendation quality.</p> <h2 id="heading-why-i-built-it">Why I Built It</h2> <p>I wanted to create something that felt personal and actually useful. A lot of fashion apps look polished, but they do not always help with everyday decisions. My goal was to build something that could make wardrobe management easier and outfit selection less overwhelming. The app needed to do three things well:</p> <ul> <li><p>store each user’s wardrobe data</p> </li> <li><p>personalize recommendations</p> </li> <li><p>learn from user feedback over time</p> </li> </ul> <p>That feedback loop mattered to me because it makes the app feel alive rather than static.</p> <h2 id="heading-tech-stack">Tech Stack</h2> <p>Here are the tools I used to build the app:</p> <ul> <li><p>Frontend: React + Vite</p> </li> <li><p>Backend: FastAPI</p> </li> <li><p>Database: SQLite (local development)</p> </li> <li><p>Background jobs: Celery + Redis</p> </li> <li><p>Authentication: JWT (access + refresh token flow)</p> </li> <li><p>Deployment support: Docker and GitHub Codespaces</p> </li> </ul> <p>This gave me a modular setup, which helped a lot as the number of features grew: fast frontend iteration, clean API boundaries, and room to evolve recommendations separately from the UI.</p> <h2 id="heading-product-walkthrough-what-users-see">Product Walkthrough (What Users See)</h2> <h3 id="heading-1-onboarding-and-account-setup">1.
Onboarding and Account Setup</h3> <p>To start using the app, a user needs to register, verify their email, and complete some profile basics.</p> <p>Each account is isolated, so wardrobe history and recommendations stay user-specific.</p> <p>In the onboarding screen above, you can see account creation, email verification, and profile fields for body shape, height, weight, and style preferences.</p> <h3 id="heading-2-wardrobe-upload">2. Wardrobe Upload</h3> <p>Users can upload clothing images.</p> <p>Image analysis labels each item and makes it searchable for recommendations. The wardrobe upload form shows image analysis results with category, dominant color, secondary color, and pattern details listed.</p> <h3 id="heading-3-outfit-recommendations">3. Outfit Recommendations</h3> <p>Users can request recommendations, then rate the results.</p> <p>Above you can see the outfit recommendation dashboard that shows ranked outfit cards with feedback and rating actions. Recommendations are ranked by a weighted scoring model.</p> <h3 id="heading-4-shopping-and-discard-assistants">4. Shopping and Discard Assistants</h3> <p>The app evaluates new items against existing wardrobe data and flags low-value wardrobe items that may be worth removing.</p> <p>Above, you can see the recommendation scores, written reasons (not just a binary decision), and styling guidance for each item. It also includes "how to style it" suggestions in case the user still wants to keep the item.</p> <h2 id="heading-how-i-built-it">How I Built It</h2> <h3 id="heading-1-frontend-setup-react-vite">1. Frontend Setup (React + Vite)</h3> <p>I used React + Vite because I wanted fast iteration and a clean component structure.</p> <p>The frontend is split into feature areas like onboarding, wardrobe management, outfits, shopping, and discarded-item suggestions.
I also keep API calls in a service layer so the UI components stay focused on rendering and interaction.</p> <p>The snippet below is a simplified example of the API service pattern used in the app. It is not meant to be copy-pasted as-is, but it shows the same structure the frontend uses when talking to the backend.</p> <p>Example API client pattern:</p> <pre><code class="language-javascript">export async function getOutfitRecommendations(userId, params = {}) {
  const query = new URLSearchParams(params).toString();
  const url = `/users/${userId}/outfits/recommend${query ? `?${query}` : ""}`;

  const response = await fetch(url, {
    headers: {
      Authorization: `Bearer ${localStorage.getItem("access_token")}`,
    },
  });

  if (!response.ok) {
    throw new Error("Failed to fetch outfit recommendations");
  }

  return response.json();
}
</code></pre> <p>Here's what's happening in that snippet:</p> <ul> <li><p><code>URLSearchParams</code> builds optional query strings like <code>occasion</code>, <code>season</code>, or <code>limit</code>.</p> </li> <li><p>The request path is user-scoped, which keeps each user’s recommendations isolated.</p> </li> <li><p>The <code>Authorization</code> header sends the access token so the backend can verify the session.</p> </li> <li><p>The response is checked before parsing so the UI can surface a useful error if the request fails.</p> </li> </ul> <p>This pattern kept the frontend simple and reusable as the number of API calls grew.</p> <h3 id="heading-2-backend-architecture-with-fastapi">2.
Backend Architecture with FastAPI</h3> <p>The backend is organized around clear route groups:</p> <ul> <li><p>auth routes for register, login, refresh, logout, and sessions</p> </li> <li><p>user analysis routes</p> </li> <li><p>wardrobe CRUD routes</p> </li> <li><p>recommendation routes for outfits, shopping, and discard analysis</p> </li> <li><p>feedback routes for ratings and helpfulness signals</p> </li> </ul> <p>One of the most important design choices was enforcing ownership checks on user-scoped resources. That prevented one user from accessing another user’s wardrobe or feedback data.</p> <p>The backend snippet below is another simplified example from the app’s route layer. It shows the request validation and orchestration logic, while the actual scoring work stays in the recommendation service.</p> <pre><code class="language-python">@app.get("/users/{user_id}/outfits/recommend")
def recommend_outfits(user_id: int, occasion: str | None = None, season: str | None = None, limit: int = 10):
    user = get_user_or_404(user_id)
    wardrobe_items = get_user_wardrobe(user_id)

    if len(wardrobe_items) < 2:
        raise HTTPException(status_code=400, detail="Not enough wardrobe items")

    recommendations = outfit_generator.generate_outfit_recommendations(
        wardrobe_items=wardrobe_items,
        body_shape=user.body_shape,
        undertone=user.undertone,
        occasion=occasion,
        season=season,
        top_k=limit,
    )

    return {"user_id": user_id, "recommendations": recommendations}
</code></pre> <p>Here's how to read that code:</p> <ul> <li><p><code>get_user_or_404</code> loads the profile data needed for personalization.</p> </li> <li><p><code>get_user_wardrobe</code> fetches only the current user’s items.</p> </li> <li><p>The minimum wardrobe check prevents the recommendation logic from running on incomplete data.</p> </li> <li><p><code>generate_outfit_recommendations</code> handles the scoring logic separately, which keeps the route handler small and easier to test.</p> </li> <li><p>The response returns the results
in a shape the frontend can consume directly.</p> </li> </ul> <p>That separation helped keep the API layer readable while the recommendation logic stayed isolated in its own service.</p> <h3 id="heading-3-recommendation-logic">3. Recommendation Logic</h3> <p>I intentionally started with deterministic rules before introducing heavy ML. That made behavior easier to debug and explain.</p> <p>The outfit recommender scores combinations using weighted signals:</p> <p>$$\text{outfit score} = 0.4 \cdot \text{color harmony} + 0.4 \cdot \text{body-shape fit} + 0.2 \cdot \text{undertone fit}$$</p> <p>The snippet below is a simplified example from the recommendation engine. It shows how the app combines multiple signals into a single score:</p> <pre><code class="language-python">def score_outfit(combo, user_context):
    color_score = color_harmony.score(combo)
    shape_score = body_shape_rules.score(combo, user_context.body_shape)
    undertone_score = undertone_rules.score(combo, user_context.undertone)

    total = 0.4 * color_score + 0.4 * shape_score + 0.2 * undertone_score
    return round(total, 3)
</code></pre> <p>The logic behind this approach is straightforward:</p> <ul> <li><p>color harmony helps the outfit feel visually coherent</p> </li> <li><p>body-shape scoring helps the outfit feel flattering</p> </li> <li><p>undertone scoring helps the colors work better with the user’s profile</p> </li> </ul> <p>I used a similar structure for discard recommendations and shopping suggestions, but with different factors and thresholds.</p> <h3 id="heading-4-authentication-and-secure-multi-user-design">4.
Authentication and Secure Multi-user Design</h3> <p>Security was one of the most important parts of this build.</p> <p>I implemented:</p> <ul> <li><p>short-lived access tokens</p> </li> <li><p>refresh tokens with JTI tracking</p> </li> <li><p>token rotation on refresh</p> </li> <li><p>session revocation (single session and all sessions)</p> </li> <li><p>email verification and password reset flows</p> </li> </ul> <p>The snippet below is a simplified example of the refresh-token lifecycle used in the app. It shows the important control points rather than every helper function:</p> <pre><code class="language-python">def refresh_access_token(refresh_token: str):
    payload = decode_jwt(refresh_token)
    jti = payload["jti"]

    token_record = db.get_refresh_token(jti)
    if not token_record or token_record.revoked:
        raise AuthError("Invalid refresh token")

    new_refresh, new_jti = issue_refresh_token(payload["sub"])
    token_record.revoked = True
    token_record.replaced_by_jti = new_jti

    new_access = issue_access_token(payload["sub"])
    return {"access_token": new_access, "refresh_token": new_refresh}
</code></pre> <p>What this code is doing:</p> <ul> <li><p>It decodes the refresh token and looks up its JTI in the database.</p> </li> <li><p>It rejects reused or revoked sessions, which helps prevent replay attacks.</p> </li> <li><p>It rotates the refresh token instead of reusing it.</p> </li> <li><p>It issues a fresh access token so the session stays valid without forcing the user to log in again.</p> </li> </ul> <p>This design made multi-device sessions safer and gave me server-side control over logout behavior.</p> <h3 id="heading-5-background-jobs-for-long-running-operations">5. Background Jobs for Long-running Operations</h3> <p>Image analysis can be expensive, especially when the app needs to classify clothing, analyze colors, and estimate body-shape-related signals.
To keep the request path responsive, I added Celery + Redis support for background tasks.</p> <p>That gave the app two modes:</p> <ul> <li><p>synchronous processing for simpler local development</p> </li> <li><p>queued processing for heavier or slower jobs</p> </li> </ul> <p>That tradeoff mattered because it let me keep the developer experience simple without blocking the app during more expensive work.</p> <h3 id="heading-6-data-model-and-feedback-capture">6. Data Model and Feedback Capture</h3> <p>A recommendation system only improves if it captures the right signals.</p> <p>So I added dedicated feedback tables for:</p> <ul> <li><p>outfit ratings (1-5 + optional comments)</p> </li> <li><p>recommendation helpful/unhelpful feedback</p> </li> <li><p>item usage actions (worn/kept/discarded)</p> </li> </ul> <p>Here is the shape of one of those models:</p> <pre><code class="language-python">from datetime import datetime

from sqlalchemy import Boolean, Column, DateTime, ForeignKey, Integer, String

# Base is the app's SQLAlchemy declarative base, assumed to be defined in its models module


class RecommendationFeedback(Base):
    __tablename__ = "recommendation_feedback"

    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    recommendation_type = Column(String(50), nullable=False)
    recommendation_id = Column(Integer, nullable=False)
    helpful = Column(Boolean, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
</code></pre> <p>How to read this model:</p> <ul> <li><p><code>user_id</code> ties feedback to the person who gave it.</p> </li> <li><p><code>recommendation_type</code> tells me whether the feedback belongs to outfits, shopping, or discard suggestions.</p> </li> <li><p><code>recommendation_id</code> identifies the exact recommendation.</p> </li> <li><p><code>helpful</code> stores the user’s direct response.</p> </li> <li><p><code>created_at</code> makes it possible to analyze feedback trends over time.</p> </li> </ul> <p>This part of the system gives the app a real learning foundation, even though the feedback-to-model-update loop is still a future improvement.</p> <h2
id="heading-challenges-i-faced">Challenges I Faced</h2> <p>This was the section that taught me the most.</p> <h3 id="heading-1-image-heavy-endpoints-were-slower-than-i-wanted">1. Image-heavy endpoints were slower than I wanted</h3> <p>The analyze and wardrobe upload flows were doing a lot of work at once: image validation, classification, color extraction, storage, and database writes.</p> <p>At first, that made the request flow feel heavier than it should have.</p> <p>What I changed:</p> <ul> <li><p>I bounded concurrent image jobs so the app wouldn't try to do too much at once.</p> </li> <li><p>I separated slower jobs into background processing where possible.</p> </li> <li><p>I used load-test results to confirm which endpoints were actually expensive.</p> </li> </ul> <p>The practical effect was that heavy image requests stopped competing with each other so aggressively. Instead of letting many expensive tasks pile up inside the same request cycle, I limited the active work and pushed slower operations into the queue when needed.</p> <p>Why this fixed it:</p> <ul> <li><p>Bounding concurrency prevented the system from overloading CPU-bound tasks.</p> </li> <li><p>Moving expensive work into async jobs kept the main request/response cycle more responsive.</p> </li> <li><p>Load testing gave me evidence instead of guesswork, so I could tune the system based on real performance behavior.</p> </li> </ul> <p>In other words, I didn't just “optimize” the endpoint in theory. I changed the execution model so expensive analysis could not block every other request behind it.</p> <h3 id="heading-2-jwt-sessions-needed-real-server-side-control">2. 
JWT sessions needed real server-side control</h3> <p>A basic JWT setup is easy to get working, but it becomes less useful if you cannot revoke sessions or manage multiple devices cleanly.</p> <p>What I changed:</p> <ul> <li><p>I stored refresh tokens in the database.</p> </li> <li><p>I tracked token JTI values.</p> </li> <li><p>I rotated refresh tokens when users refreshed their session.</p> </li> <li><p>I added endpoints for logging out a single session or all sessions.</p> </li> </ul> <p>The important shift here was moving from “token exists, therefore session is valid” to “token exists, matches the database record, and has not been revoked or replaced.” That gave the server the authority to invalidate old sessions immediately.</p> <p>Why this fixed it:</p> <ul> <li><p>Server-side token tracking made revocation possible.</p> </li> <li><p>Rotation reduced the chance of token reuse.</p> </li> <li><p>Session management became visible to the user, which made the app feel more trustworthy.</p> </li> </ul> <p>This is what made logout-all and multi-device management work in a real way instead of just being cosmetic UI actions.</p> <h3 id="heading-3-user-data-isolation-had-to-be-explicit">3. 
User data isolation had to be explicit</h3> <p>Because this is a multi-user app, I had to be careful that one account could never accidentally see another account’s wardrobe data.</p> <p>What I changed:</p> <ul> <li><p>I added ownership checks to user-scoped routes.</p> </li> <li><p>I kept all wardrobe and feedback queries filtered by <code>user_id</code>.</p> </li> <li><p>I used encrypted image storage instead of exposing raw paths.</p> </li> </ul> <p>In practice, this meant every route had to ask the same question: “Does this user own the resource they are trying to access?” If the answer was no, the request stopped immediately.</p> <p>Why this fixed it:</p> <ul> <li><p>Ownership checks made data access rules explicit.</p> </li> <li><p>User-filtered queries prevented accidental cross-account reads.</p> </li> <li><p>Encrypted storage improved privacy and reduced the risk of exposing image data directly.</p> </li> </ul> <p>That combination is what kept wardrobe data, feedback history, and images separated correctly across accounts.</p> <h3 id="heading-4-docker-made-the-project-easier-to-share-but-only-after-the-stack-was-organized">4. Docker made the project easier to share, but only after the stack was organized</h3> <p>The app includes the frontend, backend, Redis, Celery worker, and Celery Beat, so the first challenge was making the setup feel reproducible instead of fragile.</p> <p>What I changed:</p> <ul> <li><p>I defined the stack in Docker Compose.</p> </li> <li><p>I documented the required environment variables.</p> </li> <li><p>I kept the dev stack aligned with how the app runs in practice.</p> </li> </ul> <p>This removed a lot of setup ambiguity. 
Instead of asking someone to manually figure out how the frontend, backend, Redis, and workers fit together, I made the stack describe itself.</p> <p>Why this fixed it:</p> <ul> <li><p>Docker let contributors start the project with fewer manual steps.</p> </li> <li><p>Clear environment configuration reduced setup mistakes.</p> </li> <li><p>Matching the stack to the architecture made the app easier to understand and test.</p> </li> </ul> <p>That was important because the app depends on several moving parts, and the simplest way to make the project approachable was to make startup behavior predictable.</p> <h2 id="heading-what-i-learned">What I Learned</h2> <p>This project taught me a few important lessons:</p> <ul> <li><p>Small features become much more valuable when they work together.</p> </li> <li><p>Feedback data is one of the strongest signals for improving recommendations.</p> </li> <li><p>Clean data modeling matters a lot when multiple users are involved.</p> </li> <li><p>Docker and clear setup instructions make a project much easier for other people to try.</p> </li> </ul> <p>I also learned that a project does not need to be huge to be useful. 
A focused app that solves one problem well can still feel meaningful.</p> <h2 id="heading-what-i-want-to-improve-next">What I Want to Improve Next</h2> <p>My roadmap from here:</p> <ol> <li><p>Integrate feedback directly into ranking updates</p> </li> <li><p>Add visual analytics for recommendation quality trends</p> </li> <li><p>Improve mobile UX parity</p> </li> <li><p>Deploy with persistent cloud storage and production database defaults</p> </li> <li><p>Provide a public demo mode for easier evaluation</p> </li> </ol> <h2 id="heading-future-improvements">Future Improvements</h2> <p>There are still a few things I would like to add later:</p> <ul> <li><p>a more advanced recommendation engine</p> </li> <li><p>visual analytics for user feedback</p> </li> <li><p>better mobile support</p> </li> <li><p>live deployment with persistent cloud storage</p> </li> <li><p>a public demo mode for easier testing</p> </li> </ul> <h2 id="heading-conclusion">Conclusion</h2> <p>This project began as a personal frustration and turned into a full web application with authentication, wardrobe storage, recommendation logic, and feedback infrastructure.</p> <p>The most rewarding part was seeing how practical software decisions, not just flashy UI, can help people make everyday choices faster.</p> <p>If you want to explore or run the project, <a href="https://github.com/Mokshitavp1/fashion_assistant">check out the repo</a>. You can try the flows and share feedback. I would especially love input on recommendation quality, UX clarity, and what features would make this genuinely useful in daily life.</p> </article> <article> <h1> How to Build a Cost-Efficient AI Agent with Tiered Model Routing </h1> <p>Daniel Nwaneri — Wed, 08 Apr 2026 22:59:09 +0000</p> <p>Most AI agent tutorials make the same mistake: they route every task to the most expensive model available.</p> <p>A character count doesn't need GPT-4. A presence check doesn't need Sonnet. 
A regex doesn't need anything except Python.</p> <p>The mistake isn't using AI — it's not knowing when to stop using it.</p> <p>This tutorial shows you how to build a tiered routing system that sends tasks to the cheapest model that can solve them. The pattern is called the cost curve. It comes from a comment thread on a DEV.to article, implemented by three developers over a weekend, and it cut the per-URL cost of a real SEO audit agent from $0.006 to effectively $0 for most pages.</p> <p>By the end, you'll have a working <code>cost_curve.py</code> module you can drop into any agent project.</p> <h2 id="heading-what-youll-build">What You'll Build</h2> <p>A three-tier routing function that:</p> <ul> <li><p>Runs deterministic Python checks first — zero API cost</p> </li> <li><p>Escalates to Claude Haiku only for genuinely ambiguous cases — ~$0.0001 per call</p> </li> <li><p>Escalates to Claude Sonnet only when semantic judgment is required — ~$0.006 per call</p> </li> <li><p>Falls back gracefully when any tier fails</p> </li> <li><p>Returns a consistent result schema regardless of which tier handled the request</p> </li> </ul> <p>The full implementation is part of <a href="https://github.com/dannwaneri/seo-agent">dannwaneri/seo-agent</a>, an open-core SEO audit agent. 
The cost curve module is the premium routing layer, and the principle applies to any agent with mixed-complexity tasks.</p> <h2 id="heading-prerequisites">Prerequisites</h2> <ul> <li><p>Python 3.11 or higher</p> </li> <li><p>An Anthropic API key</p> </li> <li><p>Basic familiarity with Python and the Claude API</p> </li> </ul> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-the-problem-with-calling-claude-on-everything">The Problem with Calling Claude on Everything</a></p> </li> <li><p><a href="#heading-the-cost-curve-explained">The Cost Curve Explained</a></p> </li> <li><p><a href="#heading-project-setup">Project Setup</a></p> </li> <li><p><a href="#heading-tier-1-deterministic-python">Tier 1: Deterministic Python</a></p> </li> <li><p><a href="#heading-tier-2-claude-haiku-for-ambiguous-cases">Tier 2: Claude Haiku for Ambiguous Cases</a></p> </li> <li><p><a href="#heading-tier-3-claude-sonnet-for-semantic-judgment">Tier 3: Claude Sonnet for Semantic Judgment</a></p> </li> <li><p><a href="#heading-the-router-audit_url">The Router: audit_url()</a></p> </li> <li><p><a href="#heading-graceful-fallback">Graceful Fallback</a></p> </li> <li><p><a href="#heading-testing-the-cost-curve">Testing the Cost Curve</a></p> </li> <li><p><a href="#heading-applying-this-pattern-to-your-agent">Applying This Pattern to Your Agent</a></p> </li> </ol> <h2 id="heading-the-problem-with-calling-claude-on-everything">The Problem with Calling Claude on Everything</h2> <p>Here's what most agent code looks like:</p> <pre><code class="language-python">def audit_url(snapshot: dict) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,  # required by the Messages API
        messages=[{"role": "user", "content": build_prompt(snapshot)}]
    )
    return parse_response(response)
</code></pre> <p>This works.
It also calls Sonnet for every URL in the list — including the ones where the title is 142 characters long and the answer is obviously FAIL without any model involvement.</p> <p>Claude Sonnet 4 is priced at $3 per million input tokens and $15 per million output tokens. A typical page snapshot is around 500 input tokens. That's $0.0015 per URL just for input — before output tokens. Once output tokens are included, a Sonnet call comes to roughly $0.006, so a 20-URL weekly audit costs around $0.12. Not expensive. But most of those pages have mechanical SEO issues: missing descriptions, titles over 60 characters, no canonical tag. A character count catches all of that. You don't need a model.</p> <p>The cost curve fixes this by routing based on what the task actually requires, not on what the model is capable of.</p> <h2 id="heading-the-cost-curve-explained">The Cost Curve Explained</h2> <p>In the cost curve, we have three tiers, three tools, and three price points:</p> <p><strong>Tier 1 — Deterministic Python. Cost: $0.</strong> Check title length, description length, H1 count, canonical presence. These are not judgment calls. They're string operations. If title length > 60, FAIL. No model needed.</p> <p><strong>Tier 2 — Claude Haiku. Cost: ~$0.0001 per call.</strong> Title present but only 4 characters long. Description present but only 30 characters. Status code is a redirect. These pass the mechanical audit but something is off. Haiku is fast and cheap enough that escalating ambiguous cases costs less than the debugging time you'd spend on false positives.</p> <p><strong>Tier 3 — Claude Sonnet. Cost: ~$0.006 per call.</strong> Pages Haiku flags as needing semantic judgment. "This title passes length but reads like a navigation label." "This description duplicates the title verbatim." Sonnet earns its cost on genuinely hard cases — not on every URL in the list.</p> <p>The routing decision happens before any API call.
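</p>
<p>The escalation rules above fit in a small pure function that runs before any API call. The following is a minimal sketch under stated assumptions, not the repo's <code>audit_url()</code> router: <code>choose_tier</code> is a hypothetical name, the snapshot keys (<code>title</code>, <code>meta_description</code>, <code>status_code</code>) are the ones the Tier 1 code reads, and counting a title of 10 characters or fewer (or a description of 50 or fewer) as "present but suspiciously short" is an assumption based on the module's <code>AMBIGUOUS_TITLE_MAX</code> and <code>AMBIGUOUS_DESC_MAX</code> constants. It only chooses between Tier 1 and Tier 2, because Tier 3 is reached only when Haiku asks for it.</p>

```python
# Hypothetical sketch of the pre-API escalation decision (not the repo's router).
# Thresholds mirror AMBIGUOUS_TITLE_MAX / AMBIGUOUS_DESC_MAX; treating
# "length <= max" as ambiguous is an assumption of this sketch.
REDIRECT_CODES = {301, 302, 307, 308}
AMBIGUOUS_TITLE_MAX = 10
AMBIGUOUS_DESC_MAX = 50


def choose_tier(snapshot: dict) -> int:
    """Return 1 if deterministic checks suffice, 2 if the page needs Haiku triage."""
    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""

    # Present but suspiciously short: passes Tier 1, yet probably wrong.
    if title and len(title) <= AMBIGUOUS_TITLE_MAX:
        return 2
    if description and len(description) <= AMBIGUOUS_DESC_MAX:
        return 2

    # Redirects pass the mechanical audit but deserve a second look.
    if snapshot.get("status_code") in REDIRECT_CODES:
        return 2

    # Missing or over-long fields are already a clear FAIL in Tier 1,
    # and normal-looking pages need no model at all.
    return 1


if __name__ == "__main__":
    # Hypothetical sample snapshots for illustration only.
    pages = [
        {"title": "Pricing plans for small teams", "meta_description": "x" * 120, "status_code": 200},
        {"title": "Home", "meta_description": "x" * 120, "status_code": 200},
        {"title": "About our company", "meta_description": "Short.", "status_code": 200},
        {"title": "Old blog post", "meta_description": "x" * 120, "status_code": 301},
        {"title": "", "meta_description": "", "status_code": 200},
    ]
    for page in pages:
        print(page["title"] or "<missing>", "-> tier", choose_tier(page))
```

<p>With these sample pages, only the first and the last stay in Tier 1: the last one has no title or description, which Tier 1 already reports as a clear FAIL without any model. Because the function does no I/O, it is also trivial to unit test.</p>
<p>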
The result schema is identical regardless of which tier handled the request.</p> <h2 id="heading-project-setup">Project Setup</h2> <pre><code class="language-bash">mkdir cost-curve-demo && cd cost-curve-demo pip install anthropic </code></pre> <p>Set your API key:</p> <pre><code class="language-bash"># macOS/Linux export ANTHROPIC_API_KEY="sk-ant-..." # Windows PowerShell $env:ANTHROPIC_API_KEY = "sk-ant-..." </code></pre> <p>Create <code>cost_curve.py</code> — you'll build this module step by step.</p> <h2 id="heading-tier-1-deterministic-python">Tier 1: Deterministic Python</h2> <p>Tier 1 runs first on every URL. It checks four fields using only Python string operations. There's no API call, no latency, and no cost.</p> <pre><code class="language-python">import json import logging import os import re from datetime import datetime, timezone import anthropic logger = logging.getLogger(__name__) REDIRECT_CODES = {301, 302, 307, 308} # Fields that trigger Tier 2 escalation # Title or description present but suspiciously short AMBIGUOUS_TITLE_MAX = 10 # chars — present but too short to be real AMBIGUOUS_DESC_MAX = 50 # chars — present but too short to be useful def _now_iso() -> str: return datetime.now(timezone.utc).isoformat() def _build_result(snapshot: dict, method: str) -> dict: """Base result skeleton — same schema regardless of tier.""" return { "url": snapshot.get("final_url", ""), "final_url": snapshot.get("final_url", ""), "status_code": snapshot.get("status_code"), "title": {"value": None, "length": 0, "status": "PASS"}, "description": {"value": None, "length": 0, "status": "PASS"}, "h1": {"count": 0, "value": None, "status": "PASS"}, "canonical": {"value": None, "status": "PASS"}, "flags": [], "human_review": False, "audited_at": _now_iso(), "method": method, "needs_tier3": False, } def tier1_check(snapshot: dict) -> dict: """ Pure Python SEO checks. Zero API calls. Returns a result dict with method="deterministic". 
    Sets needs_tier3=False always — Tier 1 never escalates to Tier 3 directly.
    Escalation to Tier 2 is decided by the router, not here.
    """
    result = _build_result(snapshot, "deterministic")

    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    h1s = snapshot.get("h1s") or []
    canonical = snapshot.get("canonical") or ""

    # Title check
    result["title"]["value"] = title or None
    result["title"]["length"] = len(title)
    if not title or len(title) > 60:
        result["title"]["status"] = "FAIL"
        msg = "Title is missing" if not title else f"Title is {len(title)} characters (max 60)"
        result["flags"].append(msg)

    # Description check
    result["description"]["value"] = description or None
    result["description"]["length"] = len(description)
    if not description or len(description) > 160:
        result["description"]["status"] = "FAIL"
        msg = "Meta description is missing" if not description else f"Meta description is {len(description)} characters (max 160)"
        result["flags"].append(msg)

    # H1 check
    result["h1"]["count"] = len(h1s)
    result["h1"]["value"] = h1s[0] if h1s else None
    if len(h1s) == 0:
        result["h1"]["status"] = "FAIL"
        result["flags"].append("H1 tag is missing")
    elif len(h1s) > 1:
        result["h1"]["status"] = "FAIL"
        result["flags"].append(f"Multiple H1 tags found ({len(h1s)})")

    # Canonical check
    result["canonical"]["value"] = canonical or None
    if not canonical:
        result["canonical"]["status"] = "FAIL"
        result["flags"].append("Canonical tag is missing")

    return result
</code></pre> <p>The key design decision: <code>tier1_check()</code> never decides whether to escalate. It just runs the checks and returns. The router makes the escalation decision separately, by inspecting the raw snapshot fields.</p> <h2 id="heading-tier-2-claude-haiku-for-ambiguous-cases">Tier 2: Claude Haiku for Ambiguous Cases</h2> <p>Tier 2 runs when the snapshot contains data that a length or presence check can't settle on its own. A 4-character title present but clearly wrong. A 30-character description that's technically there but useless. 
A redirect status that needs a human-readable explanation.</p> <p>Haiku is the right model here. It's fast, cheap ($1 input / $5 output per million tokens), and sufficient for triage-level judgment. The prompt asks a narrow question: is this ambiguous enough to need Sonnet?</p> <pre><code class="language-python">def tier2_check(snapshot: dict) -> dict:
    """
    Claude Haiku call for ambiguous cases.
    Returns result with method="haiku".
    Sets needs_tier3=True if Haiku determines the case needs semantic judgment.
    Falls back to Tier 1 result on API error.
    """
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise OSError("ANTHROPIC_API_KEY is not set.")

    client = anthropic.Anthropic(api_key=api_key)

    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    status_code = snapshot.get("status_code")

    prompt = f"""You are an SEO auditor doing a quick triage check.

Page data:
- Title: {repr(title)} ({len(title)} chars)
- Meta description: {repr(description)} ({len(description)} chars)
- Status code: {status_code}

Consider these two questions. If the answer to either one is "yes", set needs_tier3 to true:
1. Does this page need semantic judgment beyond simple length/presence checks?
   (e.g. title is present but clearly wrong, description is present but meaningless)
2. Is the status code a redirect that needs investigation?

Respond in this exact JSON format and nothing else:
{{"needs_tier3": true_or_false, "reason": "one sentence explanation"}}"""

    try:
        response = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=150,
            messages=[{"role": "user", "content": prompt}],
        )
        raw = response.content[0].text.strip()

        # Strip markdown fences if present
        if raw.startswith("```"):
            lines = raw.splitlines()
            raw = "\n".join(lines[1:-1] if lines[-1].strip() == "```" else lines[1:])

        parsed = json.loads(raw)

        result = _build_result(snapshot, "haiku")

        # Copy Tier 1 field checks — Haiku doesn't redo those
        t1 = tier1_check(snapshot)
        result["title"] = t1["title"]
        result["description"] = t1["description"]
        result["h1"] = t1["h1"]
        result["canonical"] = t1["canonical"]
        result["flags"] = t1["flags"]

        result["needs_tier3"] = parsed.get("needs_tier3", False)
        if result["needs_tier3"]:
            result["flags"].append(f"Escalated to Tier 3: {parsed.get('reason', '')}")

        return result

    except Exception as exc:
        logger.warning("[tier2] Haiku API error: %s — falling back to Tier 1 result", exc)
        fallback = tier1_check(snapshot)
        fallback["method"] = "haiku-fallback"
        return fallback
</code></pre> <p>The fallback is the critical piece. If Haiku fails — rate limit, network error, malformed response — the function returns the Tier 1 result rather than crashing. The audit continues. The URL gets flagged with <code>method="haiku-fallback"</code> so you can identify it later. (A missing API key is different: the <code>OSError</code> check runs before the <code>try</code> block, so it still raises.)</p> <h2 id="heading-tier-3-claude-sonnet-for-semantic-judgment">Tier 3: Claude Sonnet for Semantic Judgment</h2> <p>Tier 3 is where the full extraction prompt runs. This is the same call you'd make in a naïve implementation — the difference is that only a small fraction of URLs reach this tier.</p> <pre><code class="language-python">def tier3_check(snapshot: dict) -> dict:
    """
    Claude Sonnet call for semantic judgment.
    Returns result with method="sonnet".
    This is the full extraction prompt — same as calling the model directly.
""" api_key = os.environ.get("ANTHROPIC_API_KEY") if not api_key: raise OSError("ANTHROPIC_API_KEY is not set.") client = anthropic.Anthropic(api_key=api_key) prompt = f"""You are an SEO auditor. Analyze this page snapshot and return ONLY a JSON object. No prose. No explanation. No markdown fences. Raw JSON only. Page data: - URL: {snapshot.get('final_url')} - Status code: {snapshot.get('status_code')} - Title: {snapshot.get('title')} - Meta description: {snapshot.get('meta_description')} - H1 tags: {snapshot.get('h1s')} - Canonical: {snapshot.get('canonical')} Return this exact schema: {{ "url": "string", "final_url": "string", "status_code": number, "title": {{"value": "string or null", "length": number, "status": "PASS or FAIL"}}, "description": {{"value": "string or null", "length": number, "status": "PASS or FAIL"}}, "h1": {{"count": number, "value": "string or null", "status": "PASS or FAIL"}}, "canonical": {{"value": "string or null", "status": "PASS or FAIL"}}, "flags": ["array of strings describing specific issues"], "human_review": false, "audited_at": "ISO timestamp" }} PASS/FAIL rules: - title: FAIL if null or length > 60 characters, or if present but clearly not a real title - description: FAIL if null or length > 160 characters, or if present but meaningless - h1: FAIL if count is 0 or count > 1 - canonical: FAIL if null - audited_at: use current UTC time""" try: response = client.messages.create( model="claude-sonnet-4-20250514", max_tokens=1000, messages=[{"role": "user", "content": prompt}], ) raw = response.content[0].text.strip() if raw.startswith("```"): lines = raw.splitlines() raw = "\n".join(lines[1:-1] if lines[-1].strip() == "```" else lines[1:]) result = json.loads(raw) result["method"] = "sonnet" result["needs_tier3"] = False return result except Exception as exc: logger.warning("[tier3] Sonnet API error: %s — falling back to Tier 1 result", exc) fallback = tier1_check(snapshot) fallback["method"] = "sonnet-fallback" return fallback 
</code></pre> <p>Note the two rules in the Tier 3 prompt that Tier 1 can't check: <code>"or if present but clearly not a real title"</code> and <code>"or if present but meaningless"</code>. That's the semantic judgment Haiku identified as needed. Tier 3 acts on it.</p> <h2 id="heading-the-router-auditurl">The Router: audit_url()</h2> <p>The router is the public interface. Everything else is an implementation detail.</p> <pre><code class="language-python">def audit_url(snapshot: dict, tiered: bool = False) -> dict:
    """
    Route a page snapshot through the appropriate audit tier.

    Args:
        snapshot: Page data from browser.py — must contain final_url, status_code,
                  title, meta_description, h1s, canonical.
        tiered: If False, delegates directly to Tier 3 (Sonnet).
                If True, routes through the cost curve.

    Returns:
        Audit result dict with method field indicating which tier ran.
    """
    if not tiered:
        # Non-tiered mode: call Sonnet directly, same as v1 behavior
        return tier3_check(snapshot)

    # Tier 1: always runs first
    t1_result = tier1_check(snapshot)

    # Check if escalation to Tier 2 is warranted
    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    status_code = snapshot.get("status_code")

    needs_tier2 = (
        # Title present but suspiciously short
        (title and len(title) < AMBIGUOUS_TITLE_MAX)
        or
        # Description present but suspiciously short
        (description and len(description) < AMBIGUOUS_DESC_MAX)
        or
        # Redirect status — may need explanation
        (status_code in REDIRECT_CODES)
    )

    if not needs_tier2:
        # Tier 1 result is definitive — return without any API call
        return t1_result

    # Tier 2: Haiku triage
    t2_result = tier2_check(snapshot)
    if not t2_result.get("needs_tier3", False):
        # Haiku determined no semantic judgment needed
        return t2_result

    # Tier 3: Sonnet for semantic judgment
    return tier3_check(snapshot)
</code></pre> <p>The router logic is explicit and readable. Each decision point is a named condition. 
When <code>tiered=False</code>, behavior is identical to the v1 naive implementation — this is the backward compatibility guarantee that lets you add the cost curve incrementally without breaking existing audits.</p> <h2 id="heading-graceful-fallback">Graceful Fallback</h2> <p>The fallback pattern appears in both Tier 2 and Tier 3. It's worth making explicit:</p> <pre><code class="language-python"># Pattern used in both tier2_check() and tier3_check()
except Exception as exc:
    logger.warning("[tierN] API error: %s — falling back to Tier 1 result", exc)
    fallback = tier1_check(snapshot)
    fallback["method"] = "haiku-fallback"  # "sonnet-fallback" in tier3_check()
    return fallback
</code></pre> <p>Three things this does:</p> <ol> <li><p>Logs the error with enough context to debug later</p> </li> <li><p>Returns a valid result — the Tier 1 deterministic check always runs regardless</p> </li> <li><p>Tags the result with the fallback method so you can filter these in your report</p> </li> </ol> <p>An agent that crashes on API errors is not production-ready. 
An agent that degrades gracefully and continues is.</p> <h2 id="heading-testing-the-cost-curve">Testing the Cost Curve</h2> <p>Create <code>test_cost_curve.py</code> to verify routing behavior without live API calls:</p> <pre><code class="language-python">import json
import os
from unittest import mock

from cost_curve import audit_url, tier1_check

# tier2_check() and tier3_check() raise OSError before creating the (mocked)
# client if no key is set, so give the tests a dummy key.
os.environ.setdefault("ANTHROPIC_API_KEY", "test-key")


def make_snapshot(title="Normal Title Under 60 Chars",
                  description="A normal meta description that is under 160 characters and describes the page content well.",
                  h1s=["Single H1"],
                  canonical="https://example.com/page",
                  status_code=200,
                  final_url="https://example.com/page"):
    return {
        "title": title,
        "meta_description": description,
        "h1s": h1s,
        "canonical": canonical,
        "status_code": status_code,
        "final_url": final_url,
    }


def test_clean_page_returns_tier1_no_api_calls():
    """Clean page: all checks pass deterministically — no API call."""
    snapshot = make_snapshot()
    with mock.patch("anthropic.Anthropic") as mock_client:
        result = audit_url(snapshot, tiered=True)
        assert result["method"] == "deterministic"
        mock_client.assert_not_called()
    print("PASS: clean page → Tier 1, zero API calls")


def test_long_title_returns_tier1_fail_no_api_call():
    """Title >60 chars: FAIL from Tier 1, no API call."""
    snapshot = make_snapshot(title="A" * 70)
    with mock.patch("anthropic.Anthropic") as mock_client:
        result = audit_url(snapshot, tiered=True)
        assert result["method"] == "deterministic"
        assert result["title"]["status"] == "FAIL"
        mock_client.assert_not_called()
    print("PASS: title >60 → Tier 1 FAIL, zero API calls")


def test_suspiciously_short_title_escalates_to_tier2():
    """Title present but only 3 chars: escalates to Tier 2."""
    snapshot = make_snapshot(title="SEO")  # 3 chars — under AMBIGUOUS_TITLE_MAX
    mock_response = mock.MagicMock()
    mock_response.content = [mock.MagicMock(
        text='{"needs_tier3": false, "reason": "title is short but not ambiguous"}'
    )]
    with mock.patch("anthropic.Anthropic") as mock_client:
        mock_client.return_value.messages.create.return_value = mock_response
        result = audit_url(snapshot, tiered=True)
        assert result["method"] == "haiku"
        assert mock_client.return_value.messages.create.call_count == 1
    print("PASS: short title → Tier 2 (Haiku called once)")


def test_tiered_false_calls_sonnet_directly():
    """tiered=False: Sonnet called regardless of snapshot content."""
    snapshot = make_snapshot()  # clean page, would be Tier 1 in tiered mode
    mock_response = mock.MagicMock()
    mock_response.content = [mock.MagicMock(text=json.dumps({
        "url": "https://example.com/page",
        "final_url": "https://example.com/page",
        "status_code": 200,
        "title": {"value": "Normal Title Under 60 Chars", "length": 27, "status": "PASS"},
        "description": {"value": "desc", "length": 4, "status": "PASS"},
        "h1": {"count": 1, "value": "Single H1", "status": "PASS"},
        "canonical": {"value": "https://example.com/page", "status": "PASS"},
        "flags": [],
        "human_review": False,
        "audited_at": "2026-04-01T00:00:00+00:00",
    }))]
    with mock.patch("anthropic.Anthropic") as mock_client:
        mock_client.return_value.messages.create.return_value = mock_response
        result = audit_url(snapshot, tiered=False)
        assert result["method"] == "sonnet"
        assert mock_client.return_value.messages.create.call_count == 1
    print("PASS: tiered=False → Sonnet called directly")


def test_haiku_api_failure_falls_back_to_tier1():
    """Haiku failure: falls back to Tier 1 result, no crash."""
    snapshot = make_snapshot(title="SEO")  # triggers Tier 2
    with mock.patch("anthropic.Anthropic") as mock_client:
        mock_client.return_value.messages.create.side_effect = Exception("rate limit")
        result = audit_url(snapshot, tiered=True)
        assert result["method"] == "haiku-fallback"
    print("PASS: Haiku failure → fallback to Tier 1, no crash")


if __name__ == "__main__":
    test_clean_page_returns_tier1_no_api_calls()
    test_long_title_returns_tier1_fail_no_api_call()
    test_suspiciously_short_title_escalates_to_tier2()
    test_tiered_false_calls_sonnet_directly()
    test_haiku_api_failure_falls_back_to_tier1()
    print("\nAll tests passed.")
</code></pre> <p>Run it:</p> <pre><code class="language-bash">python test_cost_curve.py
</code></pre> <p>Expected output:</p> <pre><code class="language-plaintext">PASS: clean page → Tier 1, zero API calls
PASS: title >60 → Tier 1 FAIL, zero API calls
PASS: short title → Tier 2 (Haiku called once)
PASS: tiered=False → Sonnet called directly
PASS: Haiku failure → fallback to Tier 1, no crash

All tests passed.
</code></pre> <p>The simulated Haiku failure also writes its logged warning (<code>[tier2] Haiku API error: rate limit ...</code>) to stderr, so you'll see that line in your terminal too.</p> <h2 id="heading-applying-this-pattern-to-your-agent">Applying This Pattern to Your Agent</h2> <p>The cost curve is not SEO-specific. Any agent with mixed-complexity tasks can use it.</p> <p>The principle: classify tasks by what they actually require before deciding which model to invoke.</p> <p><strong>Customer support agent:</strong></p> <ul> <li><p>Tier 1: keyword matching for known FAQ topics — no model</p> </li> <li><p>Tier 2: Haiku for intent classification on ambiguous queries</p> </li> <li><p>Tier 3: Sonnet for complex complaints requiring judgment</p> </li> </ul> <p><strong>Code review agent:</strong></p> <ul> <li><p>Tier 1: lint rules, syntax checks — no model</p> </li> <li><p>Tier 2: Haiku for common pattern detection</p> </li> <li><p>Tier 3: Sonnet for architectural review</p> </li> </ul> <p><strong>Content moderation agent:</strong></p> <ul> <li><p>Tier 1: blocklist matching — no model</p> </li> <li><p>Tier 2: Haiku for borderline cases</p> </li> <li><p>Tier 3: Sonnet for context-dependent judgment</p> </li> </ul> <p>The implementation pattern is the same in all three cases. The <code>audit_url()</code> router becomes <code>route_task()</code>. The tier functions change their prompts and escalation conditions. The fallback logic stays identical.</p> <p>The key question to ask before writing any agent code: what fraction of my inputs are mechanically solvable? That fraction goes to Tier 1. 
The cost curve routes everything else.</p> <h2 id="heading-wrapping-up">Wrapping Up</h2> <p>The full implementation — including the SEO audit agent that uses this module in production — is at <a href="https://github.com/dannwaneri/seo-agent">dannwaneri/seo-agent</a>. The <code>core/</code> directory is MIT licensed. The tiered routing lives in <code>premium/cost_curve.py</code>.</p> <p><em>This tutorial is the companion piece to</em> <a href="https://dev.to/dannwaneri/i-was-paying-0006-per-url-for-seo-audits-until-i-realized-most-needed-0-132j">I Was Paying $0.006 Per URL for SEO Audits Until I Realized Most Needed $0</a> <em>on DEV.to, which covers the architecture decisions behind the cost curve.</em></p> </article> <article> <h1> How to Build a Barcode Generator Using JavaScript (Step-by-Step) </h1> <p>Bhavin Sheth — Fri, 03 Apr 2026 15:41:15 +0000</p> <p>If you’ve ever worked on something like an inventory system, billing dashboard, or even a small internal tool, chances are you’ve needed to generate barcodes at some point.</p> <p>Most developers either rely on external tools or assume this requires backend processing. That’s usually where things get slower, more complex, and harder to maintain.</p> <p>But modern browsers have quietly become powerful enough to handle this entirely on their own.</p> <p>In this tutorial, you’ll build a barcode generator that runs completely in the browser. It won’t upload data anywhere, and it won’t require any server logic. 
Everything happens instantly on the client side.</p> <p>Along the way, you’ll also learn how barcode formats work, how to validate inputs properly, and how to create a real-time preview experience that feels responsive and practical.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-how-barcode-generation-works">How Barcode Generation Works</a></p> </li> <li><p><a href="#heading-project-setup">Project Setup</a></p> </li> <li><p><a href="#heading-what-library-are-we-using">What Library Are We Using?</a></p> </li> <li><p><a href="#heading-creating-the-html-structure">Creating the HTML Structure</a></p> </li> <li><p><a href="#heading-adding-javascript-for-barcode-generation">Adding JavaScript for Barcode Generation</a></p> </li> <li><p><a href="#heading-how-the-barcode-is-generated">How the Barcode Is Generated</a></p> </li> <li><p><a href="#heading-types-of-barcodes-you-can-generate">Types of Barcodes You Can Generate</a></p> </li> <li><p><a href="#heading-adding-real-time-preview">Adding Real-Time Preview</a></p> </li> <li><p><a href="#heading-how-to-validate-input-properly">How to Validate Input Properly</a></p> </li> <li><p><a href="#heading-how-to-download-the-barcode">How to Download the Barcode</a></p> </li> <li><p><a href="#heading-important-notes-from-real-world-use">Important Notes from Real-World Use</a></p> </li> <li><p><a href="#heading-common-mistakes-to-avoid">Common Mistakes to Avoid</a></p> </li> <li><p><a href="#heading-demo-how-the-barcode-generator-works">Demo: How the Barcode Generator Works</a></p> </li> <li><p><a href="#heading-conclusion">Conclusion</a></p> </li> </ol> <h2 id="heading-how-barcode-generation-works">How Barcode Generation Works</h2> <p>A barcode is simply a visual encoding of data. Instead of displaying text directly, it represents that data using a pattern of lines and spaces.</p> <p>Different barcode formats use different encoding rules. 
Some support only numbers, while others allow full text input. When you generate a barcode in the browser, you’re essentially converting user input into a structured visual pattern.</p> <p>The key idea here is that we don’t draw these lines manually. A library takes care of encoding the data and rendering it as an SVG element, which the browser can display instantly.</p> <h2 id="heading-project-setup">Project Setup</h2> <p>We’ll keep this project intentionally simple so the focus stays on understanding how it works.</p> <p>All you need is a basic HTML file, a small JavaScript file, and a barcode library. There’s no backend involved, and nothing gets stored or uploaded.</p> <p>This makes the tool fast, private, and easy to integrate into other projects.</p> <h2 id="heading-what-library-are-we-using">What Library Are We Using?</h2> <p>In this project, we use the <strong>JsBarcode</strong> library.</p> <p>It’s a lightweight JavaScript library that can generate barcodes directly inside the browser using SVG. It supports multiple formats and works without any external dependencies.</p> <p>You can include it using a CDN:</p> <pre><code class="language-html"><script src="https://cdn.jsdelivr.net/npm/jsbarcode@3.11.5/dist/JsBarcode.all.min.js"></script> </code></pre> <h2 id="heading-creating-the-html-structure">Creating the HTML Structure</h2> <p>The interface is simple but practical. 
It includes an input field where users can enter data, a dropdown to choose the barcode format, and a preview area where the barcode is rendered.</p> <pre><code class="language-html"><input type="text" id="text" placeholder="Enter text or number">

<select id="format">
  <option value="CODE128">Code128</option>
  <option value="EAN13">EAN13</option>
  <option value="UPC">UPC</option>
</select>

<button onclick="generateBarcode()">Generate</button>

<svg id="barcode"></svg>
</code></pre> <p>This structure is enough to handle input, display output, and connect everything through JavaScript.</p> <h2 id="heading-adding-javascript-for-barcode-generation">Adding JavaScript for Barcode Generation</h2> <p>Now we'll connect the user input to barcode generation.</p> <pre><code class="language-javascript">function generateBarcode() {
  const text = document.getElementById("text").value;
  const format = document.getElementById("format").value;

  if (!text) {
    alert("Please enter a value");
    return;
  }

  JsBarcode("#barcode", text, {
    format: format,
    width: 2,
    height: 100,
    displayValue: true
  });
}
</code></pre> <p>This function reads the input, checks if it exists, and then generates the barcode using the selected format.</p> <h2 id="heading-how-the-barcode-is-generated">How the Barcode Is Generated</h2> <p>When you call the JsBarcode function, the library handles everything behind the scenes.</p> <p>It encodes the input into a barcode standard, converts that into a pattern of lines, and renders it as an SVG element. Because SVG is vector-based, the barcode remains sharp even when resized.</p> <p>All of this happens instantly in the browser, which is why the experience feels fast.</p> <h2 id="heading-types-of-barcodes-you-can-generate">Types of Barcodes You Can Generate</h2> <p>Different barcode formats are used in different industries, and understanding them helps you build more practical tools.</p> <ol> <li><p><strong>Code128</strong> is the most flexible format. 
It supports letters, numbers, and special characters, which makes it ideal for general-purpose use.</p> </li> <li><p><strong>EAN-13</strong> is commonly used on retail products. An EAN-13 code is exactly 13 digits, the last of which is a check digit calculated from the first 12, so it requires strict validation.</p> </li> <li><p><strong>UPC</strong> (UPC-A) is closely related to EAN-13 and is the standard retail product barcode in the US and Canada. It also expects numeric input with a fixed length: 12 digits, including a check digit.</p> </li> <li><p><strong>Code39</strong> is simpler and supports uppercase letters, numbers, and a few symbols, but it’s less compact than Code128.</p> </li> <li><p><strong>ITF-14</strong> is mostly used in logistics and packaging. It encodes exactly 14 digits and is common on shipping cartons.</p> </li> </ol> <p>In most cases, starting with Code128 is the safest option unless you have a specific requirement.</p> <h2 id="heading-adding-real-time-preview">Adding Real-Time Preview</h2> <p>One of the biggest improvements you can make to a tool like this is real-time feedback.</p> <p>Instead of requiring users to click a button every time, you can generate the barcode as they type.</p> <pre><code class="language-javascript">document.getElementById("text").addEventListener("input", generateBarcode);
document.getElementById("format").addEventListener("change", generateBarcode);
</code></pre> <p>This small change makes the tool feel much more responsive.</p> <p>As soon as the user types or changes the format, the barcode updates automatically. 
This is the same kind of interaction you see in polished production tools.</p> <p>One caveat: <code>generateBarcode()</code> calls <code>alert()</code> when the field is empty, so with live updates, clearing the input triggers a popup. In real-time mode, you may prefer to return early without the alert.</p> <h2 id="heading-how-to-validate-input-properly">How to Validate Input Properly</h2> <p>Validation is where many simple tools break.</p> <p>Since different barcode formats have different rules, if you don’t validate input correctly, the barcode may fail silently or produce incorrect output.</p> <p>Here’s a simple example:</p> <pre><code class="language-javascript">function isValidInput(text, format) {
  if (format === "EAN13") {
    return /^\d{13}$/.test(text);
  }

  if (format === "UPC") {
    return /^\d{12}$/.test(text);
  }

  return text.length > 0;
}
</code></pre> <p>These patterns check only the length and that every character is a digit. They don't verify the final check digit, so a value can pass this function and still not be a valid EAN-13 or UPC code.</p> <p>Then use it inside your generator:</p> <pre><code class="language-javascript">if (!isValidInput(text, format)) {
  alert("Invalid input for selected format");
  return;
}
</code></pre> <p>This ensures users get immediate feedback instead of confusion.</p> <h2 id="heading-how-to-download-the-barcode">How to Download the Barcode</h2> <p>Once the barcode is generated, you can allow users to download it.</p> <pre><code class="language-javascript">function downloadBarcode() {
  const svg = document.getElementById("barcode");
  const serializer = new XMLSerializer();
  const source = serializer.serializeToString(svg);

  const blob = new Blob([source], { type: "image/svg+xml" });
  const url = URL.createObjectURL(blob);

  const link = document.createElement("a");
  link.href = url;
  link.download = "barcode.svg";
  link.click();
}
</code></pre> <p>This converts the SVG into a file that can be downloaded directly from the browser.</p> <h2 id="heading-important-notes-from-real-world-use">Important Notes from Real-World Use</h2> <p>When building tools like this in production, small details matter.</p> <p>Large input values can sometimes affect readability, so it’s important to test how dense the barcode becomes. 
Choosing the right format also makes a difference depending on whether you need flexibility or strict standards.</p> <p>Another important detail is rendering quality. Using SVG instead of raster formats ensures that the barcode remains sharp even when printed.</p> <h2 id="heading-common-mistakes-to-avoid">Common Mistakes to Avoid</h2> <p>One common issue is skipping validation. This leads to broken or unreadable barcodes, especially with strict formats like EAN or UPC.</p> <p>Another mistake is relying too much on button-based interactions. Real-time updates create a much better user experience.</p> <p>Finally, developers sometimes forget to include the library correctly. Nothing visible happens on the page, and the only clue is a <code>JsBarcode is not defined</code> error in the browser console. Always check that the CDN script loads before your own code runs.</p> <h2 id="heading-demo-how-the-barcode-generator-works">Demo: How the Barcode Generator Works</h2> <p>To better understand how everything comes together, here’s a quick walkthrough of how the tool works in the browser.</p> <h3 id="heading-step-1-select-a-barcode-type">Step 1: Select a Barcode Type</h3> <p>Start by choosing the barcode format. In most cases, Code128 is a good default since it supports both text and numbers.</p> <h3 id="heading-step-2-enter-your-data">Step 2: Enter Your Data</h3> <p>Next, enter the value you want to encode. This could be a product ID, URL, or any text depending on the selected format.</p> <h3 id="heading-step-3-customize-the-design">Step 3: Customize the Design</h3> <p>You can adjust things like bar width, height, and colors. These settings help control how the barcode looks and how readable it is in different use cases.</p> <h3 id="heading-step-4-generate-and-preview">Step 4: Generate and Preview</h3> <p>As you type or change settings, the barcode updates instantly. 
This real-time preview makes it easier to experiment and see results immediately.</p> <h3 id="heading-step-5-download-the-barcode">Step 5: Download the Barcode</h3> <p>Once you're satisfied with the result, you can download the barcode. The <code>downloadBarcode()</code> function in this tutorial exports SVG; offering PNG or JPG as well means rendering the barcode to a canvas and exporting that image instead.</p> <p>This entire process happens in the browser, without uploading any data to a server.</p> <h2 id="heading-conclusion">Conclusion</h2> <p>In this tutorial, you built a browser-based barcode generator using JavaScript.</p> <p>More importantly, you learned how to think about building tools that run entirely on the client side. This approach reduces complexity, improves performance, and gives users a faster experience.</p> <p>Once you understand this pattern, you can apply it to many other tools like QR generators, image converters, and file processors.</p> <p>And that’s where things start to get interesting.</p> </article> <article> <h1> How to Build Your Own Claude Code Skill </h1> <p>Daniel Nwaneri — Fri, 27 Mar 2026 20:47:26 +0000</p> <p>Every developer eventually has a workflow they repeat. A way they write commit messages. A checklist they run before opening a pull request. A structure they follow when reviewing code. They do it manually, explain it to their agents in every session, and watch the agent interpret it differently each time.</p> <p>Agent skills fix this. A skill is a markdown file that loads into Claude Code's context automatically when you need it. You write the workflow once. The agent follows it every time. And because skills follow an open standard, the same file works in Claude Code, GitHub Copilot, Cursor, and Gemini CLI.</p> <p>This tutorial shows you how to build a skill from scratch. You will build a commit-message-writer — a skill that reads your staged changes and generates a structured commit message following the Conventional Commits standard. 
By the end, you will have a working skill installed and ready to use, and you will understand the structure well enough to build any skill you need.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-what-an-agent-skill-is">What an Agent Skill Is</a></p> </li> <li><p><a href="#heading-how-to-choose-what-to-build">How to Choose What to Build</a></p> </li> <li><p><a href="#heading-how-to-structure-your-skill">How to Structure Your Skill</a></p> </li> <li><p><a href="#heading-how-to-write-the-description">How to Write the Description</a></p> </li> <li><p><a href="#heading-how-to-write-the-instructions">How to Write the Instructions</a></p> </li> <li><p><a href="#heading-how-to-build-the-commit-message-writer-skill">How to Build the commit-message-writer Skill</a></p> </li> <li><p><a href="#heading-how-to-install-and-test-your-skill">How to Install and Test Your Skill</a></p> </li> <li><p><a href="#heading-how-to-improve-your-skill-over-time">How to Improve Your Skill Over Time</a></p> </li> <li><p><a href="#heading-where-to-go-next">Where to Go Next</a></p> </li> </ol> <h2 id="heading-what-an-agent-skill-is">What an Agent Skill Is</h2> <p>A skill is a folder containing a <code>SKILL.md</code> file. That file has two parts: a YAML frontmatter block at the top, and a markdown body below it.</p> <pre><code class="language-plaintext">my-skill/
└── SKILL.md
</code></pre> <p>The frontmatter tells the agent what the skill is called and when to use it. The body tells the agent what to do when it loads the skill. Here is the minimal structure:</p> <pre><code class="language-yaml">---
name: my-skill
description: What this skill does and when to use it.
---

# My Skill

Instructions for the agent go here.
</code></pre> <p>When you invoke a skill — either explicitly with <code>/skill-name</code> or by describing what you want — the agent reads the SKILL.md body and follows the instructions inside it. 
The frontmatter isn't part of the instructions the agent follows. It's metadata the skill system uses to decide whether to load the skill at all.</p> <h3 id="heading-how-the-agent-decides-to-load-a-skill">How the Agent Decides to Load a Skill</h3> <p>This is the most important thing to understand before you write your first skill: <strong>the agent decides whether to load your skill based entirely on the description field.</strong></p> <p>Skills appear in Claude Code's context as a list of names and descriptions. When you make a request, the agent scans that list and loads any skill whose description matches what you're asking for. If the description is vague, the skill won't load when you need it. If the description is too narrow, it won't load for variations of the same request.</p> <p>The instructions in the body only matter after the skill loads. Getting the description right is what determines whether the skill loads at all.</p> <h3 id="heading-what-skills-are-not">What Skills Are Not</h3> <p>Skills are instruction files. They cannot run code on their own — but they can instruct the agent to run code using its existing tools. They are not plugins, extensions, or packages. They have no runtime. They are markdown files the agent reads, like a recipe a chef follows.</p> <h2 id="heading-how-to-choose-what-to-build">How to Choose What to Build</h2> <p>The best skills share three properties.</p> <ol> <li><p><strong>They encode a repeatable workflow.</strong> If you do something differently every time, a skill won't help. If you follow the same steps every session — even if you explain them differently each time — that's a skill candidate.</p> </li> <li><p><strong>They have a clear trigger.</strong> You should be able to finish the sentence "I need this skill when I want to...". 
If you can't finish that sentence in one clause, the workflow isn't scoped enough for a skill.</p> </li> <li><p><strong>They produce a consistent output format.</strong> Skills that output in a fixed structure — a commit message, a code review, a spec — are easier to build and test than skills that produce open-ended prose.</p> </li> </ol> <p>Good candidates: commit messages, pull request descriptions, code reviews, changelog entries. Bad candidates: "help me think through this", "make this better" — too open-ended to encode in a skill.</p> <p>For this tutorial, commit message generation is the right scope. The trigger is obvious (you want to commit), the workflow is defined (read staged changes, apply Conventional Commits format), and the output is structured (a commit message with a specific shape).</p> <h2 id="heading-how-to-structure-your-skill">How to Structure Your Skill</h2> <p>Every skill starts as a single folder with a single file:</p> <pre><code class="language-plaintext">commit-message-writer/ └── SKILL.md </code></pre> <p>As skills grow, they can include additional files the agent loads as needed:</p> <pre><code class="language-plaintext">commit-message-writer/ ├── SKILL.md ← always loaded when skill triggers └── references/ └── examples.md ← loaded only when the agent needs examples </code></pre> <p>The SKILL.md body should stay under 500 lines. If your instructions are growing beyond that, move supporting detail into a <code>references/</code> subfolder and tell the agent when to read those files. This keeps the skill lean — the agent only loads what it needs.</p> <p>For this tutorial, a single SKILL.md is enough.</p> <h2 id="heading-how-to-write-the-description">How to Write the Description</h2> <p>The description field is the trigger condition. It determines when your skill loads and when it doesn't. 
Most skills fail not because the instructions are wrong, but because the description doesn't match how people actually ask for help.</p> <p>Here is a weak description:</p> <pre><code class="language-yaml">description: Generates commit messages. </code></pre> <p>This will undertrigger. "Generate a commit message" will load it. "Write a commit for my changes" probably won't. "Summarize my staged diff" definitely won't — even though all three are asking for the same thing.</p> <p>Here is a stronger description:</p> <pre><code class="language-yaml">description: Generates structured commit messages following the Conventional Commits standard. Use when you want to commit your changes and need a well-formatted message. Triggers on "write a commit message", "commit my changes", "summarize my staged diff", "what should my commit say", or any request to describe or document code changes for version control. </code></pre> <p>The pattern is: <strong>what the skill does + when to use it + specific trigger phrases</strong>. The trigger phrases cover the different ways a developer might ask for the same thing.</p> <p>Two rules for descriptions:</p> <p><strong>Be specific about the output.</strong> "Generates commit messages" is vague. "Generates structured commit messages following the Conventional Commits standard" tells the agent and the user exactly what they'll get.</p> <p><strong>Be slightly pushy.</strong> The agent has a natural tendency to undertrigger skills — to handle requests itself rather than loading a skill. A description that explicitly lists trigger phrases counteracts this. You are not being redundant. You are training the trigger.</p> <h2 id="heading-how-to-write-the-instructions">How to Write the Instructions</h2> <p>The body of SKILL.md is where you define what the agent does when the skill loads. 
Good instructions follow two principles.</p> <p><strong>Generate first, clarify second.</strong> The agent should produce output immediately rather than asking clarifying questions. If it needs to make assumptions, it should make them and flag them — not ask. Asking questions before producing output adds friction and loses the benefit of having a skill at all.</p> <p><strong>Define the output format explicitly.</strong> Don't say "write a good commit message." Say exactly what the structure is, what fields are required, what the character limits are. The more specific the output format, the more consistent the results.</p> <p>Here is what weak instructions look like:</p> <pre><code class="language-markdown"># Commit Message Writer Look at the staged changes and write a commit message that describes what changed. </code></pre> <p>That will produce different results every time — different formats, different lengths, different conventions. It's not a skill. It's a prompt.</p> <p>Here is what strong instructions look like:</p> <pre><code class="language-markdown"># Commit Message Writer Read the staged diff using `git diff --staged`. Generate a commit message following the Conventional Commits standard. Output format: type(scope): short description under 72 characters Body (if changes are non-trivial): - What changed and why, not how - One bullet per logical change Footer (if applicable): BREAKING CHANGE: description Closes #issue-number </code></pre> <p>The agent knows exactly what to produce. The output will be consistent across sessions, across projects, and across agents that support the standard.</p> <h2 id="heading-how-to-build-the-commit-message-writer-skill">How to Build the <code>commit-message-writer</code> Skill</h2> <p>Now build it. 
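Because the output format is fixed, it is also checkable. The JavaScript sketch below is a hypothetical helper, not part of the skill and not something Claude Code runs. It applies the subject-line rules from the SKILL.md you are about to write: one of the listed types, an optional scope, a short description under 72 characters with no trailing period, and none of the vague verbs the quality rules ban. It assumes a single-line subject and makes no attempt to judge imperative mood.

```javascript
// Hypothetical helper: check a commit subject line against the format this
// tutorial's SKILL.md asks for. It is not part of the skill or of Git.
const ALLOWED_TYPES = ["feat", "fix", "docs", "refactor", "test", "chore"];
const VAGUE_WORDS = ["updated", "changed", "modified"];

function checkCommitSubject(subject) {
  // type, optional (scope), then ": " and the short description
  const match = /^([a-z]+)(\(([^)]+)\))?: (.+)$/.exec(subject);
  if (!match) {
    return ["Subject must look like type(scope): short description"];
  }
  const type = match[1];
  const description = match[4];
  const problems = [];
  if (!ALLOWED_TYPES.includes(type)) {
    problems.push(`Unknown type: ${type}`);
  }
  // The SKILL.md quality rules put the 72-character limit on the short description.
  if (description.length >= 72) {
    problems.push("Short description must be under 72 characters");
  }
  if (description.endsWith(".")) {
    problems.push("Short description must not end with a period");
  }
  const words = description.toLowerCase().split(/\s+/);
  for (const vague of VAGUE_WORDS) {
    if (words.includes(vague)) problems.push(`Be specific instead of "${vague}"`);
  }
  return problems;
}

console.log(checkCommitSubject("feat(api): add rate limiting to /query endpoint")); // []
console.log(checkCommitSubject("fix: changed stuff."));
```

A check like this is only a safety net for testing. The skill itself still relies on the instructions in SKILL.md to produce the right format.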
Create the skill directory:</p> <pre><code class="language-bash">mkdir -p ~/.claude/skills/commit-message-writer </code></pre> <p>On Windows PowerShell:</p> <p><strong>Note:</strong> PowerShell uses backtick (<code>`</code>) for line continuation, not backslash.</p> <pre><code class="language-powershell">New-Item -ItemType Directory -Force -Path "$HOME\.claude\skills\commit-message-writer" </code></pre> <p>Create the SKILL.md file inside that directory. Here is the complete content:</p> <pre><code class="language-markdown">--- name: commit-message-writer description: Generates structured commit messages following the Conventional Commits standard. Use when you want to commit your changes and need a well-formatted message. Triggers on "write a commit message", "commit my changes", "summarize my staged diff", "what should my commit say", or any request to describe or document staged changes for version control. --- # commit-message-writer You generate structured commit messages from staged git changes. ## How to invoke Run `git diff --staged` to read the staged changes. If nothing is staged, tell the user and suggest they run `git add` first. Generate first. Do not ask clarifying questions before producing the commit message. If you need to make assumptions about scope or type, make them and note them after the output. ## Output format ~~~ type(scope): short description [body — optional, include if changes are non-trivial] [footer — optional] ~~~ **Type** — choose one: - `feat` — a new feature - `fix` — a bug fix - `docs` — documentation changes only - `refactor` — code change that neither fixes a bug nor adds a feature - `test` — adding or updating tests - `chore` — build process, tooling, or dependency updates **Scope** — the module, file, or area affected. Use the directory name or component name. Omit if the change spans the entire codebase. **Short description** — imperative mood, under 72 characters, no period at the end. 
"Add user authentication" not "Added user authentication" or "Adds user authentication." **Body** — what changed and why, not how. One bullet per logical change. Skip if the short description is self-explanatory. **Footer** — include `BREAKING CHANGE:` if the commit breaks backward compatibility. Include `Closes #N` if it resolves a GitHub issue. ## Quality rules - Never use "updated", "changed", or "modified" in the short description — be specific - Never write "various improvements" or "misc fixes" — name what improved - If more than three files changed across unrelated concerns, flag it: "These changes may be better split into separate commits: [list concerns]" - The short description must be under 72 characters — count before outputting ## Example output Input: staged changes adding a rate limiter to an API endpoint ~~~ feat(api): add rate limiting to /query endpoint - Limits requests to 100 per minute per IP using Cloudflare's rate limit binding - Returns 429 with Retry-After header when limit is exceeded - Adds rate limit configuration to wrangler.toml Closes #47 ~~~ </code></pre> <p>Save that file. The skill is built.</p> <h2 id="heading-how-to-install-and-test-your-skill">How to Install and Test Your Skill</h2> <h3 id="heading-verify-the-file-exists">Verify the File Exists</h3> <pre><code class="language-bash">cat ~/.claude/skills/commit-message-writer/SKILL.md </code></pre> <p>You should see the full SKILL.md content. If you get an error, check the directory path.</p> <h3 id="heading-test-the-skill">Test the Skill</h3> <p>Open Claude Code in any git repository that has staged changes. 
Type:</p> <pre><code class="language-plaintext">/commit-message-writer </code></pre> <p>The agent will read your staged diff and produce a commit message following the format you defined.</p> <p>You can also trigger it naturally:</p> <pre><code class="language-plaintext">write a commit message for my staged changes </code></pre> <pre><code class="language-plaintext">what should my commit say </code></pre> <pre><code class="language-plaintext">summarize my diff for git </code></pre> <p>All three should load the skill and produce a structured commit message. If the skill doesn't trigger on natural language requests, the description needs more trigger phrases (see the improvement section below).</p> <h3 id="heading-test-edge-cases">Test Edge Cases</h3> <p>Test these cases before relying on the skill in production:</p> <pre><code class="language-bash"># Make sure nothing is staged (working-tree changes are kept)
git restore --staged .
git diff --staged --stat   # prints nothing when nothing is staged
# In Claude Code: "write a commit message"
# Expected: skill tells you nothing is staged and suggests git add
</code></pre> <pre><code class="language-bash"># Stage changes to more than three files across unrelated concerns
git add src/api.ts src/styles.css README.md package.json
# In Claude Code: "write a commit message"
# Expected: skill flags that the changes may be better split into separate commits
</code></pre> <h2 id="heading-how-to-improve-your-skill-over-time">How to Improve Your Skill Over Time</h2> <p>The first version of any skill is a draft. You improve it by observing where it produces inconsistent or wrong output, then updating the instructions.</p> <h3 id="heading-when-the-skill-undertriggers">When the Skill Undertriggers</h3> <p>If you type "summarize my changes for git" and the skill doesn't load, add that phrase to the description's trigger list:</p> <pre><code class="language-yaml">description: ... Triggers on "write a commit message", "commit my changes", "summarize my staged diff", "summarize my changes for git", ... 
</code></pre> <p>The description is your primary lever for fixing triggering problems.</p> <h3 id="heading-when-the-output-format-drifts">When the Output Format Drifts</h3> <p>If the agent starts producing commit messages that don't match your format — wrong type, missing scope, body in the wrong style — the instructions need to be more explicit. Add a concrete example that shows the failure and the correct output:</p> <pre><code class="language-markdown">## Common mistakes to avoid Wrong: "Updated the authentication flow" Right: "refactor(auth): simplify token validation logic" Wrong: "Fixed bugs" Right: "fix(api): handle null response from upstream service" </code></pre> <p>Concrete counterexamples are more effective than abstract rules.</p> <h3 id="heading-when-the-scope-grows">When the Scope Grows</h3> <p>If you find yourself wanting the skill to handle related tasks — reviewing commit messages, generating changelogs, writing PR descriptions — resist the urge to add everything to one skill. Build separate skills. Each skill should do one thing well. The Agent Skills standard is designed for composition, not for monolithic instructions.</p> <h2 id="heading-where-to-go-next">Where to Go Next</h2> <p>The commit-message-writer covers the core pattern. The same structure works for any repeatable workflow.</p> <p><strong>Pull request descriptions</strong> follow the same shape — read the diff, apply a structure, produce consistent output. The trigger phrases are different ("write a PR description", "summarize my branch for review") and the output format adds sections for motivation and testing, but the SKILL.md structure is identical.</p> <p><strong>Code review checklists</strong> work well as skills when your team has a standard review process. 
The trigger is "review this code" or "check this PR", and the instructions encode whatever your team actually checks — security concerns, test coverage, naming conventions.</p> <p>The commit-message-writer is the simplest skill architecture — instructions only. As your skills grow more specialized, two other patterns become useful.</p> <p>The first adds a <code>references/</code> directory: the voice-humanizer skill loads a CORPUS.md file containing the author's published writing, which the agent reads when it needs to check output against a specific style. The second adds quality rules and structured output formats that make results stricter and more consistent — that's the pattern spec-writer uses to surface assumptions inline. Each is the same SKILL.md structure at a different level of complexity.</p> <p>Start with instructions only. Add references when the agent needs external context. Add output format rules when consistency matters more than flexibility.</p> <p>The Agent Skills standard is supported in Claude Code, GitHub Copilot in VS Code, Cursor, and Gemini CLI. A skill you build once installs across all of them. The install path differs by agent:</p> <table> <thead> <tr> <th>Agent</th> <th>Skills directory</th> </tr> </thead> <tbody><tr> <td>Claude Code</td> <td><code>~/.claude/skills/</code></td> </tr> <tr> <td>GitHub Copilot</td> <td><code>~/.copilot/skills/</code> or <code>.github/skills/</code></td> </tr> <tr> <td>Cursor</td> <td><code>~/.cursor/skills/</code></td> </tr> <tr> <td>Gemini CLI</td> <td><code>~/.gemini/skills/</code></td> </tr> </tbody></table> <p>The SKILL.md format is the same across all of them.</p> <p>The commit-message-writer you just built is a working skill. The next one will take less time. 
By the third, you will start seeing workflows you repeat and immediately think: that should be a skill.</p> <p>That's the point.</p> </article> <article> <h1> How to Stop Letting AI Agents Guess Your Requirements </h1> <p>Daniel Nwaneri — Tue, 24 Mar 2026 00:35:37 +0000</p> <p>I spent 64% of my weekly Claude budget before Wednesday building a tool designed to reduce Claude usage. That's the kind of irony that deserves its own specification.</p> <p>The tool is spec-writer: a Claude Code skill that takes a vague feature request and generates a structured spec, technical plan, and task breakdown before a single line of code gets written.</p> <p>The problem it solves is one most developers hit within their first week of using AI coding agents seriously: the agent writes confidently in the wrong direction, and you pay for it twice: once in tokens, once in rewrites.</p> <p>This tutorial shows you how to install spec-writer, how to invoke it on a real feature, and how to read the output so you can catch the assumptions that would have wasted your time.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-the-problem-with-prompting-agents-directly">The Problem with Prompting Agents Directly</a></p> </li> <li><p><a href="#heading-what-spec-driven-development-is">What Spec-Driven Development Is</a></p> </li> <li><p><a href="#heading-how-spec-writer-works">How spec-writer Works</a></p> </li> <li><p><a href="#heading-how-to-install-spec-writer">How to Install spec-writer</a></p> </li> <li><p><a href="#heading-how-to-write-your-first-spec">How to Write Your First Spec</a></p> </li> <li><p><a href="#heading-how-to-read-the-output">How to Read the Output</a></p> </li> <li><p><a href="#heading-how-to-hand-the-spec-to-your-agent">How to Hand the Spec to Your Agent</a></p> </li> <li><p><a href="#heading-where-to-go-next">Where to Go Next</a></p> </li> </ol> <h2 id="heading-the-problem-with-prompting-agents-directly">The Problem with Prompting 
Agents Directly</h2> <p>Here is what happens when you skip the spec.</p> <p>You have a feature in your head: "Add a way for users to export their data." You open Claude Code and describe it. The agent produces code. It looks right. You run it. It's mostly right – except it exports everything including soft-deleted records, it doesn't paginate, it times out on large accounts, and it has no authentication check on the export endpoint.</p> <p>None of those things were in your prompt. The agent guessed, and it guessed plausibly – which is worse than guessing obviously wrong. You didn't notice until testing.</p> <p>This is the fundamental problem with prompting agents directly on anything non-trivial: your prompt carries your conscious requirements, but every feature has a shadow of requirements you didn't think to state. And the agent fills that shadow with assumptions.</p> <p>Most of the time, those assumptions are reasonable. Some of the time, they're wrong in ways that take hours to unravel.</p> <p>The failure mode isn't hallucination. It's the agent being exactly as helpful as the prompt allowed, which wasn't helpful enough.</p> <p>Spec-Driven Development (SDD) addresses this directly. The methodology – documented extensively by practitioners like Julián Deangelis – argues that a written spec isn't documentation overhead. It's the mechanism that forces you to make decisions before the agent does.</p> <h2 id="heading-what-spec-driven-development-is">What Spec-Driven Development Is</h2> <p>Spec-Driven Development is the practice of writing a structured specification before you write code or prompt an agent. The spec defines what the feature must do, what assumptions are being made, and what tasks the implementation breaks into.</p> <p>The key insight is what a spec is <em>for</em>. A spec is not trying to replace code. It's trying to surface the decisions that would otherwise be invisible. The agent will make those decisions either way: with a spec, you make them first. 
Without a spec, you discover them during testing.</p> <p>The strongest counterargument to SDD comes from Gabriella Gonzalez: <em>a sufficiently detailed spec is just code</em>. She's right that some specs devolve into pseudocode so specific they might as well be implementations.</p> <p>But that's a spec written at the wrong level of abstraction. The goal is to name the decisions, not to pre-implement them. "Only authenticated users can trigger this export" is a decision. "Call <code>verifyJWT(token)</code> and return 401 if it fails" is implementation. The spec needs the first. The agent handles the second.</p> <p>SDD has three levels:</p> <ol> <li><p><strong>Spec-First</strong>: write a spec before every feature and hand it to the agent as context. This is the entry point and the workflow this tutorial focuses on.</p> </li> <li><p><strong>Spec-Anchored</strong>: the spec lives in the repository and evolves alongside the code. When requirements change, you update the spec and re-prompt the agent to realign.</p> </li> <li><p><strong>Spec-as-Source</strong>: the spec is the primary artifact. Code is generated from it and considered disposable. This is the most ambitious level and the direction many teams are moving toward.</p> </li> </ol> <p>spec-writer gets you to Spec-First immediately, with no ceremony.</p> <h2 id="heading-how-spec-writer-works">How spec-writer Works</h2> <p>spec-writer is a Claude Code skill – a markdown file that loads into the agent's context and changes how it responds when invoked.</p> <p>The skill follows one rule: generate first, flag assumptions inline. Instead of asking you clarifying questions before producing output, it generates the full spec immediately and marks every decision it made without your explicit input using <code>[ASSUMPTION: ...]</code> tags. 
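Because the tags are plain text, they are easy to collect mechanically once you save the output to a file. As a minimal sketch (the function name <code>extractAssumptions</code> is hypothetical and this is not part of spec-writer), this JavaScript pulls every <code>[ASSUMPTION: ...]</code> tag into a list you can review:

```javascript
// Minimal sketch: list every [ASSUMPTION: ...] tag in spec-writer output that
// you have saved as plain text. Assumes a tag never contains a nested "]".
function extractAssumptions(specText) {
  const tagPattern = /\[ASSUMPTION:\s*([^\]]+)\]/g;
  const assumptions = [];
  for (const match of specText.matchAll(tagPattern)) {
    assumptions.push(match[1].trim());
  }
  return assumptions;
}

const sample =
  "CLI capture runs locally. [ASSUMPTION: CLI capture is a local Node.js or Bun script " +
  "that calls the Foundation API to insert sessions, rather than a Worker itself]";
extractAssumptions(sample).forEach((text, i) => console.log(`${i + 1}. ${text}`));
// 1. CLI capture is a local Node.js or Bun script that calls the Foundation API to insert sessions, rather than a Worker itself
```

The skill already ranks these in its own summary. A list like this is just a quick way to confirm that no inline tag slipped past your review.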
Then you correct what's wrong.</p> <p>This is faster than Q&A because it makes the decisions visible in a form you can react to rather than anticipate.</p> <p>The output has three sections in fixed order:</p> <ol> <li><p><strong>SPEC</strong>: the what. One-line purpose, user stories, requirements, edge cases, and acceptance criteria in Given/When/Then format.</p> </li> <li><p><strong>PLAN</strong>: the how. Stack and architecture decisions, data model changes, API contracts, testing strategy, and security constraints.</p> </li> <li><p><strong>TASKS</strong>: the breakdown. Ordered, self-contained tasks each completable in a single agent session, each with its own acceptance criteria.</p> </li> </ol> <p>After the three sections, the skill produces an <strong>Assumptions summary</strong>: every <code>[ASSUMPTION: ...]</code> from the output, ranked by impact. This is the part you review before handing anything to the agent.</p> <p>The skill is compatible with <a href="https://github.com/github/spec-kit">GitHub Spec Kit</a> and <a href="https://github.com/Fission-AI/OpenSpec">OpenSpec</a>. If you use either framework, save the spec output to your <code>.specify/</code> or <code>openspec/changes/</code> directory and continue from there.</p> <h2 id="heading-how-to-install-spec-writer">How to Install spec-writer</h2> <p>spec-writer uses the Agent Skills standard, which means the same SKILL.md file works across Claude Code, Cursor, GitHub Copilot, Gemini CLI, and any other agent that supports the standard. 
You install it once and it works everywhere.</p> <h3 id="heading-installation">Installation</h3> <p>Create the skills directory if it doesn't exist and clone the repo:</p> <pre><code class="language-bash">mkdir -p ~/.claude/skills git clone https://github.com/dannwaneri/spec-writer.git ~/.claude/skills/spec-writer </code></pre> <p>On Windows PowerShell:</p> <p><strong>(Note:</strong> PowerShell uses backtick (<code>`</code>) for line continuation, not backslash.)</p> <pre><code class="language-powershell">New-Item -ItemType Directory -Force -Path "$HOME\.claude\skills" git clone https://github.com/dannwaneri/spec-writer.git "$HOME\.claude\skills\spec-writer" </code></pre> <p>That's the entire installation. No package to install, no configuration file to edit, no API key. The skill is a markdown file. The agent reads it.</p> <h3 id="heading-verification">Verification</h3> <p>Open Claude Code and type:</p> <pre><code class="language-plaintext">/spec-writer test </code></pre> <p>If the skill is installed correctly, the agent will read the SKILL.md and produce a spec structure – even for "test" as input. You'll see the three sections and the Assumptions summary. If nothing happens, confirm that the <code>~/.claude/skills/spec-writer/SKILL.md</code> file exists.</p> <h2 id="heading-how-to-write-your-first-spec">How to Write Your First Spec</h2> <p>The invocation pattern is:</p> <pre><code class="language-plaintext">/spec-writer [your feature description] </code></pre> <p>The description can be as rough as you'd write in a Slack message. The skill's job is to structure it. You don't need to be precise. Being precise is what the spec is for.</p> <h3 id="heading-a-real-example">A Real Example</h3> <p>Here is a real invocation from the Foundation project, a federated knowledge base built on Cloudflare Workers:</p> <pre><code class="language-plaintext">/spec-writer Add CLI capture for Claude Code sessions stored in ~/.claude/projects/ </code></pre> <p>That's twelve words. 
No mention of the tech stack, no data model details, no authentication requirements, no error handling spec. All of that comes out in the spec.</p> <p>Here is the abbreviated output:</p> <p><strong>SPEC</strong></p> <p><strong>One-line purpose:</strong> Developers can capture Claude Code session logs from their local filesystem into Foundation before they are deleted after 30 days.</p> <p><strong>Users and use cases:</strong></p> <ul> <li><p>As a developer using Claude Code daily, I want my coding sessions captured automatically so that architectural decisions and debugging patterns don't disappear after 30 days.</p> </li> <li><p>As a Foundation user, I want CLI sessions and browser extension captures in the same knowledge base so that I have one place to query across all my AI interactions.</p> </li> </ul> <p><strong>Acceptance criteria:</strong></p> <pre><code class="language-plaintext">Given a developer with Claude Code sessions in ~/.claude/projects/ When they run the Foundation CLI capture command Then a list of uncaptured sessions is displayed, sorted by most recent first Given a session already in Foundation When the developer runs capture again Then that session does not appear in the review list Given a malformed .jsonl file When the capture command processes it Then an error is logged for that file and processing continues on remaining sessions </code></pre> <p><strong>PLAN</strong></p> <p><strong>Stack and architecture:</strong></p> <pre><code class="language-plaintext">CLI capture runs locally — it needs to read the local filesystem, which Cloudflare Workers cannot do. 
[ASSUMPTION: CLI capture is a local Node.js or Bun script that calls the Foundation API to insert sessions, rather than a Worker itself] </code></pre> <p><strong>API contracts:</strong></p> <ul> <li><p><code>POST /api/sessions/import</code> – accepts parsed session JSON, returns session ID</p> </li> <li><p>Returns 409 if session already exists</p> </li> <li><p>Triggers evaluator pipeline after successful insert</p> </li> </ul> <p><strong>TASKS</strong></p> <pre><code class="language-plaintext">## Task 1: .jsonl parser **What to build:** A parser that reads a Claude Code .jsonl session file and outputs Foundation's internal message format. **Files likely affected:** cli/src/parsers/claudeCode.js (create) **Acceptance criteria:** 1. Parses a valid .jsonl file into the correct message format 2. Handles malformed lines without crashing — logs and skips 3. Returns empty array for empty file **Dependencies:** none </code></pre> <p><strong>Assumptions to review:</strong></p> <pre><code class="language-plaintext">1. CLI capture is a local script calling the Foundation API, not a Worker Impact: HIGH Correct this if: you want a purely serverless approach 2. Manual curation before capture, not automatic bulk import Impact: HIGH Correct this if: you want automatic background capture 3. Session ID from .jsonl filename is the deduplication key Impact: MEDIUM Correct this if: session IDs are stored differently in your schema 4. No sensitive data scrubbing in v1 Impact: MEDIUM Correct this if: your sessions contain credentials or keys </code></pre> <p>Twelve words in, four decisions surfaced immediately – three of which had real architectural implications.</p> <p>The third assumption ("Session ID from .jsonl filename is the deduplication key") is the one that would have caused the most subtle bug. The agent would have implemented deduplication based on the filename and it would have worked until a session was renamed. 
The spec caught it before a line of code was written.</p> <h2 id="heading-how-to-read-the-output">How to Read the Output</h2> <p>The output is designed to be scanned for <code>[ASSUMPTION: ...]</code> tags first, read for the tasks second.</p> <h3 id="heading-reading-the-assumptions">Reading the Assumptions</h3> <p>Every <code>[ASSUMPTION: ...]</code> tag marks a place where the agent filled in something you didn't specify. Your job is to go through the Assumptions summary and decide for each one:</p> <ul> <li><p><strong>Correct</strong>: the assumption is right, leave it</p> </li> <li><p><strong>Override</strong>: the assumption is wrong, restate it and re-run the spec</p> </li> <li><p><strong>Defer</strong>: the assumption doesn't matter for this iteration, mark it and move on</p> </li> </ul> <p>The impact rating tells you which assumptions to fix before you start coding. HIGH-impact assumptions affect architecture or data model. If they're wrong, fixing them requires rework. LOW-impact assumptions affect behavior details that are easy to change later.</p> <h3 id="heading-reading-the-acceptance-criteria">Reading the Acceptance Criteria</h3> <p>The acceptance criteria in Given/When/Then format are the most useful part of the spec for catching scope errors. Read each one and ask: is this actually what I want?</p> <p>Criteria are binary by design. "Returns 401 when unauthenticated" is a criterion. "Works correctly" is not. If you find yourself reading a criterion and thinking "well, it depends", then that's a signal that the criterion is hiding an assumption. Restate it.</p> <h3 id="heading-reading-the-tasks">Reading the Tasks</h3> <p>The tasks are ordered and self-contained. Each task produces a verifiable change. Before you hand any task to an agent, check two things:</p> <ol> <li><p>Does the task have all the context it needs? 
If a task says "follow the existing auth pattern" and you haven't pointed the agent at your auth code, it will guess.</p> </li> <li><p>Do the acceptance criteria match what you'd actually test? If the criteria are vague, tighten them before the agent sees the task.</p> </li> </ol> <h2 id="heading-how-to-hand-the-spec-to-your-agent">How to Hand the Spec to Your Agent</h2> <p>The spec is context, not a prompt. When you start an agent session for a task, include the relevant spec sections alongside the task description.</p> <p>For Task 1 from the example above, your agent session might open like this:</p> <pre><code class="language-plaintext">Context:
- This is a federated knowledge base built on Cloudflare Workers, D1, and Vectorize
- Sessions are stored in ~/.claude/projects/ as .jsonl files
- The API runs at https://&lt;your-worker&gt;.workers.dev

Spec:
[paste the SPEC and PLAN sections]

Task:
[paste Task 1]
</code></pre> <p>The context block is just an example. Replace it with your own project's tech stack, file locations, and API URL. The point is to give the agent the same context a new team member would need on day one.</p> <p>The agent now has requirements, architecture context, and a single scoped task with binary acceptance criteria. It cannot guess the deduplication key incorrectly because the spec already resolved that assumption. It cannot skip error handling because the acceptance criteria explicitly require it.</p> <p>This is the workflow the spec is designed for. The spec doesn't replace the agent. 
Rather, it removes the decisions from the agent's hands and puts them in yours, before the work starts.</p> <h3 id="heading-saving-the-spec-for-later">Saving the Spec for Later</h3> <p>If you want to move toward Spec-Anchored development – where the spec lives in the repository – save the output to a <code>specs/</code> directory in your project:</p> <pre><code class="language-bash"># Create specs directory mkdir -p specs # Save your spec # Paste the output into specs/cli-capture.md </code></pre> <p>When requirements change, update the spec and re-prompt the agent to realign the implementation. The spec becomes the source of truth, not the code comments.</p> <h2 id="heading-where-to-go-next">Where to Go Next</h2> <p>Try it on your next feature before you write a line of code. The assumptions it flags will tell you something about your feature you hadn't consciously decided yet – and correcting the HIGH-impact ones before you hand anything to an agent is the whole point. Skipping that step is the same as prompting directly.</p> <p>If your project is growing, move toward Spec-Anchored. Save specs in your repository under <code>specs/</code>. When a new contributor joins or an agent starts a session cold, the specs give them the decisions that got made without requiring them to reverse-engineer the code.</p> <p>The strongest ongoing challenge to this workflow is Gabriella Gonzalez's argument that detailed specs become code. If your specs are getting implementation-specific, you've crossed a line. Pull back to decisions – "only authenticated users can trigger this" – and leave implementation to the agent. The spec's job is to name what the agent would have guessed wrong, not to write the feature in prose.</p> <p>The Agent Skills standard now works across Claude Code, GitHub Copilot, Cursor, and Gemini CLI. 
The spec-writer repo is at <a href="https://github.com/dannwaneri/spec-writer">github.com/dannwaneri/spec-writer</a>.</p> <p>The irony of spending 64% of a Claude budget building a token-efficiency tool is real. But the spec surfaced four decisions on a twelve-word prompt. The fourth one – the deduplication key assumption – would have produced a bug that worked perfectly until a session got renamed.</p> <p>That's not a hallucination. That's the agent being exactly as helpful as the prompt allowed.</p> <p>The spec is how you raise the ceiling on what "helpful" means.</p> </article> <article> <h1> How to Build a Production RAG System with Cloudflare Workers – a Handbook for Devs </h1> <p>Daniel Nwaneri — Wed, 18 Mar 2026 23:05:13 +0000</p> <p>Most RAG tutorials show you a working demo and call it done. You copy the code, it runs locally, and then you try to put it in production and everything falls apart.</p> <p>This tutorial is different. I run a production RAG system (<a href="https://github.com/dannwaneri/vectorize-mcp-worker">vectorize-mcp-worker</a>) that handles real traffic at a total cost of $5/month. The alternatives I evaluated ranged from $100–$200/month. The difference isn't magic. It's architecture.</p> <p>Here, you'll build <code>rag-tutorial-simple</code>: a clean, minimal RAG chatbot deployed on Cloudflare Workers. No external API keys. No paid vector database subscriptions. No servers to manage. 
Just Cloudflare's free tier – Workers, Vectorize, and Workers AI – doing the heavy lifting at the edge.</p> <h2 id="heading-table-of-contents">Table of Contents</h2> <ol> <li><p><a href="#heading-what-you-will-build">What You Will Build</a></p> </li> <li><p><a href="#heading-prerequisites">Prerequisites</a></p> </li> <li><p><a href="#heading-how-rag-works">How RAG Works</a></p> </li> <li><p><a href="#heading-how-to-set-up-your-project">How to Set Up Your Project</a></p> </li> <li><p><a href="#heading-how-to-build-the-data-pipeline">How to Build the Data Pipeline</a></p> </li> <li><p><a href="#heading-how-to-build-the-query-pipeline">How to Build the Query Pipeline</a></p> </li> <li><p><a href="#heading-how-to-add-error-handling-and-security">How to Add Error Handling and Security</a></p> </li> <li><p><a href="#heading-performance-and-cost-analysis">Performance and Cost Analysis</a></p> </li> <li><p><a href="#heading-conclusion">Conclusion</a></p> </li> </ol> <h2 id="heading-what-you-will-build">What You Will Build</h2> <p>By the end of this tutorial, you'll have a globally deployed RAG API that:</p> <ul> <li><p>Accepts a natural language question via HTTP</p> </li> <li><p>Converts it to a vector embedding using Workers AI</p> </li> <li><p>Searches a knowledge base stored in Cloudflare Vectorize</p> </li> <li><p>Passes the retrieved context to an LLM (also on Workers AI) to generate an answer</p> </li> <li><p>Returns a grounded, accurate response (not a hallucination)</p> </li> </ul> <p>The complete source code is available at <a href="https://github.com/dannwaneri/rag-tutorial-simple">github.com/dannwaneri/rag-tutorial-simple</a>.</p> <h2 id="heading-prerequisites">Prerequisites</h2> <p>This is an intermediate-level tutorial. 
You should be comfortable with:</p> <ul> <li><p><strong>JavaScript/TypeScript</strong>: async/await, promises, basic types</p> </li> <li><p><strong>HTTP APIs</strong>: REST, request/response, JSON</p> </li> <li><p><strong>Command line basics</strong>: running npm commands, navigating directories</p> </li> </ul> <p>You will need:</p> <ul> <li><p><strong>Node.js 18 or higher</strong>: check with <code>node --version</code></p> </li> <li><p><strong>A Cloudflare account</strong>: free tier is fine, sign up at <a href="https://dash.cloudflare.com/sign-up">cloudflare.com</a></p> </li> <li><p><strong>A code editor</strong>: VS Code recommended for TypeScript support</p> </li> </ul> <p>That's it. No OpenAI key. No credit card for embeddings. Let's build.</p> <h2 id="heading-how-rag-works">How RAG Works</h2> <p>Before you write any code, you'll need a clear mental model of what you're building. This section explains the three core components of a RAG system, how data flows between them, and why this architecture works at scale.</p> <h3 id="heading-the-mental-model">The Mental Model</h3> <p>Think of a traditional LLM like a doctor who studied medicine for years but has been in a remote cabin with no internet since their graduation day. They are brilliant, but they only know what they knew when they left. Ask them about a drug approved last year and they'll either say they don't know or – worse – confidently give you wrong information.</p> <p>RAG gives that doctor access to an up-to-date medical library. Before answering your question, they can look up the relevant pages, read them, and use that information to give you an accurate answer. 
Their training still matters (that is, they know how to read and interpret the information), but they're no longer limited to what they memorized years ago.</p> <p>In technical terms, RAG works in three steps on every request:</p> <ol> <li><p><strong>Retrieve</strong>: find the most relevant documents from your knowledge base</p> </li> <li><p><strong>Augment</strong>: add those documents to the LLM prompt as context</p> </li> <li><p><strong>Generate</strong>: let the LLM produce an answer using both its training and the retrieved context</p> </li> </ol> <h3 id="heading-the-three-components">The Three Components</h3> <p>Every RAG system has three moving parts. Understanding each one will help you debug problems and make better architectural decisions as you build.</p> <h4 id="heading-the-embedding-model">The Embedding Model</h4> <p>An embedding model converts text into a vector – an array of numbers that represents the meaning of that text. The model you will use in this tutorial, <code>@cf/baai/bge-base-en-v1.5</code>, outputs 768 numbers for any piece of text you give it.</p> <p>The critical property of embeddings is that semantically similar text produces numerically similar vectors. "How do I install Node.js?" and "What's the process for setting up Node?" will produce vectors that are close together. "How do I install Node.js?" and "What is the capital of France?" will produce vectors that are far apart.</p> <p>This is what makes semantic search possible. You aren't matching keywords; you're matching meaning.</p> <p>One rule you must never break: your documents and your queries must be embedded with the same model. If you embed your documents with <code>bge-base-en-v1.5</code> and your queries with a different model, the vectors won't be comparable and your searches will return garbage.</p> <h4 id="heading-the-vector-database">The Vector Database</h4> <p>The vector database stores your embeddings and lets you search them by similarity.
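</p> <p>To make "search them by similarity" concrete, here is a minimal in-memory sketch in TypeScript. It is an illustration, not how Vectorize works internally: it scores every stored vector with cosine similarity (exact, brute-force search), whereas Vectorize uses approximate nearest neighbor search. The names <code>StoredVector</code>, <code>cosineSimilarity</code>, and <code>topKSearch</code> are hypothetical, and the 3-dimensional toy vectors stand in for real 768-dimensional embeddings.</p>

```typescript
// Exact (brute-force) similarity search over an in-memory array.
// Vectorize uses approximate nearest neighbor search instead; this sketch
// only illustrates "return the K stored vectors most similar to the query".

interface StoredVector {
  id: string;
  values: number[];
}

interface Match {
  id: string;
  score: number;
}

// Cosine similarity: dot product divided by the product of the vector lengths.
// Returns 1 for vectors pointing the same way and 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector against the query, sort highest first, keep topK.
function topKSearch(query: number[], stored: StoredVector[], topK: number): Match[] {
  return stored
    .map((v) => ({ id: v.id, score: cosineSimilarity(query, v.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy 3-dimensional vectors; real bge-base-en-v1.5 embeddings have 768 values.
const stored: StoredVector[] = [
  { id: "install-node", values: [0.9, 0.1, 0.0] },
  { id: "setup-node", values: [0.8, 0.2, 0.1] },
  { id: "capital-france", values: [0.0, 0.1, 0.95] },
];

const results = topKSearch([0.85, 0.15, 0.05], stored, 2);
console.log(results.map((r) => r.id)); // [ 'install-node', 'setup-node' ]
```

<p>Scoring every vector is fine for a handful of documents; approximate indexes exist because comparing the query against every stored vector gets expensive as the collection grows.</p> <p>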
In this tutorial, you'll use Cloudflare Vectorize.</p> <p>When you run a similarity search, you pass in a query vector and Vectorize returns the K most similar vectors it has stored, along with their metadata and similarity scores. This is called approximate nearest neighbor search, and Vectorize is optimized to do it fast even across millions of vectors.</p> <p>The key advantage of using Vectorize over an external vector database like Pinecone is co-location. Vectorize runs in the same Cloudflare network as your Worker. There's no external API call, no authentication roundtrip, and no hop across the public internet between your application and your database.</p> <h4 id="heading-the-language-model">The Language Model</h4> <p>The LLM is responsible for one thing: reading the retrieved context and generating a natural language answer. It doesn't search anything. It doesn't decide what's relevant. It just reads what you give it and writes a response.</p> <p>This separation of concerns is intentional. The LLM is good at language: understanding questions, synthesizing information, writing clearly. The vector database is good at retrieval: finding relevant documents fast. RAG combines their strengths without asking either component to do something it is not designed for.</p> <p>In this tutorial you'll use <code>@cf/meta/llama-3.3-70b-instruct-fp8-fast</code> through Workers AI. No API key required.</p> <h3 id="heading-a-note-on-visual-embeddings">A Note on Visual Embeddings</h3> <p>If you plan to extend this system to search images, you may be tempted to use a vision-language model like CLIP to generate visual embeddings (vectors that represent the image itself rather than a text description of it). This sounds clever but works worse for RAG in practice.</p> <p>Visual embeddings match pixel similarity. They are good for "find images that look like this one."
They are poor for "find the login screen" or "find dashboards showing error rates" because those queries are about meaning, not pixels.</p> <p>The better approach – used in production – is to pass the image through a multimodal model like Llama 4 Scout, which generates a detailed text description and extracts visible text via OCR. You then embed that description using the same BGE model as your other documents.</p> <p>The result lives in one unified index, works with your existing query pipeline, and produces better search results than visual embeddings for RAG use cases.</p> <p>Cloudflare Workers AI does not support CLIP anyway. But even if it did, descriptions would outperform it for semantic search.</p> <h3 id="heading-how-a-query-flows-through-the-system">How a Query Flows Through the System</h3> <p>Here is exactly what happens when a user sends the question "What is RAG?" to your finished Worker:</p> <ol> <li><p><strong>Step 1 – Embed the question (20-30ms)</strong>: Your Worker calls Workers AI with the question text. The embedding model returns a 768-dimensional vector representing the meaning of the question.</p> </li> <li><p><strong>Step 2 – Search Vectorize (30-50ms)</strong>: Your Worker passes that vector to Vectorize, which searches your knowledge base and returns the 3 most similar documents with their similarity scores.</p> </li> <li><p><strong>Step 3 – Filter and build context (< 1ms)</strong>: Documents with a similarity score below 0.5 are discarded. The remaining document texts are joined into a context string.</p> </li> <li><p><strong>Step 4 – Generate the answer (500-1500ms)</strong>: Your Worker sends the context and the question to the LLM. The LLM reads the context and generates a grounded answer.</p> </li> <li><p><strong>Step 5 – Return to the user</strong>: The answer and source metadata are returned as JSON.</p> </li> </ol> <p>Total time: typically 600-1600ms end to end. The LLM generation step dominates. 
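</p> <p>Steps 3 and 5 involve no network calls, so they can be sketched and tested as a pure function. This is a minimal sketch that assumes a simplified match shape: <code>MatchLike</code> and <code>buildContext</code> are hypothetical names, and <code>MatchLike</code> keeps only the <code>score</code> and <code>metadata</code> fields this tutorial reads. The scores in the usage example are made up for illustration.</p>

```typescript
// Simplified match shape (assumed): only the fields this tutorial reads.
// The real Vectorize match type from Cloudflare's definitions has more fields.
interface MatchLike {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

const SIMILARITY_THRESHOLD = 0.5;

// Step 3: keep matches scoring strictly above the threshold and join their
// stored text into one context string.
// Step 5: collect the unique source labels of those same matches.
function buildContext(matches: MatchLike[], threshold: number = SIMILARITY_THRESHOLD) {
  const relevant = matches.filter((m) => m.score > threshold);

  const context = relevant
    .map((m) => m.metadata?.text)
    .filter((t): t is string => typeof t === "string" && t.length > 0)
    .join("\n\n");

  const sources = Array.from(
    new Set(
      relevant
        .map((m) => m.metadata?.source)
        .filter((s): s is string => typeof s === "string"),
    ),
  );

  return { context, sources };
}

// Hand-written matches; the scores are illustrative, not real Vectorize output.
const { context, sources } = buildContext([
  { id: "4", score: 0.82, metadata: { text: "RAG stands for Retrieval Augmented Generation.", source: "ai-concepts" } },
  { id: "5", score: 0.61, metadata: { text: "An embedding is a numerical representation of text.", source: "ai-concepts" } },
  { id: "1", score: 0.31, metadata: { text: "Cloudflare Workers run JavaScript at the edge.", source: "cloudflare-docs" } },
]);

console.log(sources); // [ 'ai-concepts' ]
console.log(context.split("\n\n").length); // 2
```

<p>The third match falls below the threshold, so neither its text nor its source reaches the LLM or the response.</p> <p>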
Everything else is fast.</p> <h3 id="heading-why-this-works-at-scale">Why This Works at Scale</h3> <p>A common objection to Cloudflare RAG is that it cannot meet sub-200ms retrieval requirements. That objection comes from a specific architectural mistake: trying to run the entire RAG pipeline, including heavy embedding generation and reranking, inside a single synchronous request. That's the wrong architecture.</p> <p>The architecture you're building in this tutorial separates the loading step (which is slow and runs once) from the query step (which is fast and runs on every request). By the time a user asks a question, your documents are already embedded and stored. The query pipeline only needs to embed the question, run one vector search, and call the LLM. Retrieval (embedding the question and running the search) takes roughly 50-80ms based on the timings above, well inside a 200ms budget; the LLM call is what dominates total response time.</p> <p>My production system (<a href="https://github.com/dannwaneri/vectorize-mcp-worker">vectorize-mcp-worker</a>) runs this architecture and handles real traffic at $5/month. The <a href="https://dev.to/dannwaneri/i-built-a-production-rag-system-for-5month-most-alternatives-cost-100-200-21hj">full performance breakdown is here</a>. Cloudflare RAG works.
You just have to build it correctly.</p> <h2 id="heading-how-to-set-up-your-project">How to Set Up Your Project</h2> <p>In this section, you'll scaffold a Cloudflare Worker, create a Vectorize index to store your embeddings, and configure the bindings that connect them together.</p> <h3 id="heading-how-to-create-the-project">How to Create the Project</h3> <p>Open your terminal and create a new directory for the project.</p> <p>On Mac/Linux:</p> <pre><code class="language-bash">mkdir rag-tutorial-simple && cd rag-tutorial-simple </code></pre> <p>On Windows PowerShell:</p> <pre><code class="language-powershell">mkdir rag-tutorial-simple
cd rag-tutorial-simple
</code></pre> <p>Then run the Cloudflare scaffolding tool:</p> <pre><code class="language-bash">npm create cloudflare@latest </code></pre> <p>Answer the prompts like this:</p> <ul> <li><p><strong>Directory/app name</strong>: <code>rag-tutorial-simple</code></p> </li> <li><p><strong>What would you like to start with?</strong> Hello World example</p> </li> <li><p><strong>TypeScript?</strong> Yes</p> </li> <li><p><strong>Deploy?</strong> No</p> </li> </ul> <p>When it finishes, you'll have a working TypeScript Worker with Wrangler already configured. Because you ran the tool from inside the <code>rag-tutorial-simple</code> folder you just created, the Worker is generated in a nested <code>rag-tutorial-simple/</code> subfolder, so run <code>cd rag-tutorial-simple</code> once more before continuing.</p> <h3 id="heading-how-to-create-the-vectorize-index">How to Create the Vectorize Index</h3> <p>Vectorize is Cloudflare's vector database. It lives in the same network as your Worker, so searching it doesn't require a call to a third-party service over the public internet.</p> <pre><code class="language-bash">npx wrangler vectorize create rag-tutorial-index --dimensions=768 --metric=cosine </code></pre> <p>Two things to note here.</p> <p><code>--dimensions=768</code> tells Vectorize how many numbers make up each embedding. This must match the output of the embedding model you use. The model you will use (<code>@cf/baai/bge-base-en-v1.5</code>) outputs 768 dimensions.
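</p> <p>A mismatch between the index dimensions and the embedding length is easiest to catch in your own code before any vector is written. The following minimal sketch assumes a hypothetical helper, <code>toVectorRecord</code>, that you would call before <code>upsert()</code>; it is not part of the Cloudflare API. The record shape mirrors the <code>{ id, values, metadata }</code> objects passed to <code>upsert()</code> later in this tutorial.</p>

```typescript
// Output size of @cf/baai/bge-base-en-v1.5; must equal the index's --dimensions.
const EXPECTED_DIMENSIONS = 768;

interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string>;
}

// Hypothetical helper (not a Cloudflare API): validate an embedding
// before it is passed to upsert(), so a mismatch fails loudly and early.
function toVectorRecord(
  id: string,
  values: number[],
  metadata: Record<string, string>,
  dimensions: number = EXPECTED_DIMENSIONS,
): VectorRecord {
  if (values.length !== dimensions) {
    throw new Error(
      `Embedding for "${id}" has ${values.length} dimensions; the index expects ${dimensions}`,
    );
  }
  if (!values.every((v) => Number.isFinite(v))) {
    throw new Error(`Embedding for "${id}" contains a non-finite value`);
  }
  return { id, values, metadata };
}

// A correctly sized vector passes through unchanged.
const ok = toVectorRecord("1", new Array(768).fill(0.01), { source: "cloudflare-docs" });
console.log(ok.values.length); // 768

// A vector of the wrong size is rejected before anything is written.
try {
  toVectorRecord("2", [0.1, 0.2, 0.3], { source: "cloudflare-docs" });
} catch (err) {
  console.log((err as Error).message);
}
```

<p>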
If this number doesn't match, your searches will fail.</p> <p><code>--metric=cosine</code> is how Vectorize measures similarity between vectors. Cosine similarity measures the angle between two vectors rather than the distance between them. For text embeddings this is usually the better choice, because it compares the direction of the vectors and ignores differences in their magnitude.</p> <h3 id="heading-how-to-configure-wranglertoml">How to Configure wrangler.toml</h3> <p>Open <code>wrangler.toml</code> and replace its contents with the following:</p> <pre><code class="language-toml">name = "rag-tutorial-simple"
main = "src/index.ts"
compatibility_date = "2026-02-25"

[[vectorize]]
binding = "VECTORIZE"
index_name = "rag-tutorial-index"

[ai]
binding = "AI"
</code></pre> <p>The <code>[[vectorize]]</code> block connects your Worker to the index you just created. The <code>[ai]</code> block gives your Worker access to Workers AI – both for generating embeddings and for running the language model that produces answers.</p> <p>Notice that there are no API keys anywhere. Cloudflare handles authentication internally because everything – your Worker, Vectorize, and Workers AI – runs under the same account.</p> <h3 id="heading-how-to-update-srcindexts">How to Update src/index.ts</h3> <p>Open <code>src/index.ts</code> and replace the generated code with this:</p> <pre><code class="language-typescript">export interface Env {
  VECTORIZE: VectorizeIndex;
  AI: Ai;
  LOAD_SECRET: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    return new Response("RAG tutorial worker is running", { status: 200 });
  },
};
</code></pre> <p>The <code>Env</code> interface tells TypeScript what bindings are available inside your Worker.
<code>VectorizeIndex</code> and <code>Ai</code> are types provided by Cloudflare's type definitions.</p> <h3 id="heading-how-to-verify-your-setup">How to Verify Your Setup</h3> <p>Start the local development server:</p> <pre><code class="language-bash">npx wrangler dev </code></pre> <p>Open your browser and visit <code>http://localhost:8787</code>. You should see:</p> <pre><code class="language-plaintext">RAG tutorial worker is running </code></pre> <p>You will see two warnings in your terminal. Both are expected.</p> <p>The first warning says that Vectorize doesn't support local mode. This means Vectorize queries won't work during local development unless you run with the <code>--remote</code> flag. Later in this tutorial, you'll test the full pipeline against the deployed Worker instead.</p> <p>The second warning says the AI binding always accesses remote resources. This means that embedding generation and LLM calls always hit Cloudflare's servers, even in local development. This is fine: usage within the free tier limits costs nothing.</p> <p>Once you add the knowledge base file in the next section, your project structure will look like this:</p> <pre><code class="language-plaintext">rag-tutorial-simple/
├── scripts/
│   └── knowledge-base.ts
├── src/
│   └── index.ts
├── wrangler.toml
├── package.json
└── tsconfig.json
</code></pre> <h2 id="heading-how-to-build-the-data-pipeline">How to Build the Data Pipeline</h2> <p>The data pipeline is responsible for two things: generating embeddings for each document in your knowledge base, and storing those embeddings in Vectorize. You'll handle both steps inside the Worker itself using a <code>/load</code> endpoint.</p> <p>This approach has a key advantage: you don't need an API token, an Account ID, or any external tooling.
Everything uses the bindings you already configured in <code>wrangler.toml</code>.</p> <h3 id="heading-how-to-create-the-knowledge-base">How to Create the Knowledge Base</h3> <p>Create a <code>scripts/</code> folder in your project and add a file called <code>knowledge-base.ts</code>:</p> <pre><code class="language-bash">mkdir scripts </code></pre> <p>Add your documents to <code>scripts/knowledge-base.ts</code>:</p> <pre><code class="language-typescript">export const documents = [ { id: "1", text: "Cloudflare Workers run JavaScript at the edge, in over 300 data centers worldwide. Requests are handled close to the user, reducing latency significantly compared to a single-region server.", metadata: { source: "cloudflare-docs", category: "workers" }, }, { id: "2", text: "Vectorize is Cloudflare's vector database. It stores embeddings and lets you search them by semantic similarity. It runs in the same network as your Worker, so there is no external API call needed.", metadata: { source: "cloudflare-docs", category: "vectorize" }, }, { id: "3", text: "Workers AI lets you run machine learning models directly on Cloudflare's infrastructure. You can generate embeddings and run LLM inference without leaving the Cloudflare network.", metadata: { source: "cloudflare-docs", category: "workers-ai" }, }, { id: "4", text: "RAG stands for Retrieval Augmented Generation. Instead of relying only on what the LLM was trained on, RAG retrieves relevant context from a knowledge base and adds it to the prompt before generating an answer.", metadata: { source: "ai-concepts", category: "rag" }, }, { id: "5", text: "An embedding is a numerical representation of text. Similar pieces of text produce similar embeddings. This is what makes semantic search possible — you search by meaning, not exact keywords.", metadata: { source: "ai-concepts", category: "embeddings" }, }, { id: "6", text: "The BGE model (bge-base-en-v1.5) is available through Workers AI. 
It generates 768-dimensional embeddings and works well for English semantic search tasks.", metadata: { source: "cloudflare-docs", category: "workers-ai" }, }, { id: "7", text: "Cosine similarity measures the angle between two vectors. For text embeddings, it captures semantic similarity regardless of text length, which makes it more reliable than Euclidean distance.", metadata: { source: "ai-concepts", category: "embeddings" }, }, { id: "8", text: "Cloudflare Workers have a free tier that includes 100,000 requests per day. Vectorize is available on both the Workers Free and Paid plans. The free tier lets you prototype and experiment. The Workers Paid plan starts at $5/month and includes higher usage allocations for production workloads.", metadata: { source: "cloudflare-docs", category: "pricing" }, }, ]; </code></pre> <p>Each document has three fields. The <code>id</code> is a unique string that Vectorize uses to identify the vector. The <code>text</code> is what gets converted into an embedding. The <code>metadata</code> is stored alongside the vector and returned in search results. You'll use it later to display the source of each answer.</p> <h3 id="heading-understanding-embeddings">Understanding Embeddings</h3> <p>Before writing the loading code, it helps to understand what you're actually generating.</p> <p>An embedding is an array of 768 numbers that represents the meaning of a piece of text. The model reads a sentence and outputs those 768 numbers in a way where similar sentences produce similar arrays of numbers.</p> <p>When a user asks a question, you convert that question into an embedding using the same model, then ask Vectorize to find the stored embeddings that are closest to it. 
The documents those embeddings came from are your most relevant context.</p> <p>This is why the model choice matters: your documents and your queries must be embedded with the same model, or the similarity scores will be meaningless.</p> <h3 id="heading-how-to-build-the-load-endpoint">How to Build the Load Endpoint</h3> <p>Open <code>src/index.ts</code> and update it with a <code>/load</code> route. Here is the complete file at this stage:</p> <pre><code class="language-typescript">import { documents } from "../scripts/knowledge-base"; export interface Env { VECTORIZE: VectorizeIndex; AI: Ai; LOAD_SECRET: string; } export default { async fetch(request: Request, env: Env): Promise<Response> { const url = new URL(request.url); if (url.pathname === "/load" && request.method === "POST") { return handleLoad(env, request); } return new Response("RAG tutorial worker is running", { status: 200 }); }, }; async function handleLoad(env: Env, request: Request): Promise<Response> { const authHeader = request.headers.get("X-Load-Secret"); if (authHeader !== env.LOAD_SECRET) { return Response.json({ error: "Unauthorized" }, { status: 401 }); } const results: { id: string; status: string }[] = []; for (const doc of documents) { const response = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [doc.text], }) as { data: number[][] }; await env.VECTORIZE.upsert([ { id: doc.id, values: response.data[0], metadata: { ...doc.metadata, text: doc.text, }, }, ]); results.push({ id: doc.id, status: "loaded" }); } return Response.json({ success: true, loaded: results }); } </code></pre> <p>Notice that <code>env.AI.run()</code> and <code>env.VECTORIZE.upsert()</code> require no credentials. The bindings handle authentication because the Worker runs inside your Cloudflare account. There are no secrets to manage for internal service communication.</p> <p>The <code>text: doc.text</code> field inside <code>metadata</code> is important. 
Vectorize stores the vector values and whatever metadata you provide, but it doesn't store the original text separately. By including the text in metadata, you can retrieve and display it in search results later.</p> <p>The <code>as { data: number[][] }</code> cast is necessary because the TypeScript type definitions for Workers AI do not yet reflect the exact return shape of every model. For the BGE embedding model used here, the response contains a <code>data</code> array with one vector per input string, and the cast tells TypeScript to trust that.</p> <h3 id="heading-how-to-deploy-and-load-your-knowledge-base">How to Deploy and Load Your Knowledge Base</h3> <p>First, set the secret that will protect your load endpoint:</p> <pre><code class="language-bash">npx wrangler secret put LOAD_SECRET </code></pre> <p>Type a strong value when prompted. Then deploy:</p> <pre><code class="language-bash">npx wrangler deploy </code></pre> <p>Trigger the load endpoint. You only need to do this once, or any time you update your knowledge base:</p> <pre><code class="language-bash">curl -X POST https://rag-tutorial-simple.<your-subdomain>.workers.dev/load \
  -H "X-Load-Secret: your-secret-value"
</code></pre> <p>On Windows PowerShell:</p> <p><strong>Note:</strong> PowerShell uses backtick (<code>`</code>) for line continuation, not backslash.</p> <pre><code class="language-powershell">Invoke-WebRequest `
  -Uri "https://rag-tutorial-simple.<your-subdomain>.workers.dev/load" `
  -Method POST `
  -Headers @{"X-Load-Secret"="your-secret-value"} `
  -UseBasicParsing
</code></pre> <p>You should see:</p> <pre><code class="language-json">{
  "success": true,
  "loaded": [
    { "id": "1", "status": "loaded" },
    { "id": "2", "status": "loaded" },
    { "id": "3", "status": "loaded" },
    { "id": "4", "status": "loaded" },
    { "id": "5", "status": "loaded" },
    { "id": "6", "status": "loaded" },
    { "id": "7", "status": "loaded" },
    { "id": "8", "status": "loaded" }
  ]
}</code></pre> <p>Your knowledge base is now stored in Vectorize as vectors.
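</p> <p>The <code>/load</code> handler above makes one embedding call per document. If your knowledge base grows, batching reduces the number of round trips. The sketch below is a hedged starting point that has not been run against the live API: <code>chunk</code> and <code>toRecords</code> are hypothetical helpers, the batch size of 8 is an arbitrary choice, and it assumes the embedding model accepts an array of strings in <code>text</code> and returns one vector per string in <code>data</code>, in input order.</p>

```typescript
// Minimal local types (assumed shapes matching scripts/knowledge-base.ts).
interface KnowledgeDoc {
  id: string;
  text: string;
  metadata: Record<string, string>;
}

interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string>;
}

// Hypothetical helper: split an array into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: pair each document with the embedding at the same
// index. Assumes the model returns one vector per input text, in order.
function toRecords(docs: KnowledgeDoc[], embeddings: number[][]): VectorRecord[] {
  if (docs.length !== embeddings.length) {
    throw new Error(`Got ${embeddings.length} embeddings for ${docs.length} documents`);
  }
  return docs.map((doc, i) => ({
    id: doc.id,
    values: embeddings[i],
    metadata: { ...doc.metadata, text: doc.text },
  }));
}

// Inside handleLoad, the per-document loop could then become (untested sketch):
// for (const batch of chunk(documents, 8)) {
//   const res = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
//     text: batch.map((d) => d.text),
//   }) as { data: number[][] };
//   await env.VECTORIZE.upsert(toRecords(batch, res.data));
// }

// Local check with fake two-dimensional "embeddings".
const docs: KnowledgeDoc[] = [
  { id: "1", text: "a", metadata: { source: "x" } },
  { id: "2", text: "b", metadata: { source: "y" } },
  { id: "3", text: "c", metadata: { source: "x" } },
];
const batches = chunk(docs, 2);
console.log(batches.map((b) => b.length)); // [ 2, 1 ]
const records = toRecords(batches[0], [[0.1, 0.2], [0.3, 0.4]]);
console.log(records[1].metadata.text); // b
```

<p>The length check in <code>toRecords</code> guards the one assumption the batched version depends on: that each input text gets exactly one vector back.</p> <p>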
In the next section, you'll build the query pipeline that searches those vectors and generates answers.</p> <h2 id="heading-how-to-build-the-query-pipeline">How to Build the Query Pipeline</h2> <p>The query pipeline is the core of your RAG system. When a user sends a question, the pipeline runs four steps in sequence: embed the question, search Vectorize, build context from the results, and generate an answer with the LLM. It then returns the answer along with its sources.</p> <p>Add a <code>/query</code> route to your fetch handler and the complete <code>handleQuery</code> function. Here is the full updated <code>src/index.ts</code>:</p> <pre><code class="language-typescript">import { documents } from "../scripts/knowledge-base";

export interface Env {
  VECTORIZE: VectorizeIndex;
  AI: Ai;
  LOAD_SECRET: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname === "/load" && request.method === "POST") {
      return handleLoad(env, request);
    }

    if (url.pathname === "/query" && request.method === "POST") {
      return handleQuery(request, env);
    }

    return new Response("RAG tutorial worker is running", { status: 200 });
  },
};

async function handleLoad(env: Env, request: Request): Promise<Response> {
  const authHeader = request.headers.get("X-Load-Secret");
  if (authHeader !== env.LOAD_SECRET) {
    return Response.json({ error: "Unauthorized" }, { status: 401 });
  }

  const results: { id: string; status: string }[] = [];

  for (const doc of documents) {
    const response = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [doc.text],
    }) as { data: number[][] };

    await env.VECTORIZE.upsert([
      {
        id: doc.id,
        values: response.data[0],
        metadata: {
          ...doc.metadata,
          text: doc.text,
        },
      },
    ]);

    results.push({ id: doc.id, status: "loaded" });
  }

  return Response.json({ success: true, loaded: results });
}

async function handleQuery(request: Request, env: Env): Promise<Response> {
  const body = await request.json() as { question: string };

  if (!body.question) {
    return Response.json({ error: "question is required" }, { status: 400 });
  }

  // Step 1: Embed the question using the same model as your documents
  const embeddingResponse = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [body.question],
  }) as { data: number[][] };

  // Step 2: Search Vectorize for the 3 most similar documents
  const searchResults = await env.VECTORIZE.query(
    embeddingResponse.data[0],
    {
      topK: 3,
      returnMetadata: "all",
    }
  );

  // Step 3: Build context from results above the similarity threshold
  const context = searchResults.matches
    .filter((match) => match.score > 0.5)
    .map((match) => match.metadata?.text as string)
    .filter(Boolean)
    .join("\n\n");

  if (!context) {
    return Response.json({
      answer: "I could not find relevant information to answer that question.",
      sources: [],
    });
  }

  // Step 4: Generate an answer using the retrieved context
  const aiResponse = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
    messages: [
      {
        role: "system",
        content: "You are a helpful assistant. Answer the question using only the context provided. If the context does not contain enough information, say so.",
      },
      {
        role: "user",
        content: `Context:\n${context}\n\nQuestion: ${body.question}`,
      },
    ],
    max_tokens: 256,
  }) as { response: string };

  // Step 5: Return the answer with its sources
  const sources = searchResults.matches
    .filter((match) => match.score > 0.5)
    .map((match) => match.metadata?.source as string)
    .filter(Boolean);

  return Response.json({
    answer: aiResponse.response,
    sources: [...new Set(sources)],
  });
}
</code></pre> <p>What each step does:</p> <ol> <li><p><strong>Step 1 – Embed the question</strong>: You convert the user's question into a 768-dimensional vector using the same model you used when loading your documents.
This is critical: the question and the documents must be embedded with the same model or the similarity scores will be meaningless.</p> </li> <li><p><strong>Step 2 – Search Vectorize</strong>: You pass the question embedding to Vectorize, which returns the three most similar documents. <code>returnMetadata: "all"</code> tells Vectorize to include the metadata you stored alongside each vector, including the original text.</p> </li> <li><p><strong>Step 3 – Build context</strong>: You keep only the results with a similarity score above 0.5 and join their texts into a single context string. The 0.5 threshold prevents the LLM from receiving irrelevant documents just because nothing better matched.</p> </li> <li><p><strong>Step 4 – Generate the answer</strong>: You pass the context and the question to the LLM using the chat format with <code>messages</code>. The system prompt explicitly instructs the model to answer using only the provided context. This is what keeps the LLM grounded. Without this instruction, the model is much more likely to ignore your context and answer from its training data instead.</p> </li> <li><p><strong>Step 5 – Return sources</strong>: You include the source metadata in the response so callers know which sources the answer drew on.
The <code>Set</code> removes duplicate source values, because several matched documents can share the same source (documents 4 and 5 are both tagged <code>ai-concepts</code>, for example).</p> </li> </ol> <h3 id="heading-how-to-test-the-query-pipeline">How to Test the Query Pipeline</h3> <p>Deploy your Worker:</p> <pre><code class="language-bash">npx wrangler deploy </code></pre> <p>Send a question:</p> <pre><code class="language-bash">curl -X POST https://rag-tutorial-simple.<your-subdomain>.workers.dev/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is RAG?"}'
</code></pre> <p>On Windows PowerShell:</p> <pre><code class="language-powershell">Invoke-WebRequest `
  -Uri "https://rag-tutorial-simple.<your-subdomain>.workers.dev/query" `
  -Method POST `
  -ContentType "application/json" `
  -Body '{"question": "What is RAG?"}' `
  -UseBasicParsing
</code></pre> <p>You should receive a response like this:</p> <pre><code class="language-json">{
  "answer": "RAG stands for Retrieval Augmented Generation. It's a method that enhances generation by retrieving relevant context from a knowledge base and adding it to the prompt before generating an answer.",
  "sources": ["ai-concepts"]
}
</code></pre> <p>The answer came from your knowledge base, not from the LLM's training data. That's the entire point of RAG: grounded, verifiable answers with traceable sources.</p> <h2 id="heading-how-to-add-error-handling-and-security">How to Add Error Handling and Security</h2> <p>A tutorial that only shows the happy path is not production-ready. In this section, you'll add error handling to every step of the query pipeline and protect the <code>/load</code> endpoint from unauthorized access.</p> <h3 id="heading-how-to-secure-the-load-endpoint">How to Secure the Load Endpoint</h3> <p>The <code>/load</code> endpoint generates embeddings and writes to your Vectorize index.
Without protection, anyone who discovers your Worker URL can trigger it repeatedly, consuming your Workers AI quota and overwriting your data.</p> <p>The <code>LOAD_SECRET</code> binding you added to <code>Env</code> and the <code>wrangler secret put</code> command you ran earlier handle this. The check at the top of <code>handleLoad</code> rejects any request that doesn't include the correct secret header:</p> <pre><code class="language-typescript">const authHeader = request.headers.get("X-Load-Secret");
if (authHeader !== env.LOAD_SECRET) {
  return Response.json({ error: "Unauthorized" }, { status: 401 });
}
</code></pre> <p>A request without the correct header returns <code>{"error":"Unauthorized"}</code> with a 401 status. The secret itself is stored as an encrypted environment variable in your Worker. It never appears in your code or <code>wrangler.toml</code>.</p> <p>To trigger the load endpoint, you must include the secret in the request header:</p> <pre><code class="language-bash">curl -X POST https://rag-tutorial-simple.<your-subdomain>.workers.dev/load \
  -H "X-Load-Secret: your-secret-value"
</code></pre> <h3 id="heading-how-to-handle-query-errors">How to Handle Query Errors</h3> <p>Replace your <code>handleQuery</code> function with this hardened version:</p> <pre><code class="language-typescript">async function handleQuery(request: Request, env: Env): Promise<Response> {
  // Guard against malformed request body
  let body: { question: string };
  try {
    body = await request.json() as { question: string };
  } catch {
    return Response.json({ error: "Invalid JSON in request body" }, { status: 400 });
  }

  // Valid JSON can still be null or a primitive, so check the body exists first
  if (!body || !body.question || typeof body.question !== "string" || body.question.trim() === "") {
    return Response.json({ error: "question must be a non-empty string" }, { status: 400 });
  }

  // Sanitize: trim whitespace and cap length
  const question = body.question.trim().slice(0, 500);

  // Step 1: Embed the question
  let embeddingResponse: { data: number[][] };
  try {
    embeddingResponse = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [question],
    }) as { data: number[][] };
  } catch (err) {
    console.error("Embedding generation failed:", err);
    return Response.json({ error: "Failed to process your question" }, { status: 503 });
  }

  // Step 2: Search Vectorize
  let searchResults: Awaited<ReturnType<typeof env.VECTORIZE.query>>;
  try {
    searchResults = await env.VECTORIZE.query(
      embeddingResponse.data[0],
      { topK: 3, returnMetadata: "all" }
    );
  } catch (err) {
    console.error("Vectorize query failed:", err);
    return Response.json({ error: "Failed to search knowledge base" }, { status: 503 });
  }

  // Step 3: Build context
  const context = searchResults.matches
    .filter((match) => match.score > 0.5)
    .map((match) => match.metadata?.text as string)
    .filter(Boolean)
    .join("\n\n");

  if (!context) {
    return Response.json({
      answer: "I could not find relevant information to answer that question. Try rephrasing or asking something else.",
      sources: [],
    });
  }

  // Step 4: Generate answer
  let aiResponse: { response: string };
  try {
    aiResponse = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
      messages: [
        {
          role: "system",
          content: "You are a helpful assistant. Answer the question using only the context provided. If the context does not contain enough information, say so.",
        },
        {
          role: "user",
          content: `Context:\n${context}\n\nQuestion: ${question}`,
        },
      ],
      max_tokens: 256,
    }) as { response: string };
  } catch (err) {
    console.error("LLM generation failed:", err);
    return Response.json({ error: "Failed to generate an answer" }, { status: 503 });
  }

  // Step 5: Return answer with sources
  const sources = searchResults.matches
    .filter((match) => match.score > 0.5)
    .map((match) => match.metadata?.source as string)
    .filter(Boolean);

  return Response.json({
    answer: aiResponse.response,
    sources: [...new Set(sources)],
  });
}
</code></pre> <p>What each error handling decision means:</p> <ul> <li><p><code>try/catch</code> <strong>around</strong> <code>request.json()</code>: <code>request.json()</code> throws if the body is not valid JSON. Without this catch, a malformed request crashes your Worker with an unhandled 500 error. With it, the caller gets a clear 400 explaining what went wrong.</p> </li> <li><p><strong>Input validation before processing</strong>: You check that the parsed body exists and that its <code>question</code> is a non-empty string before calling any external service. This prevents wasted AI calls on invalid input, and a body such as <code>null</code> gets a 400 instead of crashing the handler.</p> </li> <li><p><code>.slice(0, 500)</code> <strong>on the question</strong>: This caps the input length before it reaches the embedding model. Without it, a malicious caller could send a very long string designed to inflate your AI usage or hit Workers CPU limits.</p> </li> <li><p><strong>503 for AI and Vectorize failures</strong>: HTTP 503 means "service temporarily unavailable." It signals to callers that the error is on the server side and the request can be retried.</p> </li> <li><p><code>.filter(Boolean)</code> <strong>on context</strong>: After mapping <code>match.metadata?.text</code>, some results may be <code>undefined</code> if metadata was stored without a <code>text</code> field.
This filters them out before joining, preventing <code>"undefined"</code> from appearing in the context string you send to the LLM.</p> </li> </ul> <h3 id="heading-how-to-test-error-handling">How to Test Error Handling</h3> <p>Deploy your updated Worker:</p> <pre><code class="language-bash">npx wrangler deploy
</code></pre> <p>Test each error case:</p> <pre><code class="language-bash"># Missing secret on load endpoint: should return 401
curl -X POST https://rag-tutorial-simple.<your-subdomain>.workers.dev/load

# Invalid JSON: should return 400
curl -X POST https://rag-tutorial-simple.<your-subdomain>.workers.dev/query \
  -H "Content-Type: application/json" \
  -d 'not json'

# Empty question: should return 400
curl -X POST https://rag-tutorial-simple.<your-subdomain>.workers.dev/query \
  -H "Content-Type: application/json" \
  -d '{"question": ""}'
</code></pre> <h2 id="heading-performance-and-cost-analysis">Performance and Cost Analysis</h2> <p>This section uses real production data from my <a href="https://github.com/dannwaneri/vectorize-mcp-worker">vectorize-mcp-worker</a> deployment. It uses the same architecture you just built, measured from Port Harcourt, Nigeria, to Cloudflare's edge.</p> <h3 id="heading-real-performance-numbers">Real Performance Numbers</h3> <p>Here is what the retrieval side of the pipeline costs in time on every request:</p> <table> <thead> <tr> <th>Operation</th> <th>Time</th> </tr> </thead> <tbody><tr> <td>Embedding generation</td> <td>142ms</td> </tr> <tr> <td>Vector search</td> <td>223ms</td> </tr> <tr> <td>Response formatting</td> <td><5ms</td> </tr> <tr> <td><strong>Total</strong></td> <td><strong>~365ms</strong></td> </tr> </tbody></table> <p>This covers embedding generation and vector search only – the retrieval layer. LLM generation adds another 500-1500ms on top, so end-to-end response time typically lands around 0.9-1.9 seconds (roughly 365ms of retrieval plus generation).</p> <p>The embedding step and vector search dominate. Everything else is negligible. 
For context, a comparable setup using OpenAI embeddings and Pinecone would add two external API roundtrips on top of this, easily pushing total latency past 1 second.</p> <p>These numbers come from a single-region measurement. Your actual latency will vary based on your location and Cloudflare's load at the time of the request. The architectural point holds regardless: co-locating everything on the edge eliminates inter-service network hops, which is where most latency in traditional RAG stacks comes from.</p> <h3 id="heading-real-cost-breakdown">Real Cost Breakdown</h3> <p>For 10,000 searches per day (300,000 per month) with 10,000 stored vectors:</p> <p><strong>This stack:</strong></p> <table> <thead> <tr> <th>Service</th> <th>Monthly Cost</th> </tr> </thead> <tbody><tr> <td>Workers</td> <td>~$3</td> </tr> <tr> <td>Workers AI</td> <td>~$3-5</td> </tr> <tr> <td>Vectorize</td> <td>~$2</td> </tr> <tr> <td><strong>Total</strong></td> <td><strong>$8-10</strong></td> </tr> </tbody></table> <p><strong>Traditional alternatives for the same volume:</strong></p> <table> <thead> <tr> <th>Solution</th> <th>Monthly Cost</th> </tr> </thead> <tbody><tr> <td>Pinecone Standard</td> <td>$50-70</td> </tr> <tr> <td>Weaviate Serverless</td> <td>$25-40</td> </tr> <tr> <td>Self-hosted pgvector</td> <td>$40-60</td> </tr> </tbody></table> <p>Based on these two tables, that is roughly a 60-89% cost reduction depending on which alternative you compare against: about 80-89% against Pinecone Standard, 75-87% against self-hosted pgvector, and 60-80% against Weaviate Serverless. For a bootstrapped startup adding semantic search, the difference against Pinecone alone comes to roughly $480-740 per year.</p> <h3 id="heading-why-the-cost-difference-is-so-large">Why the Cost Difference Is So Large</h3> <p>Traditional RAG stacks have three cost problems that compound each other.</p> <p>The first is idle compute. A dedicated server or container running your embedding service costs money even when no searches are happening. Cloudflare Workers charge only for actual execution time.</p> <p>The second is inter-service data transfer. 
Every time your application calls an external service for an embedding, then calls a separate service for a search, you're paying for two external API calls with metered pricing. In this stack, both operations happen inside Cloudflare's network at no additional transfer cost.</p> <p>The third is minimum plan pricing. Pinecone's Standard plan costs $50/month as a floor, regardless of how little you use it. Cloudflare's pricing scales from the $5/month Workers Paid plan base.</p> <h3 id="heading-when-the-included-allocation-is-enough">When the Included Allocation Is Enough</h3> <p>For smaller usage levels, you may not pay beyond the $5/month Workers Paid base price:</p> <ul> <li><p>Workers: 10 million requests per month included</p> </li> <li><p>Workers AI: generous daily neuron allocation included</p> </li> <li><p>Vectorize: available on both Free and Paid plans, with a free allocation included</p> </li> </ul> <p>A side project, internal tool, or small business with under 3,000 searches per day will likely stay within the included allocations entirely.</p> <h3 id="heading-the-trade-off-to-know-about">The Trade-off to Know About</h3> <p>This cost advantage comes with one operational constraint worth understanding before you build: Vectorize does not work in local development mode.</p> <p>When you run <code>wrangler dev</code>, your Worker runs locally but Vectorize calls fail. You have to deploy to Cloudflare to test your vector search. For most development workflows this means testing your query logic locally with mocked responses, then deploying to a staging environment for full integration tests.</p> <p>This is a real friction point. It's the honest trade-off for having a managed vector database with no infrastructure to operate.</p> <h2 id="heading-conclusion">Conclusion</h2> <p>In this tutorial, you have built and deployed a production-ready RAG system on Cloudflare's edge network. 
Let's recap what you built and where you can take it next.</p> <h3 id="heading-what-you-built">What You Built</h3> <p>Your completed system has three endpoints:</p> <ul> <li><p><code>GET /</code>: health check confirming the Worker is running</p> </li> <li><p><code>POST /load</code>: loads your knowledge base into Vectorize, protected by a secret header</p> </li> <li><p><code>POST /query</code>: accepts a question, retrieves relevant context, and returns a grounded answer with sources</p> </li> </ul> <p>The full query pipeline runs in four steps on every request:</p> <ol> <li><p>The question is converted to a 768-dimensional embedding using <code>@cf/baai/bge-base-en-v1.5</code></p> </li> <li><p>Vectorize finds the three most semantically similar documents</p> </li> <li><p>Documents above the 0.5 similarity threshold are assembled into context</p> </li> <li><p>Llama 3.3 generates an answer using only that context</p> </li> </ol> <p>Everything runs on Cloudflare's infrastructure. No external API keys. No separate vector database subscription. No servers to manage.</p> <h3 id="heading-what-to-build-next">What to Build Next</h3> <p>This tutorial covered the core RAG pattern. Here are five directions to take it further.</p> <h4 id="heading-add-more-documents">Add more documents</h4> <p>The knowledge base in this tutorial has 8 documents. A real system might have thousands. The loading pattern is identical: add documents to <code>knowledge-base.ts</code>, hit <code>/load</code> with your secret, and Vectorize handles the rest.</p> <p>For very large knowledge bases, update <code>handleLoad</code> to batch documents in groups of 20-50 rather than upserting one at a time.</p> <h4 id="heading-improve-chunking">Improve chunking</h4> <p>Each document in this tutorial is a single short paragraph. Real-world documents such as PDFs, articles, and documentation pages need to be split into chunks before embedding. 
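</p> <p>As a rough illustration, here is a minimal TypeScript sketch of one way to do that splitting. It is an assumption, not code from the tutorial repository: the function names <code>chunkBySentences</code> and <code>countWords</code> are hypothetical, it counts whitespace-separated words as a stand-in for model tokens, and its boundary detection is naive (blank lines end paragraphs; <code>.</code>, <code>!</code>, and <code>?</code> end sentences, so abbreviations and decimals will split early). It packs whole sentences into each chunk and repeats the last few sentences at the start of the next chunk as overlap.</p> <pre><code class="language-typescript">// Hypothetical helper: counts whitespace-separated words as a rough stand-in for tokens.
function countWords(text: string): number {
  return text.split(/\s+/).filter((word) => word.length > 0).length;
}

// Hypothetical chunker: packs whole sentences into chunks of roughly maxWords words and
// repeats trailing sentences (up to overlapWords words) at the start of the next chunk.
// A single sentence longer than maxWords is kept whole as its own oversized chunk.
function chunkBySentences(text: string, maxWords = 300, overlapWords = 50): string[] {
  if (overlapWords >= maxWords) {
    throw new Error("overlapWords must be smaller than maxWords");
  }

  // Naive boundaries: blank lines separate paragraphs; . ! ? end sentences.
  const sentences: string[] = [];
  for (const paragraph of text.split(/\n\s*\n/)) {
    for (const piece of paragraph.match(/[^.!?]+(?:[.!?]+|$)/g) ?? []) {
      const sentence = piece.trim();
      if (sentence) sentences.push(sentence);
    }
  }

  const chunks: string[] = [];
  let current: string[] = [];
  let currentWords = 0;

  for (const sentence of sentences) {
    const words = countWords(sentence);

    // Close the current chunk if this sentence would push it over the budget.
    if (current.length > 0 && currentWords + words > maxWords) {
      chunks.push(current.join(" "));

      // Carry trailing sentences forward so context spans the chunk boundary.
      const carried: string[] = [];
      let carriedWords = 0;
      for (const prev of current.slice().reverse()) {
        const prevWords = countWords(prev);
        if (carriedWords + prevWords > overlapWords) break;
        carried.unshift(prev);
        carriedWords += prevWords;
      }
      current = carried;
      currentWords = carriedWords;
    }

    current.push(sentence);
    currentWords += words;
  }

  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
</code></pre> <p>Each returned chunk would then be embedded and upserted as its own vector, with the chunk text stored in metadata the same way the tutorial stores each document's text.</p> <p>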
Chunk at natural boundaries like paragraphs and sentences, aim for 200-400 tokens per chunk, and include 50-token overlaps between chunks to preserve context across boundaries.</p> <h4 id="heading-add-conversation-history">Add conversation history</h4> <p>The current system treats every query as independent. To support follow-up questions, store previous messages in a Cloudflare KV namespace and include the last 2-3 exchanges in the LLM <code>messages</code> array alongside the retrieved context.</p> <h4 id="heading-stream-the-response">Stream the response</h4> <p>For long answers, users stare at a blank screen until generation completes. Cloudflare Workers support streaming responses via <code>TransformStream</code>. With streaming, the first tokens reach the user as soon as the model starts generating, instead of after the full answer is complete.</p> <h4 id="heading-consider-dimensions-vs-reranking-trade-offs">Consider dimensions vs reranking trade-offs</h4> <p>This tutorial uses <code>bge-base-en-v1.5</code> at 768 dimensions. My production system uses <code>bge-small-en-v1.5</code> at 384 dimensions. In my testing, upgrading from 384 to 768 dimensions improved accuracy by only about 2%, but doubled cost and latency.</p> <p>Adding a reranker (<code>@cf/baai/bge-reranker-base</code>) gave a larger accuracy improvement than the dimension upgrade for a fraction of the cost. The exact improvement will vary by domain and query distribution, so test both on your actual data before deciding. 
If you're optimizing for production, add a reranker before you increase dimensions.</p> <h3 id="heading-the-complete-project">The Complete Project</h3> <p>Clone and deploy in six commands:</p> <pre><code class="language-bash">git clone https://github.com/dannwaneri/rag-tutorial-simple
cd rag-tutorial-simple
npm install
npx wrangler vectorize create rag-tutorial-index --dimensions=768 --metric=cosine
npx wrangler secret put LOAD_SECRET
npx wrangler deploy
</code></pre> <p>Then load your knowledge base:</p> <pre><code class="language-bash">curl -X POST https://<your-worker>.workers.dev/load \
  -H "X-Load-Secret: your-secret"
</code></pre> <p>If you found this useful, the production system this tutorial is based on is open source at <a href="https://github.com/dannwaneri/vectorize-mcp-worker">github.com/dannwaneri/vectorize-mcp-worker</a>. It extends this foundation with hybrid search combining vector and BM25, multimodal support for searching images with AI vision, a reranker for more accurate results, and a live dashboard. It runs on the same Cloudflare stack you just built – Workers, Vectorize, Workers AI – plus D1 for document storage.</p> <p>One difference you'll notice: the production system uses <code>bge-small-en-v1.5</code> at 384 dimensions rather than the 768 dimensions in this tutorial. That is an intentional trade-off: the reranker adds more accuracy than the extra dimensions at lower cost. The jump from what you built today to that system is smaller than it looks.</p> </article> </main></body></html>