openai - freeCodeCamp.org

The Codex Handbook: A Practical Guide to OpenAI's Coding Platform

Tatev Aslanyan — Fri, 08 May 2026 23:02:00 +0000

This handbook is written for developers, team leads, and admins who want to understand what Codex is, how to set it up, how to use it well, how it differs from general-purpose models, and how pricing works today.

It's based on current OpenAI Codex documentation and Help Center articles. Pricing and plan availability change frequently, so treat the pricing section as a snapshot of the current docs and verify against the official links before making procurement decisions.

What's new (April 2026): OpenAI released GPT-5.5 and GPT-5.5 Pro on April 23–24, 2026. GPT-5.5 is now the flagship general model and is rolling into Codex surfaces. See the new "GPT-5.5: The Newest Release" subsection in Section 2, the full benchmark deep dive in Section 11, and the updated pricing snapshot in Section 7.

Authors: Tatev Aslanyan, Vahe Aslanyan, Jim Amuto | Version: 1.3 — Last updated April 30, 2026

Executive Summary

Codex is OpenAI's coding agent — not a single model, but a product and workflow layer that wraps OpenAI's frontier models with file access, shell execution, sandboxes, approval flows, and code review.

It runs in four surfaces: the CLI, IDE extensions (VS Code, Cursor, Windsurf), the macOS/Windows app, and Codex Cloud for background tasks against GitHub repositories.

The product is included with most paid ChatGPT plans (Plus, Pro, Business, Enterprise/Edu) and, for now, Free and Go with stricter rate limits.

The model layer beneath Codex shifted in April 2026. GPT-5.5 is the new general flagship, with substantial gains on agentic and long-context benchmarks (MRCR v2 at 1M tokens jumped from 36.6% on GPT-5.4 to 74.0% on GPT-5.5. Terminal-Bench 2.0 reaches 82.7%, and hallucination rate dropped roughly 60% versus prior generations). It's also roughly 2× the per-token cost of GPT-5.4, so picking the right model per task now matters more for budget than it did a quarter ago.

For teams adopting Codex, the highest-leverage choices are:

Start in the CLI or IDE on small bounded tasks before enabling cloud
Use Codex as a pre-merge reviewer in addition to a code generator
Keep admin and user access separated through workspace RBAC, and
Treat token consumption — not prompt count — as the cost driver.

The 30-60-90 day adoption plan in the appendix gives a phased rollout that surfaces friction early.

This handbook covers what Codex is, how to set it up, how to use it well, how it compares to Claude Code, GitHub Copilot, and self-hosted alternatives. We'll also discuss what it costs, how to govern it in an enterprise, and where it does and does not fit. You'll find a glossary, security checklist, and worked cost example in the appendix.

Here's What We'll Cover:

Executive Summary
Prerequisites
Section 1: What Codex Is
Section 2: Where Codex Fits in the OpenAI Ecosystem
Section 3: The Core Surfaces
Section 4: Getting Started: Install, Set Up, and Your First Task
Section 5: How to Use Codex Effectively
Section 6: Difference Between Codex and Other Coding Tools
Comparison Matrix
Section 7: Pricing and Plan Access
Worked Cost Example
Section 8: Security, Permissions, and Enterprise Setup
Section 9: Best Practices for Teams
Section 10: Common Workflows and Examples
Section 11: Model Specs and Benchmarks (GPT-5.5 Deep Dive)
Section 12: Troubleshooting
Section 13: FAQ
Section 14: When NOT to Use Codex
Section 15: Final Recommendations
Section 16: Source References
Appendix A: 30-60-90 Day Adoption Plan
Appendix B: Glossary
Appendix C: Admin Security Checklist
Appendix D: Changelog
Appendix E: Working with Codex in VS Code

Prerequisites

This handbook is hands-on. To get the most out of it — especially Section 4, Section 5, and Section 10 where you'll install Codex and run real tasks — you should have the following in place.

Background Knowledge You Should Already Have

You don't need to be a senior engineer, but the walkthroughs assume:

Comfort using the command line. You can cd into a directory, list files, run git commands, and read shell error messages. If you have never opened a terminal, work through a one-hour shell tutorial first.
Basic Git literacy. You understand commits, branches, pull requests, and the difference between staged and unstaged changes. The Codex workflow centers on producing reviewable diffs, so this is non-negotiable.
Experience reading code in at least one mainstream language. Codex can work in any language, but the demo repo in Section 4 is a small Python service. If you can read Python, JavaScript, Go, or similar, you'll be fine.
A mental model of "what an API call costs." Section 7's worked cost example assumes you understand that LLM usage is metered by tokens. If "tokens" is a brand-new concept, skim the OpenAI tokenizer page once before reading Section 7.

If you're an engineering manager, procurement lead, or admin and you only need Section 7, Section 8, and Section 14, you can skip the technical prerequisites and jump straight to those sections.

Tools and Accounts You Need to Install

Before starting Section 4, have the following ready. Approximate setup time: 15–25 minutes if you're starting from scratch.

Tool / Account	Why you need it	Where to get it
A ChatGPT account on Plus, Pro, Business, or Enterprise/Edu	Codex is included with these plans. Free and Go work for now but with stricter rate limits	chatgpt.com
Node.js 18+ and npm	The Codex CLI is installed via npm (`npm i -g @openai/codex`)	nodejs.org
Git 2.30+	Required to clone the demo repo and produce diffs Codex can review	git-scm.com
A code editor	VS Code is the recommended baseline. Cursor and Windsurf also work	code.visualstudio.com
A GitHub account	Required only for Codex Cloud tasks (Section 8 and Appendix E)	github.com
WSL2 (Windows users only)	The Codex CLI is experimental on native Windows; WSL is the supported path	Microsoft WSL docs

Verify Your Environment

Run these three commands before you start Section 4. If any of them fails, fix it first.

node --version   # should print v18.x or higher
npm --version    # should print 9.x or higher
git --version    # should print 2.30 or higher

What This Handbook Will Not Teach You

To set expectations honestly, this handbook does not cover:

How to write production-grade Python, JavaScript, or any specific language. We use small examples to demonstrate Codex behavior, not teach syntax.
How to design a system architecture from scratch. Section 14 explains why Codex is a poor fit for novel architecture decisions.
How to administer GitHub at the organization level. Section 8 covers the Codex-specific GitHub Connector setup, but assumes your GitHub org already exists.
LLM internals (attention, RLHF, and so on). We treat the model as a black box with measurable behavior.

Section 1: What Codex Is

Codex is OpenAI's coding agent. The most important thing to understand is that Codex is not just a single model name. It's a product and workflow layer designed to help people write, review, debug, and ship code faster. In OpenAI's own wording, it's an AI coding agent that can work with you locally or complete tasks in the cloud.

That distinction matters. Most people think of AI in one of two ways:

A chat model that answers questions.
A coding assistant that suggests snippets.

Codex is broader than both. It can inspect a repository, edit files, run commands, and execute tests. It can also handle larger chunks of work by taking a prompt or spec and turning it into a task plan, code changes, and reviewable output.

For teams, the cloud-based workflow is especially important because it lets Codex run in the background while engineers stay in flow.

OpenAI's current docs also place Codex alongside a wider set of developer tools: the API, the Responses API, the Agents SDK, MCP tools, and the Codex app. If you are onboarding a team, the easiest mental model is this:

The models are the engine.
Codex is the coding product that uses those engines.
The CLI, IDE extension, web app, and cloud tasks are the ways you interact with it.

Section 2: Where Codex Fits in the OpenAI Ecosystem

OpenAI now offers a layered stack:

General-purpose frontier models such as GPT-5.5, GPT-5.5 Pro, GPT-5.4, GPT-5.4-mini, and GPT-5.4-nano.
Codex-specific models such as GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex, and codex-mini-latest.
Product surfaces that package those models into workflows, such as Codex CLI, the Codex app, IDE extensions, cloud tasks, and code review.

The practical difference is simple:

If you need one-off reasoning, synthesis, or general chat, you may use a general model.
If you need an agent that should navigate a repository, change files, run tests, and push toward a concrete code outcome, Codex is the purpose-built surface.

OpenAI's current model docs describe GPT-5.4 as the flagship model for complex reasoning and coding. At the same time, Codex-specific model pages describe GPT-5.3-Codex and GPT-5.2-Codex as optimized for agentic coding tasks in Codex or similar environments. That tells you how OpenAI is positioning the stack:

GPT-5.4 is the general flagship.
Codex-specific models are tuned for coding workflows.
Codex the product can switch models depending on the surface and configuration.

If you remember nothing else from this section, remember this: Codex is the workflow. Models are the engine.

GPT-5.5: The Newest Release

OpenAI launched GPT-5.5 on April 23, 2026, with API availability following on April 24, 2026. A higher-tier GPT-5.5 Pro variant shipped alongside it. OpenAI describes GPT-5.5 as their "smartest and most intuitive to use model yet, and the next step toward a new way of getting work done on a computer."

For a Codex user, the practical upshot is short:

GPT-5.5 is the new general flagship. Anywhere older docs say "GPT-5.4 is the flagship," read GPT-5.5 going forward. GPT-5.4 remains available as a cheaper default.
Codex surfaces will switch over. Expect GPT-5.5 to become selectable (and often the default) inside the CLI, IDE, app, and cloud tasks shortly after launch. Verify the active model in your settings.
Pricing has shifted. GPT-5.5 sits well above GPT-5.4 on a per-token basis. See Section 7 before approving budgets.

The full benchmark breakdown, performance highlights, and per-workload guidance for picking GPT-5.5 vs GPT-5.4 vs Codex-specific models are in Section 11: Model Specs and Benchmarks. Read that section once you have the foundational chapters under your belt.

Section 3: The Core Surfaces

Codex currently shows up in a few places, and each one is optimized for a slightly different working style.

Codex CLI

The CLI is the fastest way to put Codex directly into a terminal session. The docs describe it as OpenAI's coding agent that runs locally from your terminal, can read, change, and run code on your machine, and is open source and written in Rust.

Use the CLI when you want:

A terminal-first workflow.
Fast iteration inside an existing repo.
Fine-grained control over approvals and execution.
A lightweight path for local coding tasks.

IDE Extension

The CLI docs and Help Center articles point to the IDE extension for VS Code, Cursor, Windsurf, and other VS Code forks. This is the natural fit when your team lives in an editor and wants Codex embedded in the normal coding flow.

Use the IDE extension when you want:

Codex close to the files you are already editing.
Prompting and editing without switching contexts.
A bridge between human-driven and agent-driven editing.

Codex App

OpenAI's Help Center says the Codex app is available on macOS and Windows. It is designed for parallel work across projects, with built-in worktree support, skills, automations, and git functionality.

Use the app when you want:

Multiple Codex agents running in parallel.
Cloud tasks without bouncing between terminal and editor.
A project-centric place to assign and monitor tasks.

Codex Cloud

Codex cloud is the background execution mode. It runs each task in an isolated sandbox with the repository and environment, and it is intended for reviewable code output rather than direct interactive sessions.

Use Codex cloud when you want:

Tasks to run while you do something else.
Sandboxed execution with reviewable diffs.
Automated code review or repository-level workflows.

Code Review

Codex can also review code inside GitHub. OpenAI describes this as a way to automatically review your personal pull requests or configure reviews at the team level.

Use code review when you want:

A second set of eyes on pull requests.
Automated regression or issue spotting before human review.
Lightweight review coverage across a team.

Section 4: Getting Started: Install, Set Up, and Your First Task

This section walks you end-to-end from "nothing installed" to "Codex just fixed a real bug for me."

We will use a tiny demo repository you build yourself in two minutes — a small Python price-calculator with one obvious bug and one missing test. That gives you a real, reproducible target you can throw away when you're done.

The same walkthrough works for the CLI, the IDE extension, and the app, with notes for each.

If you have existing code you would rather use, skip ahead to Step 4 and point Codex at your own repo. The demo is for readers who want a known-good starting point.

Step 0: Confirm Access

Codex is included with ChatGPT Plus, Pro, Business, and Enterprise/Edu plans. For a limited time, it is also included with Free and Go, with stricter rate limits.

If you are in a team or enterprise workspace, access may also depend on workspace settings and role-based controls. Do not assume that a ChatGPT subscription alone guarantees access in a managed environment — confirm with your admin or look in Codex Cloud settings at chatgpt.com/codex.

Step 1: Install Codex

You have three install paths. Pick one to start; you can add the others later.

Option A: The CLI (recommended for first task)

The CLI is the most direct way to see how Codex behaves. The official docs note that macOS and Linux are first-class, while Windows is experimental and you should use WSL2.

npm i -g @openai/codex
codex --version

If codex --version prints a version number, you are done.

Option B: The VS Code Extension

In VS Code (or Cursor / Windsurf), open the Extensions panel, search for "Codex" by openai, and install it. Or from a terminal:

code --install-extension openai.chatgpt

The Codex panel will appear in the right sidebar after install.

Option C: The Codex App

Download the Codex app for macOS or Windows from chatgpt.com/codex. The app shines when you want parallel tasks, built-in git worktrees, and a project-centric UI. For your very first task it is overkill — start with the CLI or extension.

VS Code users: For a step-by-step guide covering all three VS Code entry points (extension, CLI in the integrated terminal, and browser Codex), see Appendix E: Working with Codex in VS Code.

Step 2: Authenticate

Run codex in a terminal (or open the extension panel). You will be prompted to:

Sign in with ChatGPT — recommended. Usage is charged against your plan's included Codex credits.
Sign in with an API key — used when you want metered API billing or your workspace policy requires it.

If you are unsure, pick ChatGPT sign-in.

Step 3: Build the Demo Repo

This is the part most quick-starts skip. Instead of pointing Codex at "any repo," let's create a small, self-contained demo repo with a known bug so you can verify Codex actually fixes it.

In a terminal, run:

mkdir codex-demo && cd codex-demo
git init

Now create three files. First, pricing.py — a small pricing calculator with one off-by-one bug and one missing edge case:

# pricing.py
def apply_discount(price: float, discount_percent: float) -> float:
    """Apply a percentage discount to a price.

    BUG: The discount is applied as a multiplier of (discount_percent / 10)
    instead of (discount_percent / 100). A 20% discount currently doubles
    the price instead of reducing it.
    """
    if discount_percent < 0:
        raise ValueError("discount_percent must be >= 0")
    return price * (1 - discount_percent / 10)


def cart_total(items: list[dict], discount_percent: float = 0) -> float:
    """Compute the total for a list of cart items after a discount."""
    subtotal = sum(item["price"] * item["quantity"] for item in items)
    return apply_discount(subtotal, discount_percent)

Then test_pricing.py — a single passing test plus one that will fail because of the bug:

# test_pricing.py
from pricing import apply_discount, cart_total


def test_no_discount_returns_original_price():
    assert apply_discount(100.0, 0) == 100.0


def test_twenty_percent_discount_on_100_is_80():
    # This will FAIL until the bug in apply_discount is fixed.
    assert apply_discount(100.0, 20) == 80.0


def test_cart_total_with_discount():
    items = [
        {"price": 10.0, "quantity": 2},
        {"price": 5.0, "quantity": 1},
    ]
    # Subtotal is 25.0. With 10% off, expected total is 22.5.
    assert cart_total(items, discount_percent=10) == 22.5

And a tiny README.md:

# codex-demo

A tiny pricing module used to learn the Codex workflow.

Run tests with: `python -m pytest`

Commit the starting state so Codex's diffs are easy to review:

git add .
git commit -m "Initial demo: pricing module with a known bug"

Confirm the bug is real before you ask Codex to fix it:

python -m pytest

You should see two failing tests (test_twenty_percent_discount_on_100_is_80 and test_cart_total_with_discount).

If pytest is not installed: pip install pytest. The full demo needs only Python 3.10+ and pytest.

Step 4: Launch Codex and Run Your First Task

Now point Codex at the demo repo.

From the CLI:

cd codex-demo
codex

When Codex starts, give it a clear, bounded task. Type this prompt exactly:

The test suite has two failing tests. Read pricing.py and test_pricing.py,
identify the root cause, fix the smallest possible thing, then run the tests
to confirm they pass. Explain what you changed and why.

Codex will:

Inspect pricing.py and test_pricing.py.
Recognize the off-by-one bug (/ 10 should be / 100).
Propose a one-line diff.
Ask for approval before modifying the file (in the default approval mode).
After you approve, run python -m pytest and report that all three tests now pass.

From the VS Code extension: Open the codex-demo folder in VS Code, open the Codex panel in the right sidebar, and paste the same prompt. The diff will appear inline in the editor for you to review and accept.

Step 5: Review the Diff

This is the most important habit to build early. Even though the fix is one character (10 → 100), look at the diff before accepting:

git diff

Read the change. Confirm it matches what Codex described. Run the tests yourself:

python -m pytest

All three should pass. Commit the fix:

git commit -am "Fix off-by-one in apply_discount"

You have just completed the full Codex loop: context → task → change → review → verify. Every bigger task is a longer version of this loop.

Step 6: Try Two More Bounded Tasks

Now that the loop works, try these against the same demo repo:

Add an edge case test. Prompt: "Add a test that verifies apply_discount raises a ValueError when discount_percent is negative. Run the tests after."
Add a missing safety check. Prompt: "apply_discount does not currently reject discount_percent values greater than 100, which would produce a negative price. Add validation, update the existing tests if needed, and add a new test for the new behavior."

Each task is small, has a clear acceptance criterion (the tests pass), and produces a reviewable diff. That is the shape of every good Codex task.

Step 7 (Optional): Set Up Codex Cloud

Cloud tasks let Codex run in the background while you do other work. They require a GitHub-hosted repository.

To enable Codex Cloud against the demo repo:

Push codex-demo to a private GitHub repo: gh repo create codex-demo --private --source=. --push (requires the gh CLI).
Visit chatgpt.com/codex and connect the ChatGPT GitHub Connector.
Allow the codex-demo repository in the connector. Do not grant org-wide access by default — see Appendix C.
From the web interface, pick the repo and prompt: "Add type hints to every function in pricing.py and add a CI-style summary of what changed."
Wait for the sandbox to finish, review the diff in the browser, and either accept it or open a PR.

By default, Codex Cloud sandboxes have no internet access. That is deliberate — admins can allowlist dependency registries and trusted sites if a real workflow needs them.

When to Use Which Surface

After completing the demo, the surface trade-offs become concrete:

CLI — fastest for terminal-heavy local work, scriptable, best for multi-step agentic tasks with explicit approvals.
VS Code extension — lowest friction for in-flow editing while you are already in the editor.
Codex app — best when you want to run multiple parallel tasks across projects with worktree isolation.
Codex Cloud — best for background work, long-running tasks, and PR-style review you can leave running.

Most experienced users have all of them installed and pick per task. A single workflow rarely fits every kind of work.

What If Something Doesn't Work?

If you get stuck during this walkthrough:

codex command not found → npm's global bin is not on your PATH. Restart your terminal, or use a Node version manager like nvm.
Sign-in keeps failing → confirm the email matches your ChatGPT plan; in enterprise workspaces, your admin must enable Codex.
Codex won't modify the file → you may be in a strict approval mode. Approve when prompted, or relax the mode after your first successful task.
Windows misbehavior → switch to a WSL2 terminal. Native Windows for the CLI is experimental.

The full troubleshooting guide is in Section 12.

Section 5: How to Use Codex Effectively

Codex works best when you treat it like a developer you're onboarding rather than a magic prompt responder. The more concrete your task, the better the result.

Each tip below has a bad example (what people actually type) and a good example (what produces a useful result). Most use the codex-demo repo from Section 4 so you can run them yourself.

Give It a Real Objective

A "real objective" means a concrete goal with a verifiable outcome — not a feeling.

Bad:

Improve this codebase.

Codex will pick something to do, but you have no way to know if the result is what you wanted, and the diff will probably touch more than you can review.

Good:

Refactor cart_total in pricing.py so the iteration logic and the discount
application are in two separate helper functions. Keep the public signature
of cart_total unchanged. Add tests for each helper. Run pytest at the end.

This works because there is exactly one acceptance criterion (tests pass with the new structure) and exactly one boundary (public signature unchanged). You can review the diff in 30 seconds.

Other shapes that work:

"Fix the failing test in test_pricing.py::test_twenty_percent_discount_on_100_is_80."
"Add a currency: str = 'USD' parameter to cart_total and update the tests."
"Review the changes in my last commit for missing edge cases."

Provide the Right Context

Codex can inspect the repo, but you still need to steer it to the right files and constraints. Without that, it wanders.

Bad:

Add validation to the pricing module.

What kind of validation? On which inputs? What error class? Codex has to guess all of that.

Good:

Context:
- File: pricing.py
- Function: apply_discount
- Current behavior: raises ValueError for negative discount_percent.
- Desired behavior: also raise ValueError when discount_percent > 100,
  with the message "discount_percent must be between 0 and 100".

Task:
- Add the validation.
- Add a matching test in test_pricing.py.
- Do not change apply_discount's public signature.
- Run pytest after.

Notice the structure: what file, current behavior, desired behavior, task, constraints, how to verify. That is the difference between a hopeful prompt and a usable spec.

For larger tasks, also include:

A link to the issue or spec (Codex can fetch it if web access is enabled).
The names of related files even if Codex could find them itself — naming them halves the time-to-first-edit.
The name of any test command, build command, or lint that should pass.

Ask for Intermediate Thinking When Needed

"Intermediate thinking" means asking Codex to plan in writing before it edits files. The default is for Codex to dive straight to code. For anything larger than a single function, that is the wrong default.

Without intermediate thinking (the alternative):

Refactor pricing.py to support multiple currencies.

Codex starts editing immediately. You discover after the fact that it changed the database schema, the API contract, and three test files — and you have no idea whether the design choice it made was the right one.

With intermediate thinking:

I want to add multi-currency support to pricing.py.

Before editing anything:
1. List the files you expect to touch and why.
2. Outline the approach in 5-10 bullets.
3. Call out any assumptions you are making and any open questions.
4. Identify the riskiest part of the change.

Wait for my approval before making any edits.

Now you get a plan you can review, push back on, or scrap entirely — at zero cost to the codebase. After you approve, Codex executes against the plan it just wrote, which makes the resulting diff predictable.

Use intermediate thinking whenever the task is:

Multi-file or cross-cutting.
Architecturally novel for this codebase.
Hard to test (so the diff is your only signal).
High blast-radius if wrong (auth, payments, data migrations).

Prefer Bounded Changes

A bounded change is one with all four of these properties:

Small surface area — touches one file, one module, or one logical concept.
Clear acceptance criterion — there's a specific test, output, or behavior that proves it worked.
Reviewable in a few minutes — a human can read the diff and form an opinion without setting aside an hour.
Easily revertible — if it goes wrong, git revert undoes it cleanly without breaking anything else.

The opposite is an unbounded change: "make the codebase faster," "modernize the API," "add types everywhere." These have no clear endpoint, no easy verification, and no clean revert path.

Bounded examples (good):

"Add a serialize() method to CartItem that returns a dict suitable for JSON encoding. Add a test."
"In apply_discount, replace the magic number 100 with a module-level constant MAX_DISCOUNT_PERCENT."
"The cart_total function takes a discount_percent keyword argument that defaults to 0. Make the default None and treat None as 'no discount.' Update the tests."

Unbounded examples (avoid):

"Make pricing.py production-ready."
"Add proper error handling everywhere."
"Improve the architecture."

When you catch yourself writing an unbounded prompt, break it into a list of bounded ones before sending. The decomposition itself is most of the work; once you have it, Codex is good at executing each piece.

Use Reviews as a Loop

Codex is not just for writing code — it is also a useful pre-merge reviewer. The loop is:

You (or Codex) write the change.
Ask Codex to review it.
Fix the issues it finds.
Re-run tests.

What this looks like in practice:

After completing a task in codex-demo, ask Codex to review your own commit:

Review the change in my last commit (git show HEAD) for:
- correctness issues (off-by-one, type mismatches, wrong defaults)
- missing tests, especially edge cases
- security concerns (input validation, injection, unsafe defaults)
- maintainability risks (unclear naming, hidden coupling)

Prioritize findings by severity (critical / important / nit). For each
finding, point to the exact line and propose a concrete fix. Do not
modify any files in this turn — just produce the review.

You will typically get back a structured response like:

CRITICAL: line 14 — apply_discount accepts NaN silently because the type
  check is `discount_percent < 0`, which is False for NaN. Fix: add an
  explicit math.isnan() check before the comparison.

IMPORTANT: test_pricing.py has no test for the boundary discount_percent=100.
  Fix: add a test asserting apply_discount(100, 100) == 0.

NIT: line 8 — the docstring mentions a "BUG" comment that should be removed
  now that the bug is fixed.

Then you triage: fix the critical and important findings (often by feeding them back to Codex with "apply the fixes you proposed"), defer or reject the nits, and re-run tests.

This converts Codex from a code generator into a quality gate, which is usually the higher-leverage use. A team that uses Codex only as a generator gets faster code; a team that also uses it as a reviewer gets better code.

Section 6: Difference Between Codex and Other Coding Tools

This is the section that usually matters most to new users, because the category boundaries are easy to blur.

Codex Is A Product Layer, Not Just A Model

Codex is the product experience and workflow layer. Models are the underlying engines. Put differently:

A general model answers questions or writes text.
A coding model is tuned more narrowly for software tasks.
Codex packages the model inside an agentic coding workflow with files, commands, approvals, sandboxes, and reviews.

That matters because users often compare Codex to "another model" when the real comparison is "another coding system."

Codex vs OpenAI General Models

OpenAI's current models page recommends GPT-5.4 as the flagship model for complex reasoning and coding. That is the general model-side recommendation.

Codex-specific pages, on the other hand, describe models like GPT-5.3-Codex and GPT-5.2-Codex as optimized for agentic coding tasks in Codex or similar environments.

The practical takeaway:

Use GPT-5.4 when you want a top-tier general model.
Use Codex-specific models when you want a model optimized for coding workflows inside Codex.
Use the Codex surface when you want file edits, shell commands, reviews, and sandboxes, not just text output.

Codex vs Claude Code

Claude Code is also a terminal-based agentic coding tool. Anthropic's docs describe it as a terminal tool that can make plans, edit files, run commands, create commits, and work with MCP-connected data sources. It is strong if your team already prefers a terminal-first workflow and wants a tightly scriptable developer tool.

Codex differs in a few practical ways:

Codex spans more surfaces, including CLI, IDE extension, app, cloud tasks, and code review.
Codex cloud is built around GitHub-connected task execution and review.
Codex is more explicitly positioned as a family of coding workflows, not just a single terminal agent.

The practical takeaway:

Choose Claude Code if you want a terminal-native workflow with strong composability and you are happy living mostly in the shell.
Choose Codex if you want a broader product layer with local, cloud, and app-based workflows that can be shared across a team.

Codex vs GitHub Copilot Coding Agent

GitHub Copilot coding agent is designed around GitHub's own workflow. GitHub docs describe it as an agent you can assign issues or pull requests to, and it works in the background to create or modify PRs. It lives very naturally inside GitHub-hosted development flows.

Codex is different in emphasis:

Copilot coding agent is highly GitHub-centric.
Codex is broader across terminal, IDE, app, and cloud.
Copilot is a strong fit if your team already uses GitHub as the center of gravity for task assignment and review.
Codex is a stronger fit if you want a more general coding agent surface that can work across local and cloud workflows.

The practical takeaway:

Choose Copilot coding agent if your process is already deeply anchored in GitHub issues and pull requests.
Choose Codex if you want a wider agent workflow that can run locally, in the IDE, or in Codex cloud.

Codex vs Open-Weight and Self-Hosted Models

Open-weight or self-hosted models serve a different need. Teams usually reach for them when they want:

Full infrastructure control.
Custom hosting or air-gapped deployment.
More direct control over retention and data boundaries.
A lower-cost path at high scale if they already own the hardware and ops stack.

The tradeoff is that self-hosted models usually do not give you the same out-of-the-box agentic product experience that Codex does. You have to assemble the orchestration, repo access, sandboxing, approvals, and review loop yourself.

That means the real choice is not "Which model is smartest?" It is "How much engineering do I want to spend on the workflow around the model?"

The practical takeaway:

Choose open-weight or self-hosted models when infrastructure control is the main requirement and you are willing to build the surrounding agent system.
Choose Codex when you want the workflow already packaged, especially for day-to-day engineering teams.

Codex vs General Chat Models

General chat models are best when the task is:

A question and answer exchange.
Conceptual reasoning.
Drafting prose.
Summarizing or rewriting text.

Codex is better when the task is:

Reading and modifying a repository.
Running tests.
Fixing code.
Reviewing pull requests.
Coordinating multi-step implementation work.

Codex vs API Usage of the Same Models

The same model family can behave differently depending on the surface.

In the API, you may call a model directly and design your own orchestration.
In Codex, the same or similar model may be wrapped in repo access, approval flows, and task execution.

That is why some model pages mention that a model is optimized for "Codex or similar environments." The model is tuned for agentic software work, but the workflow surface still matters.

Comparison Matrix

The prose comparisons above collapse into a single matrix for fast reference:

Dimension	Codex	Claude Code	GitHub Copilot Coding Agent	Self-hosted / Open-weight
Primary surface	CLI, IDE, app, cloud	CLI (terminal-first)	GitHub web/PR/issues	Whatever you build
Background execution	Yes (Codex Cloud sandboxes)	Limited; runs locally	Yes (GitHub Actions runners)	DIY
Repository integration	GitHub via connector; local repos directly	Local; MCP-connected sources	Native GitHub	DIY
Model choice	OpenAI models, switchable per surface	Anthropic Claude models	GitHub-managed (mix of vendors)	Any model you can host
Approval and sandbox controls	Yes, per-surface	Yes, per-tool	GitHub permission model	DIY
Parallel agents	Yes (app + cloud)	Limited	Yes (per-PR)	DIY
Best fit	Cross-surface team workflows	Terminal-native power users	Teams already living in GitHub	Air-gapped, custom infra, or cost-sensitive at scale
Main tradeoff	OpenAI ecosystem lock-in; price tier	Less product surface area	Heavily GitHub-coupled	Significant engineering effort

Use the matrix to pick the dominant tool, then layer the others where they fit. Many teams legitimately run two of these in parallel — for example, Codex for cross-surface work and Claude Code for power-user terminal workflows.

Which Tool Should A New User Choose?

As a rule of thumb:

For terminal-first coding and scripting, Claude Code is a strong alternative.
For GitHub-native issue and PR automation, GitHub Copilot coding agent fits naturally.
For local plus cloud plus app-based team workflows, Codex is the most flexible option.
For maximum infrastructure control, self-hosted or open-weight stacks make sense.

OpenAI's docs currently list GPT-5.5 as the general flagship, with GPT-5.4, GPT-5.4-mini, and GPT-5.4-nano remaining available below it, while Codex docs and model pages expose Codex-specific variants and model switching inside the CLI.

Section 7: Pricing and Plan Access

Pricing is the part of Codex most likely to change, so this section should be treated as a snapshot of the current official docs.

Plan Access

OpenAI's current Help Center says Codex is included with:

ChatGPT Plus
ChatGPT Pro
ChatGPT Business
ChatGPT Enterprise/Edu

For a limited time, it is also included with Free and Go, though those plans are temporary exceptions and subject to rate limits.

Flexible Pricing and Credits

The current rate card says Codex pricing changed on April 2, 2026 to align with API token usage instead of purely per-message pricing. The same article explains that:

New and existing Plus and Pro customers use the token-based rate card.
New and existing Business customers use the token-based rate card.
New Enterprise customers use the token-based rate card.
Existing Enterprise/Edu and several other legacy plan categories remain on the legacy rate card until migration.

This is important because two teams in the same company can be on different pricing logic depending on workspace status and plan vintage.

Current Model Pricing Snapshot

The current model pages list pricing per 1M tokens in USD. The exact numbers depend on the model you choose:

GPT-5.5: $5 input, $30 output. New flagship as of April 23, 2026.
GPT-5.5 Pro: $30 input, $180 output. Higher-tier variant for the most demanding agentic and reasoning workloads.
GPT-5.4: $2.50 input, $15 output.
GPT-5.4-mini: $0.75 input, $4.50 output.
GPT-5.4-nano: $0.20 input, $1.25 output.
GPT-5-Codex: $1.25 input, $10 output.
GPT-5.2-Codex: $1.75 input, $14 output.
GPT-5.1-Codex-mini: $0.25 input, $2 output.
codex-mini-latest: $1.50 input, $6 output.

These model pages also note context windows, output limits, and whether the model is intended for Codex-specific or general API use. For budget planning, remember that longer outputs can cost much more than the input prompt, so task framing matters as much as model choice.

Note that GPT-5.5 is roughly 2x the input price and 2x the output price of GPT-5.4, and GPT-5.5 Pro is an order of magnitude above that. OpenAI's framing is that GPT-5.5 is also more token-efficient than GPT-5.4, which can offset some of the headline price difference, but you should measure this on your own workloads before assuming it nets out. For the Codex-specific models, expect the lineup to shift as Codex variants based on GPT-5.5 ship; until then, the Codex-specific models above remain the right choice for purely coding-shaped tasks.

What This Means in Practice

The real cost depends on:

Input size.
Cached input.
Output length.
Whether the task uses fast mode.
Which model you select.

So if you are planning a team rollout, do not estimate usage from "number of prompts" alone. Estimate based on expected token consumption and task type.

Legacy Pricing

The legacy rate card still matters for users and workspaces that have not been migrated. The big lesson is that pricing is now tied more closely to model usage than to a simple fixed message count. Anyone budgeting Codex should read the current rate card before setting internal chargeback rules or usage policies.

Worked Cost Example

Pricing tables are easy to misread. A worked example makes the model selection question concrete.

Scenario: A 30-engineer team uses Codex Cloud for automated pull request review. Each engineer opens roughly 4 PRs per week. Each PR review pulls in approximately 30,000 input tokens (the diff plus relevant context files) and produces approximately 3,000 output tokens (the review comments and risk summary).

Weekly token volume:

Reviews per week: 30 engineers × 4 PRs = 120 reviews
Input tokens per week: 120 × 30,000 = 3.6M input tokens
Output tokens per week: 120 × 3,000 = 360K output tokens

Cost per week by model:

Model	Input cost	Output cost	Weekly total	Annualized (52 wk)
GPT-5.5 ($5 / $30)	3.6M × $5/1M = $18.00	0.36M × $30/1M = $10.80	$28.80	$1,498
GPT-5.5 Pro ($30 / $180)	$108.00	$64.80	$172.80	$8,986
GPT-5.4 ($2.50 / $15)	$9.00	$5.40	$14.40	$749
GPT-5-Codex ($1.25 / $10)	$4.50	$3.60	$8.10	$421
GPT-5.1-Codex-mini ($0.25 / $2)	$0.90	$0.72	$1.62	$84

Reading the table: The headline GPT-5.5 sticker shock disappears at this volume — under $1,500/year for 30 engineers' worth of automated review is a rounding error against engineering payroll. GPT-5.5 Pro is 6× more expensive and generally not justified for routine review; reserve it for the small share of reviews where you need its extra capability. The Codex-specific models are dramatically cheaper and are the right default if your reviews are mostly mechanical (style, obvious bugs, missing tests).

What this example does not capture:

Cached input. OpenAI prices repeated input tokens lower; if your review pulls the same context files repeatedly, real costs are lower than shown.
Long-task overhead. Agentic workflows that re-read files or iterate burn many more tokens than a single-shot review. A coding task can easily be 5–10× the tokens of a review.
Failure retries. A failed task that gets re-run costs roughly the same as the original. Agent flakiness is a real budget line item.
Mixed-model strategies. Most mature teams route cheap tasks (test stubs, doc updates) to a Codex-mini model and reserve GPT-5.5 for repository-wide refactors and PRs that need long-context reasoning.

The practical pattern: build the cost model around your actual highest-volume workload (usually PR review or test generation), then size the GPT-5.5 budget separately for the smaller set of tasks that actually benefit from the new capabilities.

Section 8: Security, Permissions, and Enterprise Setup

Teams care about Codex not just as a productivity tool, but as a controlled software-development system. OpenAI's docs reflect that reality.

Local vs Cloud Access

Enterprise admins can separately enable:

Codex Local
Codex Cloud
Both

Codex Local covers the app, CLI, and IDE extension. Codex Cloud covers hosted tasks, code review, and related integrations.

That separation is useful because some organizations want local tooling enabled broadly while keeping cloud tasks restricted to fewer users.

Workspace Controls

The admin docs say workspace owners can use RBAC to manage access. They can:

Set a default role.
Create custom roles.
Assign roles to groups.
Sync groups with SCIM.
Manage permissions centrally.

This is the right place to build a rollout with least privilege rather than giving every developer broad Codex access by default.

GitHub Connector and Repository Access

Codex Cloud requires GitHub-hosted repositories. Admins connect the ChatGPT GitHub Connector, choose an installation target, and allow specific repositories. Codex uses short-lived, least-privilege GitHub App tokens and respects repository permissions and branch protection rules.

For security teams, that matters because it keeps Codex aligned with the repo access model you already use.

Internet Access

By default, Codex cloud agents do not have internet access at runtime. That is deliberate. If your task truly needs access to dependency registries or trusted sites, admins can configure allowlists and HTTP method limits.

Recommended Governance Pattern

The enterprise docs recommend using separate groups for users and admins:

A smaller Codex Admin group for people who manage policy and governance.
A broader Codex Users group for developers who just need to use the tool.

That keeps policy management tight and avoids accidental over-permissioning.

Section 9: Best Practices for Teams

If you are onboarding a team, you will get much better outcomes if you set expectations up front.

Start With Simple, Valuable Tasks

Good first-team use cases:

Pull request review.
Small bug fixes.
Test generation.
Documentation updates.
Codebase navigation and understanding.

These are easy to compare against human work and easy to judge for quality.

Standardize Task Prompts

Give people a shared prompt template. For example:

Task: Fix the failing test in X.
Context: The regression started after Y.
Constraints: Do not change public API behavior.
Output: Explain root cause, apply fix, run tests, summarize risks.

This makes results easier to review and reduces the "prompt quality lottery" that often hurts team adoption.

Use a Review Culture

Codex should not replace code review discipline. Treat it as:

A first-pass implementer.
A pre-review reviewer.
A way to reduce repetitive work.

The human team should still own architecture, product tradeoffs, and final sign-off.

Measure What Matters

The metrics that matter are the ones that tell you whether Codex is producing reviewable, mergeable, trustworthy work — not the ones that count activity. Below is each metric, how to actually compute it from data you already have, and the rule of thumb for what "healthy" looks like.

1. Time to First Useful Diff

Definition: From the moment a Codex task is started, how long until it produces a diff that a human would actually consider applying (after possible small tweaks).

How to measure:

For CLI/IDE tasks, log the wall-clock time from prompt submission to first diff. The Codex CLI emits structured logs you can parse; a simple wrapper script suffices:
```
start=$(date +%s); codex ""; echo "elapsed: $(( $(date +%s) - start ))s"
```
For Codex Cloud tasks, use the task duration shown in the chatgpt.com/codex dashboard, or pull it from the workspace usage export.
Tag each task as "useful" or "discarded" in a shared spreadsheet for the first month. After that, you can sample.

Healthy: under 2 minutes for bounded tasks; under 10 minutes for multi-file refactors. If the median is much higher, your prompts probably lack context (see Section 5).

2. Test Pass Rate on Codex-Generated Changes

Definition: Of the diffs Codex produces, what percentage pass the existing test suite on the first try.

How to measure:

In CI, tag PRs that originated from Codex (a label like codex-authored or a commit-message prefix works). Then run a simple weekly query:

SELECT
  COUNT(*) FILTER (WHERE first_ci_run = 'pass') * 100.0 / COUNT(*) AS first_try_pass_rate
FROM pull_requests
WHERE labels @> '{"codex-authored"}'
  AND created_at > NOW() - INTERVAL '7 days';

For local CLI usage, instrument with a wrapper that runs your test command immediately after Codex finishes and records the exit code.

Healthy: above 75% for bounded tasks. Below 50% means Codex is making changes without verifying them — usually fixable by adding "run the tests after" to your prompt template (see Section 9 → Standardize Task Prompts).

3. Review Findings Caught by Codex

Definition: When Codex is used as a pre-merge reviewer, how many issues does it surface that a human reviewer or CI would have caught anyway, vs. issues only Codex caught, vs. false positives.

How to measure:

Have human reviewers annotate Codex's review comments with one of three tags: agree-found-it, agree-missed-it, disagree-noise.
Track the ratios over time:
- Useful-finding rate = (agree-found-it + agree-missed-it) / total Codex comments.
- Unique-value rate = agree-missed-it / total Codex comments.
A simple GitHub Actions step that posts the Codex review and asks the human reviewer to react with emoji (✅ / ⚠️ / ❌) makes this nearly free to collect.

Healthy: useful-finding rate above 70%; unique-value rate above 20%. Unique-value rate is the number that justifies keeping the workflow on — if it is near zero, Codex is duplicating CI and you can disable it without losing anything.

4. Tasks Completed Without Human Rewrite

Definition: Of all merged Codex-authored changes, what fraction shipped substantially as Codex wrote them (vs. being heavily rewritten by a human before merge).

How to measure:

Compare the diff Codex initially produced to the diff that actually merged. The simplest proxy:
```
# in the Codex-authored branch:
git diff codex/initial-commit HEAD --shortstat
```
If the post-Codex diff changes more than ~30% of the lines Codex originally wrote, count the task as "rewritten."
Track this monthly. The trend line matters more than the absolute number.

Healthy: above 60% shipped without major rewrite. Lower than that, and either prompts are under-specified or Codex is being pushed into work it is bad at — re-read Section 14.

5. Developer Satisfaction

Definition: Whether the people actually using the tool think it makes them faster and want to keep using it. Hard numbers do not capture this.

How to measure:

Run a 5-question pulse survey monthly. Keep it short. Suggested questions, all on a 1–5 scale:
1. "Codex saved me time this week."
2. "I trust Codex's diffs enough to review them confidently."
3. "Codex's review comments are usually worth reading."
4. "I would be unhappy if Codex were taken away."
5. "What is the single biggest friction point?" (free text)
Track the trend in question 4 specifically. That is the closest equivalent to a product-market-fit signal for an internal tool.

Healthy: average score above 3.5/5 on questions 1–4 by month 3 of rollout. If question 4 trends down, the rollout is failing regardless of what the other metrics say.

What NOT to Measure

These look useful but mislead:

Number of prompts sent. Counts activity, not value. A team sending 10× more prompts may be 10× more productive — or 10× more confused.
Tokens consumed. Useful for budget, useless for impact. Heavy users are not necessarily good users.
Lines of code generated. Same problem as LOC has always had: you reward verbosity.
PRs opened by Codex. A Codex-opened PR that nobody merges is a negative outcome dressed up as a positive one.

Use the cost data (Section 7) to manage budget. Use the metrics above to manage adoption.

Use the Right Surface for the Job

CLI for terminal-heavy local work.
IDE extension for day-to-day coding.
App for parallel project work.
Cloud for background tasks and review.

That is usually the difference between "this is useful" and "this is annoying."

Section 10: Common Workflows and Examples

Here are the workflows most teams will actually use. Each one includes a worked example against the codex-demo repo from Section 4 so you can see the full prompt, the kind of output Codex produces, and what to do with it.

Workflow 1: Fix a Bug Locally

Use when: A test is failing, a behavior is wrong, and the cause is contained to one file or function.

Steps:

Open the repo in your terminal or IDE.
Ask Codex to inspect the failing path.
Request a fix and a test.
Review the diff.
Run the test suite.

Worked example:

In the codex-demo repo, suppose a teammate just reported: "apply_discount is silently returning a negative price when discount_percent is greater than 100." Verify the bug first:

python -c "from pricing import apply_discount; print(apply_discount(100, 150))"
# prints: -50.0    <-- silent negative price, no error raised

Now launch Codex and run:

Bug: apply_discount(100, 150) returns -50.0 instead of raising an error.
Expected: discount_percent values above 100 should raise ValueError with
the message "discount_percent must be between 0 and 100".

Task:
- Add the validation in pricing.py.
- Add a test in test_pricing.py that asserts ValueError is raised for
  discount_percent=150.
- Keep the existing tests passing.
- Run pytest at the end and report the result.

What you get back: a diff that adds if discount_percent > 100: raise ValueError(...) in apply_discount, a new test_invalid_discount_percent_above_100 test, and the pytest output showing all four tests passing. Review with git diff, run python -m pytest yourself to confirm, then git commit -am "Reject discount_percent > 100".

This works best when the bug is bounded and reproducible. If you cannot reproduce it from the command line, Codex usually cannot either.

Workflow 2: Review a Pull Request

Use when: You (or a teammate) just made a change and want a fast pre-merge sanity check before opening it for human review.

Steps:

Point Codex at the PR or changed files.
Ask for correctness issues, missing tests, and security risks.
Compare the findings against human review.
Use Codex as a pre-filter before the broader team reviews.

Worked example:

After completing Workflow 1 above, ask Codex to review your own change before opening a PR:

Review the change in my last commit (HEAD) — it added validation to
apply_discount in pricing.py.

Look for:
- correctness issues (off-by-one on the boundary, wrong error type, etc.)
- missing tests (boundary cases like exactly 100, exactly 0, NaN, negative zero)
- security or robustness issues
- API consistency with the existing apply_discount validation style

Prioritize findings as CRITICAL / IMPORTANT / NIT and propose a concrete
fix for each. Do not modify any files in this turn.

What you might get back:

IMPORTANT: line 14 — the new validation rejects discount_percent > 100 but
  silently allows discount_percent == 100, which makes the price 0. That is
  technically valid but worth a test to lock the boundary. Add:
    test_apply_discount_at_boundary_100_returns_zero

NIT: the new error message says "between 0 and 100" but the existing check
  for negative values says "must be >= 0". Consider unifying the messages
  for consistency.

You apply the IMPORTANT fix (often by following up with: "apply the IMPORTANT fix from your review"), defer or accept the nit, and re-run tests.

This is one of the highest-leverage team workflows because it catches obvious problems before a human spends review time on them. See Section 9 → Measure What Matters → Review Findings Caught by Codex for how to track its actual value over time.

Workflow 3: Understand a Large Codebase

Use when: You are new to a repo (or returning after months away) and need a map before you can safely make changes.

Steps:

Ask Codex to trace a request flow.
Ask for the key modules and entry points.
Request a map of the code path before editing anything.

Worked example:

The codex-demo repo is too small to need this, so imagine a more realistic case: a teammate's repo with app/, services/, models/, api/, and 80 files you have never seen. Open the repo in Codex and run:

I am new to this codebase. Without modifying anything, give me an
orientation:

1. What is the entry point for the HTTP API?
2. Trace what happens when a POST hits /users — list every file the
   request touches in order, with a one-line description of each.
3. Where is database access centralized? Is there a repository pattern?
4. What test command should I run to verify any change I make?
5. What are the three files I should read first to understand the
   project's conventions?

Output as a structured markdown report.

What you get back: a markdown report you can paste into your notes. Read the recommended files, then start working with Codex on actual changes. The 10 minutes spent on this orientation typically saves an hour of confused refactoring later.

This workflow is particularly useful for new hires. A senior engineer can also use it the first time they touch an unfamiliar service to avoid breaking conventions they cannot see.

Workflow 4: Generate a Feature in Parallel

Use when: A feature naturally splits into independent pieces (API + tests + docs, or UI + backend + migration) that do not block each other.

Steps:

Break the work into subtasks.
Run separate Codex tasks for UI, API, tests, or docs.
Merge the outputs after review.

Worked example:

Add a new "loyalty discount" capability to codex-demo. The work splits into three pieces that do not depend on each other:

Subtask	Surface	Prompt
A. Implementation	CLI in terminal 1	"Add a `loyalty_discount(price, customer_tier)` function to `pricing.py`. Tiers are 'bronze' (0%), 'silver' (5%), 'gold' (10%). Reject unknown tiers with ValueError. Do not change any other function."
B. Tests	Codex Cloud	"Generate exhaustive tests in `test_pricing.py` for a function `loyalty_discount(price, customer_tier)` with tiers bronze/silver/gold. Cover: each tier, unknown tier, negative price, zero price, decimal prices. Do not modify pricing.py — assume the function will exist."
C. Docs	VS Code extension	"Add a section to README.md documenting the new loyalty_discount function: signature, tier table, and one usage example."

Each runs in parallel. When all three finish, merge the diffs (typically the implementation goes first, then tests verify against it, then docs reference what shipped). Review each independently.

The Codex app and cloud surfaces are especially good for this because they let you launch and monitor multiple tasks without juggling terminal windows. The CLI also supports parallel work, but it benefits from git worktree so each run operates on its own branch checkout.

Workflow 5: Use Subagents for Decomposition

Use when: A single task is too large for one Codex run but can be naturally split into investigate / plan / implement phases.

The CLI explicitly supports subagents — one Codex task that spawns child tasks, each with a narrower scope and its own context window.

Worked example:

A bug report says: "Cart totals are sometimes off by a penny for European currencies." You do not yet know if this is a rounding bug, a currency-conversion bug, or a data bug. Run a parent task that decomposes:

A bug report says cart totals are occasionally off by a penny for
European currencies.

Decompose this into three subagent tasks:

1. INVESTIGATE: Read pricing.py and any currency-related code. Identify
   every place where floating-point arithmetic touches a money value.
   Report findings without proposing fixes.

2. REPRODUCE: Write a failing test in test_pricing.py that demonstrates
   a one-cent discrepancy with EUR amounts. Use the smallest possible
   reproduction.

3. PROPOSE: Based on (1) and (2), propose two possible fixes (e.g.,
   switching to Decimal vs. rounding at the boundary) with the trade-offs
   of each. Do not implement either yet.

Wait for me to pick a fix before writing any production code.

Why subagents help: each child task has a clean context, so the investigation findings do not pollute the test-writing context, and the proposal task gets a clean view of both. You also get a natural human checkpoint between investigation and implementation.

That division is often faster than one giant all-purpose run, and dramatically more reviewable.

Prompt Cookbook

New users often ask for examples because they know what they want outcome-wise but not how to phrase it. These templates are a good starting point.

Bug Fix Template

Inspect the failing behavior in [file or module].
Identify the root cause.
Patch the smallest safe fix.
Add or update tests.
Summarize what changed and any edge cases I should watch.

Use this when the bug is narrow and you want a disciplined fix, not a redesign.

Refactor Template

Refactor [module] to improve readability and maintain the current behavior.
Keep external APIs stable.
Explain the refactor plan before editing.
Make the smallest set of changes that achieves the goal.

Use this when the code works but is hard to maintain.

Review Template

Review this change for correctness, missing tests, security issues, and maintainability risks.
Prioritize findings by severity.
Call out any behavior changes or ambiguous logic.

Use this when you want Codex to act like a pre-merge reviewer.

Feature Template

Implement [feature] in [file or subsystem].
List the files you expect to touch before changing anything.
Add tests.
Keep the implementation aligned with the current architecture.

Use this when the task spans multiple files and you want visibility into the plan.

Signs You Are Using Codex Well

You usually know the workflow is healthy when:

Codex makes small, reviewable diffs instead of broad rewrites.
The model asks for clarification only when the missing detail matters.
Test coverage improves along with functionality.
New developers can use the tool without needing a custom training session.
The time from prompt to merged change is lower, but review quality does not drop.

You usually know the workflow is unhealthy when:

Prompts are vague and every result needs heavy rework.
The team treats the first output as final.
Nobody is checking diffs or running tests.
Users keep asking for "make it better" instead of defining a clear target.

Those signals matter more than raw usage counts.

Section 11: Model Specs and Benchmarks (GPT-5.5 Deep Dive)

Section 2 introduced GPT-5.5 as the new general flagship and gave the three-bullet practical takeaway. This section is the deep dive: the published benchmark numbers, what each one actually measures, why it matters for Codex workloads specifically, and how to use those numbers to pick the right model per task.

If you are setting budgets or choosing default models for a team, read this section in full. If you just want to use Codex, you can skim it.

Why Benchmarks Matter for Model Selection

Codex lets you pick the model behind each surface. Picking well is mostly about matching the model's strengths to the task shape:

A bounded local edit (one file, one function) does not benefit much from a frontier model. Codex-specific or Codex-mini variants are usually the right call.
A repository-wide refactor that needs the model to keep many files in working memory benefits enormously from long-context performance.
An agentic cloud task that runs unattended for ten minutes benefits from low hallucination rates and strong tool-use behavior.
A PR review benefits from low hallucination rates above almost everything else — a confident-but-wrong review comment costs more than a missed real issue.

The benchmarks below tell you which model best matches each shape.

GPT-5.5 Performance Highlights

The published benchmarks position GPT-5.5 as a meaningful jump over GPT-5.4, particularly on agentic and long-context work — the workloads most relevant to Codex users.

Knowledge work (GDPval) — 84.9%. GDPval evaluates whether a model can produce well-specified knowledge-work output across 44 occupations. This is the headline general-capability number.
Computer use (OSWorld-Verified) — 78.7%. Measures whether the model can drive a real computer environment end-to-end. Directly relevant to Codex Cloud sandboxes and agentic CLI runs.
Coding (Terminal-Bench 2.0) — 82.7%. A terminal-centric coding benchmark with long-context retrieval and computer-use components. The closest public proxy for Codex CLI workloads.
Customer-service workflows (Tau2-bench Telecom) — 98.0% without prompt tuning. Indicates strong tool-use and policy-adherence behavior straight out of the box.
Long-context retrieval (MRCR v2 at 1M tokens) — 74.0%, up from 36.6% on GPT-5.4. This is the largest single jump in the report and the most important one for repository-scale Codex tasks where the model must keep many files in working memory.
Hallucination rate — independent coverage reports a roughly 60% reduction in hallucinations versus prior generations, which materially changes the trust calculus for review and PR-feedback workflows.

What Each Benchmark Actually Measures

Benchmarks are easy to misread. Quick definitions of the ones cited above:

GDPval — Asks the model to produce specified knowledge-work output across 44 occupations (legal memos, financial summaries, technical documentation, etc.). A high score means the model can produce structured, well-specified output reliably. Use as a general-capability signal, not a coding-specific one.
OSWorld-Verified — Tasks the model with operating a real desktop environment to complete real workflows (open files, navigate UIs, run commands). High scores predict the model will behave well in agentic sandboxes that mimic a developer's desktop.
Terminal-Bench 2.0 — A terminal-driven coding benchmark with long-context retrieval and computer-use components. The closest public proxy for what Codex CLI actually does day to day.
Tau2-bench Telecom — Evaluates complex customer-service-style workflows that require following policies and using tools correctly. A proxy for "does the model do what you told it without going off-script."
MRCR v2 at 1M tokens — A long-context retrieval benchmark. Tests whether the model can find and use information across a full 1M-token context window. The single best predictor of behavior on repository-scale Codex tasks where many files must be kept in working memory.

Practical Guidance for Codex Users

Translate the benchmarks into model choice:

Repository-wide tasks (cross-file refactors, multi-module migrations): GPT-5.5. The MRCR v2 jump is the single best signal that it will behave better on large codebases than GPT-5.4 did.
Cheap, bounded local edits (single function, single test, doc tweak): GPT-5.4 or a Codex-specific model. The cost/latency tradeoff is much better and the capability headroom is wasted on small tasks. Do not default everything to GPT-5.5 just because it is newest.
Agentic cloud tasks (background sandbox runs, multi-step workflows): GPT-5.5. The OSWorld-Verified score and lower hallucination rate are the relevant signals — fewer broken sandbox runs and fewer confidently-wrong outputs.
PR review and code review workflows: GPT-5.5. The 60% hallucination drop is the single most important number for review work; a noisy reviewer trains the team to ignore the reviewer.
Most expensive workloads (anything that approaches GPT-5.5 Pro pricing): keep GPT-5.5 Pro reserved for the small set of tasks where its extra capability is justified — typically deeply novel reasoning or extreme long-context work.

For Procurement: Treat GPT-5.5 as a Separate Budget Line

Token consumption on agentic tasks is dominated by output. GPT-5.5 outputs are substantially more expensive than GPT-5.4 outputs. Concretely:

Mixed-model strategies are now the rule, not the exception. Most mature teams route routine work to a Codex-mini model and reserve GPT-5.5 for repository-wide and review-heavy work.
The worked cost example in Section 7 shows the 30-engineer PR-review case across all five model tiers. Read it before approving a budget.
Re-check pricing every quarter. The rate card has changed in the past and will change again.

Verify Before Quoting

The numbers in this section come from OpenAI's launch documentation and contemporaneous press coverage. Before they go into a procurement deck or a public document, verify against the official OpenAI announcement and the model page — see Section 16: Source References. Benchmarks get re-run; numbers shift with eval methodology changes.

Section 12: Troubleshooting

Even good tools fail if the setup is wrong. Here are the most common issues.

"Codex is not installed"

Check:

You ran npm i -g @openai/codex.
You are using a supported shell and runtime.
The binary is on your path.

Check:

Your ChatGPT account has the right plan.
Your workspace allows Codex local or cloud use.
You are signing in with the correct account.

"Windows is behaving badly"

The CLI docs say Windows support is experimental. If you are on Windows, the best supported path is to use WSL for the CLI or use the Codex app where appropriate.

"Cloud task cannot see my repo"

Check:

The GitHub connector is installed.
The repository is allowed in the connector.
Your organization admin has enabled Codex cloud.
You are using a GitHub-hosted repository.

"Codex will not browse the internet"

That is expected by default in cloud mode. Ask your admin whether internet access has been intentionally restricted.

"The result is technically correct but not what I wanted"

Usually this means the prompt was under-specified. Tighten:

The target file or feature.
The acceptance criteria.
The constraints.
The expected output format.

Section 13: FAQ

Is Codex a chat model?

Not exactly. It is a coding agent and product surface built to work on repositories, tests, code review, and multi-step software tasks.

Can I use Codex without switching tools all the time?

Yes. That is one of its strengths. You can use the CLI, IDE extension, or Codex app depending on your workflow.

Do I need the cloud features?

No. Many individual users will get value from the local CLI or IDE extension alone. Cloud tasks become more valuable as soon as you want background execution, parallelism, or automated review.

Is Codex only for professional engineers?

No, but it is most useful when the user can evaluate code changes and understand a repository. It is a developer tool first.

Is Codex the same as GPT-5.4?

No. GPT-5.4 is a model. Codex is the coding product/workflow. Codex may use different models depending on the surface and configuration.

What is the safest way to start?

Use the CLI or IDE extension in a small repo change, keep the approval mode conservative, and review every diff before merging.

Section 14: When NOT to Use Codex

Most of this handbook is affirmative — Codex is good at this, Codex fits here, here is how to set it up. That framing risks creating the impression that Codex is the right tool for any coding-adjacent task. It is not. The fastest way to lose team trust in an AI coding tool is to push it into work it is bad at. The following is an honest list of where Codex is a poor fit today.

Tasks With No Reviewable Output

Codex's value depends on a human reviewing the diff, the test result, or the explanation. If the task produces something nobody will check — a one-off script that touches production data, an exploratory query whose result drives a decision before anyone reads the SQL — the AI's confidence becomes the only quality gate. That is a bad position to be in regardless of model quality. Either add a review step or do the task yourself.

Highly Novel Architecture Decisions

Codex is good at applying patterns. It is much weaker at choosing which pattern fits a problem the team has not solved before. Expect it to confidently generate plausible-but-wrong architecture for genuinely new domains: a new pricing model, a new auth boundary, a new event-sourcing scheme. Use it to prototype options, not to decide between them.

Work That Crosses Org Boundaries

Codex sees the repository it has access to. It does not see the cross-team contracts, the deprecation calendar in the platform team's roadmap, the half-finished migration in another repo, or the political reasons one approach is off-limits. For changes that span multiple teams or services, Codex can implement individual pieces, but a human still needs to own the cross-cutting plan.

Anything Touching Live Production State

Codex Cloud sandboxes are good. They are not a substitute for human approval before a production change. Database migrations, infrastructure-as-code that mutates real resources, secret rotation, customer-data scripts — these need a human in the approval path even if Codex wrote the diff. The fact that Codex can run commands does not mean it should run those commands.

Compliance- and Safety-Critical Code

Code that lives inside a regulated boundary (payments, medical, security primitives, model-evaluation harnesses for safety) has higher review and provenance requirements than typical product code. Codex output is fine as a starting draft, but the review burden is the same as for any third-party-authored code, which usually means the speed advantage shrinks substantially. Plan for that or keep these areas Codex-free.

Tasks Where the Real Bottleneck Is Knowledge, Not Typing

If the team is stuck because nobody understands the legacy system, the failing test, or the weird customer report, generating more code rarely helps. Codex can accelerate the implementation once you know what to do. It cannot replace the discovery and design conversation that should happen first. Teams that skip the discovery step and go straight to "ask Codex" tend to ship the wrong thing fast.

Anything Where Hallucinations Have High Cost

GPT-5.5 dropped hallucination rates by roughly 60% versus prior generations, which is a real improvement. It is not zero. Tasks where a confident-but-wrong output causes real damage — generating regulatory citations, copying API contract details from a doc the model hasn't actually read, asserting facts about an unfamiliar third-party library — still need the same skepticism you would apply to any AI output. Use search-grounded workflows or human verification for these.

Quick Heuristic

If you can answer all four of these with "yes," Codex is likely a good fit:

Can the output be reviewed by someone who would catch a mistake?
Is the task a known pattern, not a novel architecture decision?
Is the blast radius local to one repository or service?
Is the cost of a bad output bounded (e.g., a failed test, a reverted commit) rather than unbounded (e.g., production data loss, regulatory exposure)?

If any of those are "no," either restructure the task to make them "yes" or keep the work outside Codex.

Section 15: Final Recommendations

If you are rolling Codex out to new users, I would keep the guidance very simple:

Start with the CLI or IDE extension.
Use one small task to learn the tool.
Review every change before merging.
Move to cloud tasks only after users trust the local workflow.
For teams, separate user access from admin access.
Re-check pricing whenever your plan or workspace changes.

Codex is most valuable when it is treated as a disciplined engineering tool rather than a novelty. If you give it real code, clear constraints, and a review culture, it can accelerate the boring parts of software development and make bigger tasks easier to break down.

The LUNARTECH Fellowship: Bridging Academia and Industry

Addressing the growing disconnect between academic theory and the practical demands of the tech industry, the LUNARTECH Fellowship was created to bridge this talent gap.

Far too often, aspiring engineers are caught in the “no experience, no job” loop, graduating with theoretical knowledge but unprepared for the messy reality of production systems.

To combat this systemic issue and halt the resulting brain drain, the Fellowship invests heavily in promising individuals, offering a transformative environment that prioritizes hands-on experience, mentorship, and real-world engineering over traditional degrees.

This 6-month, remote-first apprenticeship serves as an immersive odyssey from aspiring talent to AI trailblazer. Rather than paying to learn in isolation, Fellows work on live, high-stakes AI and data products alongside experienced senior engineers and founders. By tackling actual engineering challenges and building a concrete portfolio of production-ready work, participants acquire the job-ready skills needed to thrive in today’s competitive landscape.

If you are ready to break the loop and accelerate your career, you can explore these opportunities and start your journey here: https://www.lunartech.ai/our-careers.

Master Your Career: The AI Engineering Handbook

For those ready to transition from theory to practice, we have developed The AI Engineering Handbook: How to Start a Career and Excel as an AI Engineer. This comprehensive guide provides a step-by-step roadmap for mastering the skills necessary to thrive in the transformative world of AI in 2026.

Whether you are a developer looking to break into a competitive field or a professional seeking to future-proof your career, this handbook offers proven strategies and actionable insights that have already empowered countless individuals to secure high-impact roles.

Inside, you will explore real-world industry workflows, advanced architecting methods, and expert perspectives from leaders at companies like NVIDIA, Microsoft, and OpenAI. From discovering the technology behind ChatGPT to learning how to architect systems that transform research into world-changing products, this eBook is your ultimate companion for career acceleration. You can download your free copy and start mastering the future of AI.

Section 16: Source References

Official OpenAI sources used for this handbook:

Press coverage of the GPT-5.5 release referenced in Section 2 and Section 11:

Appendix A: 30-60-90 Day Adoption Plan

If you are introducing Codex to a team, the fastest way to create trust is to phase adoption instead of rolling it out as a big-bang change. A staged plan also helps you discover where the real friction lives: authentication, permissions, prompt quality, review habits, or budget assumptions.

First 30 Days: Prove Value

In the first month, the goal is not maximum usage. The goal is repeatable wins.

Recommended actions:

Pick one or two engineers who are comfortable trying new tools.
Restrict usage to small, low-risk tasks such as bug fixes, test generation, and documentation updates.
Standardize a short prompt template so every request includes task, context, constraints, and expected output.
Require human review for every change.
Track the time it takes to go from prompt to merged diff.

What you should learn in this phase:

Does Codex understand your codebase structure?
Are the diffs reviewable?
Does the approval flow slow people down in a useful way, or in a frustrating way?
Which classes of tasks work well, and which ones need more guidance?

If the first month is noisy, do not blame the model first. Usually the issue is task scope, missing context, or unclear acceptance criteria.

Days 31-60: Expand Carefully

Once the tool has proven itself on a handful of tasks, expand to a broader pilot group.

Recommended actions:

Add more developers from different parts of the stack.
Include at least one person who is skeptical, because their feedback will reveal weak spots.
Try the app, CLI, and IDE extension in parallel so people can choose the workflow that matches their habits.
Introduce Codex cloud for one or two background tasks or pull request reviews.
Start documenting prompts that worked well, including examples of high-quality follow-up instructions.

What you should learn in this phase:

Which surfaces are actually sticky for the team?
Where does Codex save the most time?
Do people trust the output enough to delegate real work?
Are you seeing the same mistakes repeatedly?

At this stage, your internal documentation matters. A short "how we use Codex here" page is often more useful than another technical deep dive.

Days 61-90: Operationalize

After about three months, your objective should shift from experimentation to operating practice.

Recommended actions:

Assign ownership for workspace settings, GitHub connector setup, and model access.
Define which tasks should stay local and which can go to cloud sandboxes.
Document your review standards for Codex-generated diffs.
Set budget expectations with the team so no one is surprised by token-heavy tasks.
Add Codex to onboarding for new engineers, starting with one simple flow.

What good looks like at this stage:

New hires can use Codex on day one.
Team members know when to reach for Codex and when to use a different workflow.
Admins can answer access and pricing questions quickly.
The organization has a realistic picture of the tool's strengths and limits.

A Practical Onboarding Script

If you need a ready-made orientation for a new user, use this:

"Install the CLI or extension."
"Open a repository you know well."
"Ask Codex to make one small, safe change."
"Review the diff line by line."
"Run the tests."
"Ask Codex to explain what it changed and why."
"Repeat with a slightly larger task."

That sequence teaches the core loop: context, task, change, review, verify. Once a user understands that loop, the rest of the product family becomes much easier to adopt.

Appendix B: Glossary

Terms used in this handbook, in alphabetical order. The list is intentionally narrow — only terms that appear in the body and are likely to be unfamiliar to a non-engineering reader (procurement, security, leadership) are defined here.

Agent / agentic workflow. Software that can take a goal, plan steps, take actions (read files, run commands, call APIs), observe the result, and iterate. Codex is an agentic coding workflow; a chatbot is not.
Approval mode. A Codex setting that controls how much the agent can do without asking. Stricter modes prompt the human before running shell commands or modifying files; permissive modes let the agent work uninterrupted.
CLI. Command-line interface. The Codex CLI is the terminal-based version of Codex, installed via npm i -g @openai/codex.
Codex Cloud. The hosted, sandboxed execution mode for Codex. Tasks run in isolated environments with the repo and finish with a reviewable diff.
GDPval. A benchmark that scores models on their ability to produce well-specified knowledge-work output across 44 occupations. Used in Section 11 as a general-capability signal.
GitHub Connector. The integration that lets Codex Cloud access GitHub repositories. Required for cloud tasks; uses short-lived, least-privilege tokens.
MCP (Model Context Protocol). An open protocol for connecting models to external data sources and tools. Codex CLI supports MCP, which lets it pull in data from systems beyond the repo.
MRCR v2. A long-context retrieval benchmark that measures whether the model can find and use information across very large input windows. The 1M-token version is cited in the GPT-5.5 section because it predicts behavior on repository-scale tasks.
OSWorld-Verified. A benchmark that measures whether a model can operate a real desktop computer environment to complete tasks. A direct proxy for agentic and computer-use workloads.
PR (pull request). A proposed change to a code repository, hosted on GitHub or similar platforms, where reviewers approve before the change merges.
RBAC (role-based access control). A permission model where users are assigned to roles, and roles have specific permissions. Used by Codex workspace admins to control who can do what.
SCIM (System for Cross-domain Identity Management). A standard for syncing users and groups from an identity provider (Okta, Entra ID, etc.) into another system. Codex supports SCIM-based group sync for enterprise.
Subagent. A Codex CLI feature that splits a task across multiple parallel agent runs, each handling a piece of the work.
Tau2-bench Telecom. A benchmark for complex customer-service workflows with tool use. Cited as a signal for tool-use reliability and policy adherence.
Terminal-Bench 2.0. A coding benchmark focused on terminal-driven workflows, including long-context retrieval and computer use. The closest public proxy for Codex CLI workloads.
Worktree. A git feature that lets multiple branches be checked out simultaneously in different directories. The Codex app uses worktrees so multiple agents can work in parallel without stepping on each other.
WSL (Windows Subsystem for Linux). A compatibility layer that runs Linux binaries natively on Windows. The recommended environment for Codex CLI on Windows, since direct Windows support is experimental.

Appendix C: Admin Security Checklist

For workspace admins setting up Codex for an enterprise. This checklist condenses Section 8 into actionable items. Run through it before broad rollout, then revisit quarterly.

Access

[ ] Decide whether Codex Local, Codex Cloud, or both are enabled at the workspace level.
[ ] Create separate RBAC groups for Codex Admins (policy and governance) and Codex Users (day-to-day developers). Avoid mixing the two.
[ ] Sync user and group membership from your identity provider via SCIM rather than managing users by hand.
[ ] Set a sensible default role for new workspace members. Do not default to admin.

GitHub integration

[ ] Install the ChatGPT GitHub Connector against the correct GitHub organization.
[ ] Allowlist only the repositories Codex Cloud needs. Do not grant org-wide access by default.
[ ] Verify Codex respects existing branch protection rules on protected branches before enabling cloud tasks against them.
[ ] Confirm the GitHub App tokens Codex uses are short-lived and least-privilege.

Network and runtime

[ ] Confirm Codex Cloud runs with no internet access by default. This is the secure default; verify it is on.
[ ] If a workflow requires internet access, define an explicit allowlist (dependency registries, trusted sites) and limit allowed HTTP methods.
[ ] Document which model surfaces are approved for sensitive code (often: local CLI yes, cloud no for the most sensitive repositories).

Data and review

[ ] Document the team's review standard for Codex-generated diffs. At minimum: a human approves every merge.
[ ] Confirm logging and audit trails are configured for Codex actions (model used, prompts, files changed) per your compliance requirements.
[ ] Define which classes of data are off-limits to Codex (PII, customer data, secrets) and how those boundaries are enforced.
[ ] Establish an incident playbook for the case where Codex generates or commits something it should not have.

Budget and ongoing operations

[ ] Set a per-workspace token budget or alert threshold so unexpected spend is caught early.
[ ] Pick a default model per task type (e.g., Codex-mini for routine review, GPT-5.5 for repository-wide refactors) and document the choice.
[ ] Review the Codex pricing page quarterly. The rate card has changed in the past and will change again.
[ ] Re-run this checklist when (a) a major model release lands, (b) the workspace expands to a new team, or (c) Codex adds a new surface or capability.

Appendix D: Changelog

A short, append-only log of substantive revisions to this handbook. Each entry lists the version, date, and a one-line summary of what changed.

v1.3 — 2026-04-30. Made the Table of Contents clickable. Added a new Prerequisites section after the TOC. Restructured the early sections: merged the old "Quick Start" and "How to Set Up Codex" into a single Section 4 walkthrough using a self-contained codex-demo repo readers build themselves. Slimmed Section 2 by moving the GPT-5.5 benchmark deep dive to a new Section 11 (Model Specs and Benchmarks). Added per-surface hyperlinks to Section 3. Rewrote Section 5 (How to Use Codex Effectively) with bad/good examples for every tip and a definition of "bounded change." Rewrote the "Measure What Matters" subsection with concrete computation methods for each metric. Added worked, runnable examples to every workflow in Section 10. Renumbered downstream sections accordingly.
v1.2 — 2026-04-25. Added Appendix E (Working with Codex in VS Code), a detailed step-by-step guide covering the three VS Code entry points — the extension, the CLI in the integrated terminal, and browser Codex at chatgpt.com/codex — with setup instructions, a decision matrix, a combined-workflow pattern, and VS Code-specific troubleshooting. Added a forward-pointer in the setup section.
v1.1 — 2026-04-25. Added GPT-5.5 / GPT-5.5 Pro coverage in Section 2 and Section 7. Added executive summary, comparison matrix in the model-comparison section, worked cost example, "When NOT to use Codex" in Section 14. Added Appendix B (Glossary), Appendix C (Admin Security Checklist), Appendix D (Changelog). Added version stamp and author line. Press coverage sources for GPT-5.5 added in Section 16.
v1.0 — Initial release. Original Codex onboarding handbook covering surfaces, setup, usage, model comparison, pricing, security, team practices, workflows, troubleshooting, FAQ, and the 30-60-90 day adoption plan.

Appendix E: Working with Codex in VS Code

This appendix is a focused, step-by-step guide to using Codex inside Visual Studio Code (and its forks, Cursor and Windsurf).

VS Code is the most common starting surface for new Codex users, and the workflow has three distinct entry points that can be used independently or together. This guide covers each one, when to pick it, and how the three combine into a single fluid workflow.

E.1 Why VS Code Is the Recommended Starting Surface

Most teams start with VS Code rather than the standalone Codex app or pure CLI for a few practical reasons:

The editor is already where engineers spend their day. Adding Codex does not require a context switch.
The extension surface area is small and reviewable. Engineers can try it on a single file before adopting it more broadly.
VS Code's integrated terminal makes the CLI a one-keystroke experience, so the extension and CLI can be combined without leaving the editor.
Cursor and Windsurf, the most popular VS Code forks, both run the same Codex extension. A team that standardizes on the VS Code workflow does not have to retrain people if some engineers prefer a fork.

The downside of starting in VS Code is that you do not get parallel-task management or worktree support out of the box — those are stronger in the Codex app. For most individual contributors, that is not a meaningful loss in the first month.

E.2 The Three Entry Points

Codex shows up in VS Code in three distinct ways, and they are easy to confuse. Each is a separate piece of software with its own install and its own auth handshake, even though they all sign in with the same ChatGPT account.

The Codex VS Code extension — a sidebar UI inside VS Code itself. Installed from the VS Code Marketplace. Best for in-flow editing, quick questions about the open file, and short bounded tasks.
The Codex CLI, run inside VS Code's integrated terminal — the command-line agent (codex) running in the terminal pane that is already attached to your VS Code workspace. Best for multi-step agentic tasks, scripted runs, and anything where you want explicit approval gates.
Browser Codex at chatgpt.com/codex — the web interface to Codex Cloud, where tasks run in isolated sandboxes against your GitHub repository. Best for background work, parallel tasks, and PR-style review.

These are not alternatives to each other in the sense that you must pick one. They are three workflows that target different kinds of work, and most experienced Codex users have all three set up.

E.3 Setting Up the Codex VS Code Extension

This is the entry point most new users meet first.

Install

There are two install paths:

Open the VS Code Marketplace, search for "Codex" or "ChatGPT", and install the extension published by openai. The marketplace identifier is openai.chatgpt.
From a terminal, run:

code --install-extension openai.chatgpt

The CLI install path is useful for scripted dev-environment provisioning, dotfiles repos, and onboarding scripts that bring a new machine up to a known baseline.

Sign in

After install, the Codex panel appears in the right sidebar. The first time you open it, you will be prompted to sign in. You have two options:

Sign in with ChatGPT. Recommended for individuals on Plus, Pro, Business, or Enterprise/Edu plans. Usage is charged against your plan's included Codex credits.
Sign in with an API key. Used when you want metered API billing instead of plan-based usage, or when your workspace policy requires it. Get the key from the OpenAI developer console, then paste it into the extension's auth prompt.

If both options are visible and you are unsure which to pick, default to ChatGPT sign-in. It is the path that exercises the same plan-included usage that the rest of your team is on, which makes cost behavior predictable.

First-run sanity check

Once signed in, do a five-minute sanity check before relying on the extension for real work:

Open a small repository you know well.
Open the Codex panel in the right sidebar.
Ask a question about the open file (e.g., "What does this function do?") and confirm the answer matches what you already know.
Ask for a small change (e.g., "Add a docstring to this function") and confirm a reviewable diff appears.
Apply the change, run your tests, and revert if needed.

If any of those steps fails, fix the auth or install before going further. Trying to debug the extension on a real task is much harder than debugging it on a known-good toy task.

Platform notes

macOS and Linux are first-class. The extension and the underlying CLI both work natively.
Windows is experimental for the CLI. The extension itself works, but if you also want to run the CLI inside VS Code's integrated terminal, OpenAI recommends using a WSL workspace. Open the folder via "Reopen in WSL" before installing the CLI.
Cursor and Windsurf run the same extension. Watch for visual or shortcut conflicts with the fork's built-in AI features — see E.9 for specifics.

E.4 Setting Up the Codex CLI Inside VS Code's Integrated Terminal

The CLI is the second entry point. It runs as a normal command-line tool, but inside VS Code's integrated terminal it picks up the active workspace folder automatically, which makes it feel like a native part of the editor.

Install the CLI

From any terminal, including VS Code's integrated terminal:

npm i -g @openai/codex

This installs the codex binary globally. Confirm by running:

codex --version

If the command is not found, the most common cause is that npm's global bin directory is not on your PATH. Either fix the PATH or use a Node version manager (nvm, fnm, volta) that handles it for you.

Open the integrated terminal in VS Code

Three ways to open it, pick whichever matches your habits:

The View menu → Terminal.
The keyboard shortcut Ctrl+** (backtick) on Windows/Linux, **⌃ on macOS.
The Command Palette: Terminal: Create New Terminal.

The integrated terminal inherits the active workspace folder as its working directory, which means codex launched from there immediately sees the right repo.

Run Codex

In the terminal, navigate to the repo (if you are not already there) and run:

codex

The first time you run it, you will go through the same auth flow as the extension — sign in with ChatGPT or paste an API key.

Pick an approval mode

The CLI supports several approval modes that govern how much Codex can do without explicit confirmation. For new users, start with the strictest mode (asks before every shell command and every file change), then loosen it once you trust the workflow on your repo. The relevant modes and how to toggle them are described in the CLI docs linked in Section 16.

Where the CLI beats the extension

Multi-step agentic runs that need to read several files, run tests, iterate, and report.
Anything you want to script or invoke from a package.json script, a Makefile, or a CI step.
Subagent decomposition (the CLI explicitly supports splitting a task across multiple parallel agent runs).
MCP-connected tools and custom data sources.
Cloud task launching from the terminal, when you do not want to leave the keyboard.

E.5 Setting Up Browser Codex (chatgpt.com/codex)

The third entry point lives outside VS Code but is essential for the full workflow because it is how you launch and monitor cloud tasks.

Open browser Codex

Navigate to chatgpt.com/codex. You will need to be signed into the same ChatGPT account you used for the extension and CLI. If you are part of an enterprise workspace, your admin must have enabled Codex Cloud at the workspace level — see Section 8.

You can also reach Codex through the sidebar in regular ChatGPT. The browser surface exposes two main verbs:

Code — assign a coding task. Codex spins up a sandbox preloaded with your repository and produces a reviewable diff.
Ask — ask a question about your codebase without changing any code.

Connect a GitHub repository

Cloud tasks need a GitHub-hosted repository. Connect it once:

Open environment settings at chatgpt.com/codex.
Connect your GitHub account through the ChatGPT GitHub Connector.
Grant access to the specific repositories you want Codex to be able to use. Do not grant org-wide access by default — see Appendix C for the security checklist.
Confirm the connector shows the repo as available.

Launch a task

From the Codex web interface:

Pick the repository and (optionally) the branch.
Type a prompt describing the task. Be specific — "Add input validation to the /users POST endpoint and update the matching tests" beats "Improve the API."
Click Code (or Ask for a non-mutating question).
Watch the live logs as Codex works, or close the tab and let it run in the background.
When it finishes, review the diff. From there you can request changes, accept the result, or open a pull request.

Delegate from a GitHub PR comment

A useful shortcut: in any PR on a connected repo, you can post a comment that tags @codex with an instruction (for example, "@codex review this PR for security issues and missing tests"). Codex will pick up the request and respond on the PR. This requires being signed into ChatGPT in the same browser.

Why the browser surface matters even if you live in VS Code

Cloud tasks decouple Codex from your local machine. You can launch a long-running task from the browser, close the laptop, and come back to the diff later. The extension and CLI cannot do this — they need an open VS Code instance to run.

E.6 When to Pick Which Entry Point

The three entry points overlap, which causes confusion. This table makes the choice mechanical.

Situation	Best entry point	Why
Quick edit on the file you have open	Extension	Lowest friction, no context switch
"What does this function do?"	Extension	Right-sidebar Q&A is faster than typing it into a terminal
Multi-file refactor with tests	CLI in integrated terminal	Better at multi-step agentic work and approvals
Anything you want to script or wire into a Makefile	CLI	Only the CLI is invokable from other scripts
Long-running task you want to leave running	Browser (cloud)	Decoupled from your laptop
Parallel tasks (e.g., three independent fixes at once)	Browser (cloud)	Cloud sandboxes run in parallel without local resource contention
PR review on a teammate's pull request	Browser, via `@codex` mention in PR	Lives where the review actually happens
Anything touching production credentials or live infra	None of the above without explicit human approval	See Section 14

The pattern that emerges: extension for in-flow editing, CLI for serious local agentic work, browser for anything you want offloaded or shared with the team.

E.7 The Combined VS Code Workflow

The three entry points are most powerful when used together. A representative day looks like this.

Morning, in VS Code:

Open the repo. The Codex extension panel is in the right sidebar.
Use the extension to ask questions about an unfamiliar module before you touch it.
Make small in-line edits — single-function changes, docstrings, type fixes — using the extension's diff-apply flow.

Mid-morning, in the integrated terminal:

Open the integrated terminal (Ctrl+`).
Run codex and start a multi-file task with explicit approval mode: "Refactor the auth middleware to use the new session interface. List the files you intend to touch first, then make the changes in the smallest commits possible."
Approve each shell command and each diff as Codex requests them.
Run the test suite when Codex finishes.

Afternoon, in the browser:

While you are reviewing the morning's CLI changes, open chatgpt.com/codex in another tab.
Launch a cloud task: "Add OpenAPI annotations to every public endpoint in the /api/v2 directory." This will take a while.
Switch back to VS Code and keep working. The cloud task runs in its own sandbox.
When the cloud task finishes, review the diff in the browser, request any tweaks, and open a PR.

End of day, on GitHub:

Tag @codex on a teammate's open PR with "review for correctness and missing tests." The result lands as a comment overnight.

The point of the combined workflow is that each entry point is doing what it is best at simultaneously. The extension keeps in-flow editing fast, the CLI handles local agentic work where you want approval control, and the cloud handles long-running and parallel tasks without consuming your local machine.

E.8 VS Code-Specific Tips

These are small tips that compound over time once you use Codex daily inside VS Code.

Sidebar position. The Codex panel defaults to the right sidebar. If you also have GitHub PR review or another panel there, drag Codex to the secondary side or to a panel-bottom dock — whichever keeps it visible without stealing space from the editor.
Keybindings. Bind the most-used Codex commands (open panel, new task, accept diff) to keyboard shortcuts via VS Code's Preferences: Open Keyboard Shortcuts. Reach for the keyboard, not the mouse.
Settings sync. If you use VS Code's Settings Sync, the Codex extension's settings travel with you to other machines. Auth state does not — you sign in again on each machine. This is the right behavior; do not work around it.
Multi-root workspaces. The extension scopes to the active workspace folder. If you open a multi-root workspace, switch the active folder explicitly before asking Codex to make changes, otherwise it may operate against the wrong root.
Integrated terminal profiles. If you use multiple terminal profiles (PowerShell, bash, WSL), set the WSL profile as default on Windows so codex from the integrated terminal always lands in the supported environment.
Source control panel. After Codex applies a change, the VS Code Source Control panel shows the diff. Review there before committing — it gives you the same context as a git diff without leaving the editor.
Don't fight the approval mode. New users often loosen approvals to "auto" too quickly because the prompts feel slow. Resist that for the first week. The approvals are how you build a mental model of what Codex actually does in your repo.
One Codex panel per VS Code window. Avoid running the extension and the CLI in the same workspace simultaneously on the same task — they can both touch files and you will get confused about which one made which change.

E.9 Cursor and Windsurf

The Codex extension explicitly supports Cursor and Windsurf, the two most popular VS Code forks. The install and sign-in flow is identical. The notes worth knowing:

Avoid double-AI confusion. Cursor and Windsurf both ship their own AI features. Engineers using them with Codex sometimes accidentally invoke the fork's built-in AI when they meant to invoke Codex, or vice versa. Pick a primary tool for editing and use the other only when its specific strengths matter.
Auth is independent. The Codex extension's ChatGPT sign-in is separate from Cursor's or Windsurf's own model accounts. Your Codex usage is billed against your ChatGPT plan; Cursor/Windsurf usage against theirs.
Keybinding conflicts. Cursor in particular has heavily customized AI-related keybindings. Audit your bindings after installing the Codex extension to make sure both surfaces are reachable.
Settings sync caveat. Cursor and Windsurf have their own settings sync that diverges from upstream VS Code. Codex extension settings may sync within Cursor or Windsurf separately from your VS Code installs.

For pure Codex-first teams, vanilla VS Code is the simplest baseline. For teams that already standardized on Cursor or Windsurf for other reasons, the Codex extension is a clean addition rather than a replacement.

E.10 Troubleshooting VS Code Specifically

The general troubleshooting list is in Section 12. The issues below are specific to running Codex inside VS Code.

Extension installs but sidebar panel never appears

Reload the window (Command Palette → "Developer: Reload Window"). If that does not fix it, check the Output panel, switch the dropdown to "Codex", and look for the actual error. The most common causes are a corporate proxy blocking the extension's auth handshake, or a conflicting older version of the extension still installed.

"Sign in" keeps looping back to the sign-in prompt

This usually means the redirect from the browser auth flow did not reach the extension. Try signing out completely, closing all VS Code windows, then reopening and signing in fresh. On Windows, verify your default browser is one VS Code can open via the OS handler.

codex command not found in the integrated terminal

The CLI's npm global bin directory is not on PATH. The fastest fix on macOS/Linux is to add $(npm bin -g) to your shell profile (.zshrc, .bashrc). On Windows, restart VS Code after the npm install so the integrated terminal picks up the updated PATH, or switch to a WSL terminal where the install is already on PATH.

Cloud task says "no repository connected" even though you connected one

Verify in chatgpt.com/codex environment settings that the specific repository is in the allowlist. The GitHub Connector grants per-repository access; granting access to the org alone is not enough. Also confirm your workspace admin has enabled Codex Cloud — individual users cannot enable it themselves.

Extension and CLI both editing the same file at the same time

Stop one of them. They do not coordinate, and you will get conflicting edits. The simplest discipline: pick one entry point per task, switch between tasks rather than trying to combine within a task.

Extension feels slower than the CLI for the same prompt

Often this is because the extension is using a different default model than your CLI configuration. Check both for the active model — the model picker in the extension panel, and codex --help or the relevant config file for the CLI.

Windows behavior is generally bad

Switch to a WSL workspace. OpenAI's own docs call out Windows as experimental for the CLI; the WSL path is the supported one and clears most issues at once.

Ready to Excel as an AI Engineer?

As we conclude this exploration of intelligent healthcare, it’s clear that the future belongs to those who can bridge the gap between groundbreaking research and real-world utility. If you are inspired to lead this transformation, we invite you to download our flagship resource, The AI Engineering Handbook. Authored by Tatev Aslanyan, a pioneering AI engineer and co-founder of LUNARTECH, this guide is designed to help you navigate the highly competitive landscape of AI engineering, providing you with the step-by-step roadmap and industry workflows needed to build world-changing products.

Empower yourself with the same strategies used by AI trailblazers at the world's most innovative tech companies. By mastering these production-ready skills, you won't just keep pace with the hyper-connected world — you will help define it. Get started today by downloading your eBook here: https://www.lunartech.ai/download/the-ai-engineering-handbook.

About LunarTech Lab

“Real AI. Real ROI. Delivered by Engineers — Not Slide Decks.”

LunarTech Lab is a deep-tech innovation partner specializing in AI, data science, and digital transformation – from healthcare to energy, telecom, and beyond.

We build real systems, not PowerPoint strategies. Our teams combine clinical, data, and engineering expertise to design AI that’s measurable, compliant, and production-ready. We’re vendor-neutral, globally distributed, and grounded in real AI and engineering, not hype. Our model blends Western European and North American leadership with high-performance technical teams offering world-class delivery at 70% of the Big Four’s cost.

How We Work — From Scratch, in Four Phases

1. Discovery Sprint (2–4 Weeks): We start with data and ROI – not assumptions to define what’s worth building and what’s not and how much it will cost you.

2. Pilot / Proof of Concept (8–12 Weeks): We prototype the core idea – fast, focused, and measurable.
This phase tests models, integrations, and real-world ROI before scaling.

3. Full Implementation (6–12 Months): We industrialize the solution – secure data pipelines, production-grade models, full compliance (HIPAA, MDR, GDPR), and knowledge transfer.

4. Managed Services (Ongoing): We maintain, retrain, and evolve the AI models for lasting ROI. Quarterly reviews ensure that performance improves with time, not decays. As we own LunarTech Academy, we also build customised training to ensure clients tech team can continue working without us.

Every project is designed from scratch, integrating clinical knowledge, data engineering, and applied AI research.

Why LunarTech Lab?

LunarTech Lab bridges the gap between strategy and real engineering, where most competitors fall short. Traditional consultancies, including the Big Four, sell frameworks, not systems – expensive slide decks with little execution.

We offer the same strategic clarity, but it’s delivered by engineers and data scientists who build what they design, at about 70% of the cost. Cloud vendors push their own stacks and lock clients in. LunarTech is vendor-neutral: we choose what’s best for your goals, ensuring freedom and long-term flexibility.

Outsourcing firms execute without innovation. LunarTech works like an R&D partner, building from first principles, co-creating IP, and delivering measurable ROI.

From discovery to deployment, we combine strategy, science, and engineering, with one promise: We don’t sell slides. We deliver intelligence that works.

Stay Connected with LunarTech

Follow LunarTech Lab on LunarTech NewsLetter and LinkedIn, where innovation meets real engineering. You’ll get insights, project stories, and industry breakthroughs from the front lines of applied AI and data science.

How to Create REST API Documentation in Node.js Using Scalar

Orim Dominic Adah — Wed, 25 Feb 2026 16:03:26 +0000

A REST API documentation is a guide that explains how clients can make use of the REST APIs in an application. It details the available endpoints, how to send requests and what responses to expect. It may also contain explanations of concepts that are specific to the scope of the application.

Without API documentation, application development is considered incomplete because developers cannot build software to interact with it, rendering the application effectively useless.

In this article, you will learn how to create beautiful REST API documentation that also allows you to test the APIs for free using an OpenAPI specification and Scalar in Node.js projects. You will use asteasolutions/zod-to-openapi to generate OpenAPI specification and use Scalar to create a web page from the specification.

To get the most out of this article, you should have experience developing REST APIs with Express or NestJS. You should also have experience with documenting REST APIs and using zod.

REST API Documentation Tools for Node.js
- Swagger
- Postman
- ReDoc
zod-to-openapi and Scalar for REST API Documentation
- How zod-to-openapi Works with Scalar
Benefits of Using zod-to-openapi and Scalar to Create REST API Documentation
How to Create the API Documentation
How to Use Scalar with NestJS
How to Resolve Content Security Policy (CSP) Errors When Used with Helmet
Absence of AsyncAPI Documentation Feature
Conclusion

REST API Documentation Tools for Node.js

A variety of tools already exist for documenting REST APIs and they have different strengths and weaknesses depending on their use case. While some of them are completely free to use, others operate a freemium model, and while some have an interface for testing APIs, others have a presentation-only user interface (UI).

Some of the most popular tools for documenting REST APIs in Node.js projects are listed below:

Swagger
Postman
Redoc

Swagger

In order to document REST APIs in Express with Swagger, you need swagger-jsdoc and swagger-express-ui. swagger-jsdoc collates and parses JSDoc-annotated documentation comments in the codebase and generates an OpenAPI specification document. swagger-ui-express uses the generated document to created a web page that renders the API documentation and test the APIs.

One of Swagger’s strengths lies in its support for the OpenAPI specification which is an industry standard. Swagger is free to use, has a vibrant open source community and strong support for many programming languages and frameworks. It supports only REST APIs.

Its major drawback is the poor developer experience in manually writing JSDoc comments or YAML for the documentation. The process can be clumsy and developers can forget to include some annotations. Another drawback is that the JSDoc comments can interfere with reading functional code. Lastly, some developers have complained about its boring UI.

Postman

Postman is a cloud-based desktop API client application that allows developers and technical writers to write, test, collaborate on and publish API documentation. Unlike Swagger, it does not require deep programming experience to use most of its features. It is also not limited to REST API documentation — it can document APIs for GraphQL, websockets and gRPC.

Postman provides a UI to fill in details of an API documentation. The documentation process is manual and sometimes, its content can get out of sync with that of deployed applications. It is not free to use for teams and collaboration and hides the real behaviour of browser interaction with the APIs like CORS and streaming.

ReDoc

ReDoc is an open-source tool used to generate API documentation from an OpenAPI (Swagger) specification. It supports GraphQL, AsyncAPI and the OpenAPI specification and it renders a more beautiful documentation than Swagger. redoc-express is used to document Express REST APIs with Redoc.

Redoc’s major drawback is that its free community edition is presentation-only. It does not support testing the APIs. Similar to Swagger, a drawback it has is manually updating the application's OpenAPI document via a YAML specification file or JSDoc comments.

zod-to-openapi and Scalar for REST API Documentation

asteasolutions/zod-to-openapi is a TypeScript library that generates OpenAPI specification from zod schemas. It provides typed methods which serve as guardrails for documenting API components instead of using code comments so that:

The library methods serve as guardrails for what to document and how to document it
The documentation is consistent across the codebase
The documentation doesn't negatively affect code readability

A sample snippet used to document a POST request for creating a user with zod-to-openapi is shown below:

// CreateUser and User are zod schema

registry.registerPath({
  method: "post",
  path: "/api/users",
  summary: "Create user",
  tags: ["users"],
  request: {
    body: {
      content: {
        "application/json": {
          schema: schema.CreateUser, 
        },
      },
      description: "Create user payload",
      required: true,
    },
  },
  responses: {
    201: {
      description: "User created",
      content: {
        "application/json": {
          schema: z.object({
            message: z.string(),
            data: schema.User,
          }),
        },
      },
    },
  },
});

Scalar is a tool that generates beautiful, organized and searchable API documentation from OpenAPI documents. The documentation generated also supports testing the APIs and this makes Scalar effectively function as an API documentation generator and a lightweight API client. The image below shows a sample documentation generated by Scalar:

How zod-to-openapi Works with Scalar

zod-to-openapi provides the functionality to generate an OpenAPI specification from code. Scalar uses the document generated to create a documentation web page that presents the information in the document in an organized and beautiful way that also allows for testing the APIs.

Benefits of Using zod-to-openapi and Scalar to Create REST API Documentation

When you combine zod-to-openapi and Scalar to create REST API documentation for your Express applications, you get a myriad of benefits. Some of the benefits are explained below:

OpenAPI Specification Support

The OpenAPI specification is a format for describing REST APIs. It takes takes into consideration the important components of an API necessary for clients to use it effectively. These components include:

Request paths, methods, headers, path parameters and query parameters,
Schemas of request and response payloads,
Authentication requirements,
Descriptions for information not accommodated by other components

zod-to-openapi provides methods for all of these to be included in the documentation and it generates an OpenAPI specification-compliant document that Scalar uses to generate the documentation web page.

Open Source and Free to Use

zod-to-openapi is open source and free to use. It has no plans to be unsupported or sunsetted soon because like Ruby on Rails and Laravel, the creators of the project use it in their day-to-day work.

Scalar is open source too. It has paid plans but the features in the paid plans are only really useful for enterprise applications. The free version supports the necessary features needed to create useful REST API documentation.

Better Documentation Experience

In terms of user experience, the union of zod-to-openapi and Scalar provides the following benefits when writing documentation with both tools:

Guardrails with zod-to-openapi Methods

The methods provided by zod-to-openapi serve as guardrails to ensure that developers don't omit or forget the documentation of important components of APIs. The methods also ensure that these components are documented in an OpenAPI specification-compliant manner through the typed nature of the methods' parameters.

Avoid the Clumsiness of Comments and YAML Files

With zod-to-openapi, you don't document the APIs using comments or a YAML file. You document APIs using methods from zod-to-openapi. This removes the cluttering of code with comments and the clumsiness around manually updating large YAML files of OpenAPI specification.

Accuracy and Auto-generation of Documentation

When you use zod-to-openapi and Scalar, your API documentation is generated automatically when the application runs. zod-to-openapi does the collation and compilation of the documented APIs, and Scalar creates a web page for it that can be hosted on an API route of the same application. You don't need to manually run CLI commands to generate the documentation.

Another benefit of accuracy and auto-generation is that the job of API documentation is not split between technical writer and backend developer. The documentation is in the code and this makes development faster and more seamless in terms of API documentation.

Developer-friendly UI

While Swagger’s UI is functional, some developers consider its presentation somewhat minimal, particularly when displaying detailed endpoint descriptions. ReDoc improves on visual design but does not offer API testing features. Scalar, on the other hand, delivers a more refined and intuitive interface with greater customization options than ReDoc.

Beyond its design advantages, Scalar provides auto-generated code samples in multiple programming languages. This enables developers to integrate APIs more efficiently using examples tailored to their specific tech stack.

Scalar's UI provides a search feature that allows developers to quickly locate specific sections of the API documentation. It also includes an AI chat interface that enables users to understand how different API endpoints can help address their specific use cases. This approach is more efficient than manually reviewing the entire documentation.

Lastly, you can test the API endpoints from the UI. When you make authenticated API requests, the UI caches authentication tokens so that you don't have to type or paste them for subsequent requests.

Markdown Support

zod-to-openapi and Scalar have Markdown support. With Markdown, you can include conceptual documentation and more information about API endpoints that are not supported by the default documentation components like headers and the request body.

You can embed images, include tables and format text in the documentation. You can use Markdown to include notes that explain concepts related to the API.

How to Create the API Documentation

In this section, you will create an Express CRUD API project that uses zod-to-openapi and Scalar to document its APIs. To practice along, clone the Express starter project from GitHub at orimdominic/freeCodeCamp-zod-to-openapi-scalar.

Set up the Project

After cloning the project:

install its dependencies using your preferred Node.js package manager
start the server using the serve script

# Install dependencies
npm install

# Start the application
npm run serve

You should see the following output on the terminal if the application runs successfully:

> freecodecamp-zod-to-openapi-scalar@1.0.0 serve
> node --experimental-strip-types --watch src/index.ts

Listening on :3000

The project has two modules - Users and Pets.

The router configuration for each module is defined in the router.ts file, while the route controllers are located in the controllers.ts file within each module's folder under src/modules. The controllers do not contain business logic, they simply respond with JSON values generated by the Faker library.

How to Set Up zod-to-openapi

Install asteasolutions/zod-to-openapi using your preferred Node.js package manager. If you use npm, run the code snippet below in your terminal:

npm install @asteasolutions/zod-to-openapi

After the installation, create a folder called lib (library) in the src folder. In the lib folder, create a file called openapi.ts. The file will house the code that sets up zod-to-openapi for collating the API documentation and generating the OpenAPI specification.

Copy and paste the code snippet below into src/lib/openapi.ts:

import z from "zod";
import {
  extendZodWithOpenApi,
  OpenApiGeneratorV31,
  OpenAPIRegistry,
} from "@asteasolutions/zod-to-openapi";

extendZodWithOpenApi(z);

export const registry = new OpenAPIRegistry();

export const bearerAuth = registry.registerComponent("securitySchemes", "bearerAuth", {
  type: "http",
  scheme: "bearer",
  bearerFormat: "JWT",
});

export function generateOpenAPIDocument() {
  const generator = new OpenApiGeneratorV3(registry.definitions);

  return generator.generateDocument({
    openapi: "3.1.0",
    info: {
      title: "Users API",
      version: "1.0.0",
      description: `Backend API documentation for users application.`,
    },
    tags: [
      {
        name: "users",
        description: "For operations carried out by admin users",
      },
    ],
    servers: [
      {
        url: "http://localhost:3000",
        description: "Local server",
      },
    ],
  });
}

zod-to-openapi v8 requires zod v4. If you use zod v3, you should use v7.3.4 of zod-to-openapi.

extendZodWithOpenApi is a method provided by zod-to-openapi that enhances Zod schemas by adding an openapi method. The openapi method allows you to attach additional documentation to request payloads, responses, parameters, and their properties, which are then displayed in the API documentation rendered by Scalar.

It is important to call extendZodWithOpenApi before loading any files that use the openapi method, otherwise accessing openapi on Zod objects will result in errors.

An alternative is to use the meta method on zod v4 schemas for the additional documentation. For example, schemaOne and schemaTwo in the code snippet below are the same:

const schemaOne = z
  .string()
  .openapi({ description: 'Name of the user', example: 'Test' });

const schemaTwo = z
  .string()
  .meta({description: 'Name of the user', example: 'Test' });

The meta method supports all metadata information that you'd normally pass to openapi and will produce exactly the same results.

The OpenAPIRegistry is a utility that is used to collate API documentation which would later be passed to an OpenAPI specification generator. registry is created from OpenAPIRegistry, exported, and used to document API endpoints and components in modules where it is imported.

export const registry = new OpenAPIRegistry();

bearerAuth is a component created by the registry to represent JWT authentication. When bearerAuth is included in the documentation of an endpoint, the UI renders an input for submitting an authentication token for authenticated requests as shown in the image below.

In the registerComponent method, "securitySchemes" registers a security scheme component. "bearerAuth" , the first argument of registerComponent, is the name given to the component and it can be changed to a name that you prefer. It appears in the top right of the authentication token input, shown in the image above. The third input to registerComponent is an object that defines the component.

When generateOpenAPIDocument function is executed, it collates all the registry API definitions in the project, generates the OpenAPI specification through generator.generateDocument, and returns the specification as JSON.

The tags property in generator.generateDocument organizes API endpoints into sections on the documentation UI. For example, all API endpoints with the Users tag in their registry definition will be placed under the Users section of the UI. description can be written in Markdown within template literals.

The servers property is a collection of the servers connected to the application. If you have multiple servers, you have the option of selecting what server to use for the base URL in the documentation UI for making API requests from it.

With this setup in place, when endpoints are documented with the registry, generateOpenAPIDocument will have an OpenAPI specification to return.

How to Generate the Documentation UI with Scalar

In this section, you will set up Scalar and connect it to the return value of generateOpenAPIDocument. You will also connect Scalar with an Express route, allowing the application to serve the documentation UI at that route.

Scalar has an Express API reference library that makes it easier for you to connect it with the OpenAPI specification and Express. Install scalar/express-api-reference using your preferred Node.js package manager. If you use npm, use the snippet below:

npm install @scalar/express-api-reference

Copy and paste the code snippet below into src/app.js:

import express from "express";
import router from "./router.ts";
import { generateOpenAPIDocument } from "./lib/openapi.ts";
import { apiReference } from "@scalar/express-api-reference";

const app = express();
app.use(express.json(), express.urlencoded({ extended: true }));

app.get("/", function (req, res) {
  return res.send("OK");
});

app.use("/api", router);

const apiDocJsonContent = generateOpenAPIDocument();

app.use(
  "/docs", // documentation route
  apiReference({
    content: apiDocJsonContent,
    title: "Users API",
    pageTitle: "Users API",
  }),
);

export default app;

In the code snippet above, generateOpenAPIDocument is imported from src/lib/openapi.ts, and apiReference is imported from @scalar/express-api-reference. When executed, generateOpenAPIDocument returns the OpenAPI specification, which is stored in apiDocJsonContent for caching to improve perfoemance.

A GET /docs route is then created, with the Scalar apiReference function acting as the controller. It accepts apiDocJsonContent and returns a web page whenever the GET /docs route is accessed.

With this setup in place, run the application using npm run serve and visit the documentation page at http://localhost:3000/docs in your browser. You should see a user interface similar to the image below:

To view the codebase at this point, run git checkout set-up-openapi-scalar .

Document the Endpoints

You have set up zod-to-openapi and connected it with Scalar. You have also hooked it up with a route in the backend application. In this section, you will write code to document the endpoints in the application for generating the OpenAPI specification and rendering it on the documentation UI.

To document the route for creating users ( POST /api/users ), in src/modules/users/router.ts , import registry, the schemas and zod using the snippet below:

import z from "zod";
import { registry } from "../../lib/openapi.ts";
import { 
    UserSchema, 
    UserListItemSchema,
    UpdateUserSchema, 
    CreateUserSchema, 
} from "./types.ts";

Copy and paste the code below above the create user route to document the create user endpoint:

registry.registerPath({
  method: "post",
  path: "/api/users",
  summary: "Create user",
  tags: ["users"],
  request: {
    body: {
      content: {
        "application/json": {
          schema: CreateUserSchema,
        },
      },
      description: "Create user payload",
      required: true,
    },
  },
  responses: {
    201: {
      description: "User created",
      content: {
        "application/json": {
          schema: z.object({
            message: z.string().openapi({ example: "User created" }),
            data: UserSchema,
          }),
        },
      },
    },
  },
});

Visit the documentation page and you will find see a web page similar to the image below:

The UI result of some of the input fields of registry.registerPath have been labelled in the image above. The description in the API endpoint is italicised because its value is Markdown in a template string.

By registering the route path for creating users with registry.registerPath and filling its values, you added the documentation of the route to the registry definitions and that makes it included in the OpenAPI specification.

To test the endpoint from the documentation UI:

click the Test Request button
fill in the payload in the dialog that appears and
click the Send button

To document the get user by id route (GET /api/users/:id ), import bearerAuth from src/lib/openapi.ts, copy the code snippet below and paste it above the get user by id route definition.

registry.registerPath({
  method: "get",
  path: "/api/users/{userId}",
  summary: "Get user details by id",
  tags: ["Users"],
  security: [{ [bearerAuth.name]: [] }],
  request: {
    params: z.object({ userId: z.int() }),
  },
  responses: {
    200: {
      description: "User retrieved",
      content: { "application/json": { schema: UserSchema } },
    },
  },
});

When the request.params field is defined using a Zod object, it generates an input UI on the documentation web page that enables users to provide values for path parameters such as userId, highlighted in the image below:

The complete code for the documentation of all endpoints in this section can be accessed when you check out the complete-project branch by running git checkout complete-project in your terminal. It contains documentation for the endpoint for uploading user photo, which demonstrates how to document endpoints that accept file uploads.

How to Use Scalar with NestJS

Scalar has a library that integrates with NestJS. You can use supply the Swagger document created by swagger/nestjs to the Scalar NestJS integration library to generate the Scalar documentation UI.

In root folder of your NestJS project, install the Scalar NestJS integration library:

npm install @scalar/nestjs-api-reference

Update the main.ts file of your NestJS project with the code snippet below:

import { NestFactory } from '@nestjs/core';
import { apiReference } from '@scalar/nestjs-api-reference';
import { DocumentBuilder, SwaggerModule } from '@nestjs/swagger';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  const options = new DocumentBuilder()
    .setTitle('Cats example')
    .setDescription('The cats API description')
    .setVersion('1.0')
    .addTag('cats')
    .addBearerAuth()
    .build();

  const openApiSpecification = SwaggerModule.createDocument(app, options);
  
  // integrate the documentation with NestJS
  app.use(
    '/api/docs', // documentation route
    apiReference({
      content: openApiSpecification,
    }),
  )

  await app.listen(3000);
  console.log(`Application is running on: ${await app.getUrl()}`);
}

bootstrap();

With this setup in place, you can visit the /api/docs route in your browser to view the Scalar documentation for your NestJS application.

How to Resolve Content Security Policy (CSP) Errors When Used with Helmet

If you use Helmet in your Express or NestJS project, you will encounter CSP errors when you try to render the Scalar documentation UI. To resolve the errors, update the Helmet CSP configuration in your code to have the value of the object in the code snippet below:

{
    directives: {
      defaultSrc: [`'self'`],
      styleSrc: [`'self'`, `'unsafe-inline'`],
      imgSrc: [`'self'`, 'data:', 'validator.swagger.io'],
      scriptSrc: ['self', 'https:', 'unsafe-inline'],
    },
  }

Absence of AsyncAPI Documentation Feature

At the time of writing, Scalar does not fully support rendering AsyncAPI specifications for event-driven architecture APIs, although it is currently under development. You can track the progress of its development through the GitHub issue linked in the documentation to stay informed about its release.

Conclusion

You have learned about zod-to-openapi and how it makes it easier for you to generate an OpenAPI specification for your REST APIs than writing comments or large YAML files. You also learned how to use the specification document generated to render a beautiful API documentation UI which also functions as a lightweight API client. Endeavour to implement it in your projects that need a documentation uplift.

Feel free to connect with me on LinkedIn for questions or clarifications. Thank you for reading this far and I hope this helps you achieve what you intended to achieve. Don’t hesitate to share this article if you feel that it would help someone else out there. Cheers!

How to Build an AI-Powered RAG Search Application with Next.js, Supabase, and OpenAI

Mayur Vekariya — Tue, 27 Jan 2026 17:21:37 +0000

In this tutorial, you'll learn how to build a complete RAG (Retrieval-Augmented Generation) search application from scratch. Your application will allow users to upload documents, store them securely, and search through them using AI-powered semantic search.

By the end of this guide, you'll have a fully functional application that can:

Upload and process PDF, DOCX, and TXT files
Store documents in Supabase Storage
Generate embeddings using OpenAI
Perform semantic search across document chunks
Provide AI-generated answers based on document content
View and manage uploaded documents

This is a production-ready solution that you can deploy and use immediately.

What You'll Learn
Prerequisites
Understanding the Technologies
Project Overview
Step 1: Create Your Next.js Project
Step 2: Install Required Dependencies
Step 3: Set Up Your Supabase Project
Step 4: Configure Environment Variables
Step 5: Create the Upload API Route
Step 6: Create the RAG Search API Route
Step 7: Create the Documents API Route
Step 8: Create the Upload Modal Component
Step 9: Create the PDF Viewer Modal Component
Step 10: Create the Navigation Component
Step 11: Create the Home Page (Search Interface)
Step 12: Create the Documents Page
Step 13: Test Your Application
Step 14: Deploy Your Application
How RAG Search Works
Troubleshooting Common Issues
Next Steps
Conclusion

What You'll Learn

In this handbook, you'll learn how to:

Set up a Next.js application with TypeScript
Configure Supabase for database and file storage
Integrate OpenAI embeddings and chat completions
Implement document text extraction and chunking
Build a vector search system using PostgreSQL
Create a modern UI with React components
Handle file uploads and storage
Implement RAG (Retrieval-Augmented Generation) search

Prerequisites

Before you begin, make sure you have:

Node.js 18 or higher installed on your computer
A Supabase account (free tier works fine)
An OpenAI API key
Basic knowledge of React and TypeScript
Familiarity with Next.js (helpful but not required)

Understanding the Technologies

Before we dive into building the application, you should understand the key technologies and concepts you'll be working with:

What is RAG (Retrieval-Augmented Generation)?

RAG is an AI pattern that combines information retrieval with text generation. Instead of relying solely on an AI model's training data, RAG retrieves relevant information from your own documents. It then uses that information as context to generate accurate, up-to-date answers. This approach gives you:

Accuracy: Answers are based on your actual documents, not just the AI's training data
Transparency: You can see which document sections were used to generate the answer
Efficiency: Only relevant document chunks are used, reducing token costs

What are Embeddings and Vector Database?

Embeddings are numerical representations of text that capture semantic meaning. When you convert text to an embedding, similar meanings are represented by similar numbers. For example, "dog" and "puppy" would have similar embeddings. Meanwhile, "dog" and "airplane" would have very different ones.

OpenAI's embedding models convert text into vectors. These are arrays of numbers that can be compared mathematically. This allows you to find documents that are semantically similar to a search query. You can find matches even if they don't contain the exact same words.

A vector database is a specialized database designed to store and search through embeddings efficiently. Instead of searching for exact text matches, vector databases use mathematical operations. They use operations like cosine similarity to find the most semantically similar content.

In this tutorial, you'll use Supabase's PostgreSQL database with the pgvector extension. This extension adds vector storage and similarity search capabilities to PostgreSQL. This lets you store embeddings alongside your regular database data. You can also perform fast similarity searches.

What is Text Chunking?

Text chunking is the process of breaking large documents into smaller, manageable pieces. This is necessary for several reasons.

First, AI models have token limits. These are maximum input sizes. Second, smaller chunks allow for more precise retrieval. Third, overlapping chunks ensure context isn't lost at boundaries.

You'll use LangChain's RecursiveCharacterTextSplitter. This tool intelligently splits text while trying to preserve sentence and paragraph boundaries.

What is Supabase?

Supabase is an open-source Firebase alternative. It provides several key features.

You get a PostgreSQL database, which is a powerful, open-source relational database. You also get storage, which is file storage similar to AWS S3. There are real-time features that provide real-time subscriptions to database changes. Finally, there's built-in user authentication.

For this project, you'll use Supabase's database to store document chunks and embeddings. You'll also use Supabase Storage to store the original uploaded files.

What is Tailwind CSS?

Tailwind CSS is a utility-first CSS framework that lets you style your application by applying pre-built utility classes directly in your HTML/JSX. Instead of writing custom CSS, you use classes like bg-blue-600, text-white, and rounded-lg to style elements.

You'll use Tailwind CSS in this project because it speeds up development by providing ready-made styling utilities. It also ensures consistent design across the application. Plus, it makes it easy to create responsive, modern UIs. Finally, it works seamlessly with Next.js.

Now that you understand the core concepts and tools we’ll be using, let's start building the application.

Project Overview

Your RAG search application will consist of:

Frontend: Next.js application with React components for uploading documents and searching
Backend API Routes: Next.js API routes for handling uploads, searches, and document management
Database: Supabase PostgreSQL with vector extension for storing embeddings
Storage: Supabase Storage for storing original files
AI Integration: OpenAI for generating embeddings and chat completions

The application will have two main pages:

Search Page: Where users can ask questions about their uploaded documents and get AI-generated answers
Documents Page: Where users can view all uploaded documents, upload new ones, preview files, and manage their document library

Let's start building!

If you ever get stuck on the source code, you can view it on GitHub here:

https://github.com/mayur9210/rag-search-app

Step 1: Create Your Next.js Project

Start by creating a new Next.js project with TypeScript. Open your terminal and run:

npx create-next-app@latest rag-search-app --typescript --tailwind --app

When prompted, choose the following options:

TypeScript: Yes
ESLint: Yes
Tailwind CSS: Yes
App Router: Yes (default)
Customize import alias: No

Navigate into your project directory:

cd rag-search-app

Now that your project is set up, you'll need to install the additional packages required for document processing, AI integration, and database operations.

Step 2: Install Required Dependencies

You'll need several packages for this project. You can install them using npm:

npm install @supabase/supabase-js @langchain/openai @langchain/textsplitters langchain openai mammoth pdf2json

Here's what each package does:

@supabase/supabase-js: Client library for interacting with Supabase (database and storage)
@langchain/openai: LangChain integration for OpenAI (helps with text processing)
@langchain/textsplitters: Text splitting utilities for chunking documents into smaller pieces
langchain: Core LangChain library (provides AI workflow tools)
openai: Official OpenAI SDK (for generating embeddings and chat completions)
mammoth: Converts DOCX files to plain text
pdf2json: Extracts text from PDF files

Install the TypeScript types for pdf2json:

npm install --save-dev @types/pdf-parse

With all dependencies installed, you're ready to set up your Supabase project, which will handle your database and file storage needs.

Step 3: Set Up Your Supabase Project

Create a Supabase Project

First, you’ll need to create a new Supabase project, which you can do by following these steps:

Go to supabase.com and sign in or create an account
Click "New Project"
Fill in your project details:
- Name: rag-search-app (or any name you prefer)
- Database Password: Choose a strong password (save this – you'll need it)
- Region: Select the region closest to you
Click "Create new project" and wait for it to be ready (this takes a few minutes)

Get Your Supabase Credentials

Once your project is ready, go to Settings and then API.

Copy the following values:

Project URL (this is your NEXT_PUBLIC_SUPABASE_URL)
anon public key (this is your NEXT_PUBLIC_SUPABASE_PUBLISHABLE_DEFAULT_KEY)
service_role key (this is your SUPABASE_SERVICE_ROLE_KEY)

Important: Keep your service role key secret. Never expose it in client-side code. It bypasses Row-Level Security (RLS) policies, which is necessary for server-side file uploads but should never be used in browser code.

Set Up the Database Schema

Now you'll set up the database structure to store your documents and embeddings. Go to SQL Editor in your Supabase dashboard and run the following SQL:

-- Enable the vector extension for embeddings
-- This extension allows PostgreSQL to store and search vector data efficiently
CREATE EXTENSION IF NOT EXISTS vector;

-- Create the documents table
-- This table stores document chunks, their metadata, and embeddings
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  metadata JSONB,
  embedding vector(1536)  -- OpenAI's text-embedding-3-small produces 1536-dimensional vectors
  file_path text null,
  file_url text null,
);

-- Create an index on the embedding column for faster similarity search
-- The ivfflat index speeds up vector similarity queries
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);

-- Create a function for matching documents based on similarity
-- This function finds the most similar document chunks to a query embedding
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1536),
  match_threshold float,
  match_count int
)
RETURNS TABLE (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE 1 - (documents.embedding <=> query_embedding) > match_threshold
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

This SQL does the following:

Enables the vector extension: This adds vector storage and similarity search capabilities to PostgreSQL
Creates the documents table: Stores document chunks, metadata (file name, type, and so on), and their embeddings
Creates an index: Speeds up similarity searches on the embedding column
Creates a match function: Finds the most similar document chunks to a query embedding using cosine similarity

The <=> operator calculates cosine distance between vectors. A smaller distance means more similar content.

Set Up Supabase Storage

You’ll need a storage bucket to store uploaded files. This is separate from the database and holds the original PDF, DOCX, and TXT files.

To set up your storage bucket:

Go to Storage in your Supabase dashboard
Click New bucket
Name it documents
Set it to Public (this allows file downloads)
Click Create bucket

If you prefer a private bucket, you can use the service role key for server-side operations, which bypasses Row-Level Security policies. For this tutorial, a public bucket is simpler and works well.

Now that your Supabase project is configured, you'll set up your environment variables to connect your Next.js application to Supabase and OpenAI.

Step 4: Configure Environment Variables

Create a .env.local file in your project root:

NEXT_PUBLIC_SUPABASE_URL=your_supabase_project_url
NEXT_PUBLIC_SUPABASE_PUBLISHABLE_DEFAULT_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
OPENAI_API_KEY=your_openai_api_key

Replace the placeholder values with your actual credentials:

Get Supabase values from Settings → API in your Supabase dashboard
Get your OpenAI API key from platform.openai.com/api-keys

Security Note: Never commit .env.local to version control. It's already in .gitignore by default, but double-check to ensure your secrets stay secure.

With your environment configured, you're ready to start building the API routes that will handle file uploads, searches, and document management.

Step 5: Create the Upload API Route

Now you'll create the API route that handles file uploads. This route will process uploaded files, extract their text, split them into chunks, generate embeddings, and store everything in your database and storage.

Create src/app/api/upload/route.ts:

import { createClient } from '@supabase/supabase-js';
import OpenAI from 'openai';
import { NextResponse } from 'next/server';
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
import mammoth from 'mammoth';

const url = process.env.NEXT_PUBLIC_SUPABASE_URL!;
const anonKey = process.env.NEXT_PUBLIC_SUPABASE_PUBLISHABLE_DEFAULT_KEY!;
const serviceKey = process.env.SUPABASE_SERVICE_ROLE_KEY;
const supabaseStorage = createClient(url, serviceKey || anonKey);
const supabase = createClient(url, anonKey);
const openai = new OpenAI();

function safeDecodeURIComponent(str: string): string {
  try { 
    return decodeURIComponent(str); 
  } catch { 
    try { 
      return decodeURIComponent(str.replace(/%/g, '%25')); 
    } catch { 
      return str; 
    } 
  }
}

async function extractTextFromFile(file: File): Promise<string> {
  const buffer = Buffer.from(await file.arrayBuffer());
  const fileName = file.name.toLowerCase();

  if (fileName.endsWith('.pdf')) {
    const PDFParser = (await import('pdf2json')).default;
    return new Promise((resolve, reject) => {
      const pdfParser = new (PDFParser as any)(null, true);
      pdfParser.on('pdfParser_dataError', (err: any) => 
        reject(new Error(`PDF parsing error: ${err.parserError}`))
      );
      pdfParser.on('pdfParser_dataReady', (pdfData: any) => {
        try {
          let fullText = '';
          pdfData.Pages?.forEach((page: any) => 
            page.Texts?.forEach((text: any) => 
              text.R?.forEach((r: any) => 
                r.T && (fullText += safeDecodeURIComponent(r.T) + ' ')
              )
            )
          );
          resolve(fullText.trim());
        } catch (error: any) {
          reject(new Error(`Error extracting text: ${error.message}`));
        }
      });
      pdfParser.parseBuffer(buffer);
    });
  } else if (fileName.endsWith('.docx')) {
    const result = await mammoth.extractRawText({ buffer });
    return result.value;
  } else if (fileName.endsWith('.txt')) {
    return buffer.toString('utf-8');
  } else {
    throw new Error('Unsupported file type. Please upload PDF, DOCX, or TXT files.');
  }
}

export async function POST(req: Request) {
  try {
    const file = (await req.formData()).get('file') as File;
    if (!file) {
      return NextResponse.json({ error: 'No file provided' }, { status: 400 });
    }

    const documentId = crypto.randomUUID();
    const uploadDate = new Date().toISOString();
    const filePath = `${documentId}.${file.name.split('.').pop() || 'bin'}`;

    // Upload file to Supabase Storage
    const fileBuffer = Buffer.from(await file.arrayBuffer());
    const { error: storageError } = await supabaseStorage.storage
      .from('documents')
      .upload(filePath, fileBuffer, {
        contentType: file.type || 'application/octet-stream',
        upsert: false,
      });

    if (storageError) {
      const msg = storageError.message || 'Unknown storage error';
      if (msg.includes('row-level security') || msg.includes('RLS')) {
        return NextResponse.json({ 
          success: false, 
          error: `Storage RLS error: ${msg}. Ensure SUPABASE_SERVICE_ROLE_KEY is set.` 
        }, { status: 500 });
      }
      return NextResponse.json({ 
        success: false, 
        error: `Failed to store file: ${msg}` 
      }, { status: 500 });
    }

    // Get public URL for the file
    const { data: urlData } = supabaseStorage.storage
      .from('documents')
      .getPublicUrl(filePath);

    // Extract text from file
    const text = await extractTextFromFile(file);
    if (!text || text.trim().length === 0) {
      return NextResponse.json({ 
        error: 'Could not extract text from file' 
      }, { status: 400 });
    }

    // Split text into chunks
    // Chunk size of 800 characters with 100-character overlap ensures
    // we don't lose context at chunk boundaries
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 800,
      chunkOverlap: 100,
    });
    const chunks = await textSplitter.splitText(text);

    // Process each chunk: generate embedding and store in database
    for (let i = 0; i < chunks.length; i++) {
      const chunk = chunks[i];

      // Generate embedding using OpenAI
      // This converts the text chunk into a 1536-dimensional vector
      const emb = await openai.embeddings.create({
        model: 'text-embedding-3-small',
        input: chunk,
      });

      // Store chunk with embedding in database
      const { error } = await supabase.from('documents').insert({
        content: chunk,
        metadata: { 
          source: file.name,
          document_id: documentId,
          file_name: file.name,
          file_type: file.type || file.name.split('.').pop(),
          file_size: file.size,
          upload_date: uploadDate,
          chunk_index: i,
          total_chunks: chunks.length,
          file_path: filePath,
          file_url: urlData.publicUrl,
        },
        embedding: JSON.stringify(emb.data[0].embedding),
      });

      if (error) {
        return NextResponse.json({ 
          success: false, 
          error: error.message 
        }, { status: 500 });
      }
    }

    return NextResponse.json({ 
      success: true, 
      documentId, 
      fileName: file.name, 
      chunks: chunks.length, 
      textLength: text.length, 
      fileUrl: urlData.publicUrl 
    });
  } catch (error: any) {
    return NextResponse.json({ 
      success: false, 
      error: error.message || 'Failed to process file' 
    }, { status: 500 });
  }
}

This route handles the complete upload workflow:

Receives the file from the client via FormData
Generates a unique document ID using crypto.randomUUID()
Uploads the file to Supabase Storage for safekeeping
Extracts text based on file type (PDF, DOCX, or TXT)
Splits the text into chunks of 800 characters with 100-character overlap
Generates embeddings for each chunk using OpenAI's embedding model
Stores each chunk with its embedding and metadata in the database

The overlap between chunks ensures that if a sentence or concept spans a chunk boundary, it won't be lost. Now that you can upload and process documents, let's create the search functionality.

Step 6: Create the RAG Search API Route

This route implements the core RAG functionality: it takes a user's query, finds the most relevant document chunks, and uses them to generate an accurate answer.

Create src/app/api/search/route.ts:

import { createClient } from '@supabase/supabase-js';
import OpenAI from 'openai';
import { NextResponse } from 'next/server';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_PUBLISHABLE_DEFAULT_KEY!
);
const openai = new OpenAI();

export async function POST(req: Request) {
  try {
    const { query } = await req.json();

    // Generate embedding for the user's query
    // This converts the search query into the same vector space as document chunks
    const emb = await openai.embeddings.create({ 
      model: 'text-embedding-3-small', 
      input: query 
    });

    // Find similar documents using vector similarity search
    // The match_documents function finds the 5 most similar chunks
    const { data: results, error } = await supabase.rpc('match_documents', {
      query_embedding: JSON.stringify(emb.data[0].embedding),
      match_threshold: 0.0,  // Accept any similarity (you can increase this for stricter matching)
      match_count: 5,        // Return top 5 most similar chunks
    });

    if (error) {
      return NextResponse.json({ error: error.message }, { status: 500 });
    }

    // Combine retrieved chunks into context
    // These chunks will be used as context for the AI to generate an answer
    const context = results?.map((r: any) => r.content).join('\n---\n') || '';

    // Generate answer using OpenAI with retrieved context
    // This is the "Generation" part of RAG
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        { 
          role: 'system', 
          content: 'You are a helpful assistant. Use the provided context to answer questions. If the answer is not in the context, say you do not know.' 
        },
        { 
          role: 'user', 
          content: `Context: ${context}\n\nQuestion: ${query}` 
        }
      ],
    });

    return NextResponse.json({ 
      answer: completion.choices[0].message.content, 
      sources: results 
    });
  } catch (error: any) {
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
}

This route implements the RAG pattern. Here's how the complete RAG workflow works:

Converts the query to an embedding: The user's question is transformed into the same vector space as your document chunks. This uses the same embedding model (text-embedding-3-small) that processed the documents, ensuring they're in the same "vector space."
Searches for similar chunks: Uses the match_documents function to find the 5 most semantically similar document chunks. This uses cosine similarity on the embeddings. Cosine similarity measures the angle between vectors - smaller angles mean more similar content, even if the exact words differ.
Uses chunks as context: The retrieved chunks are passed to GPT-4o-mini as context. These chunks contain the most relevant information from your documents.
Generates an answer: The AI model generates an answer based on the provided context. The system prompt instructs the AI to only answer based on the provided context, ensuring accuracy and preventing hallucinations.
Returns results: Both the answer and source chunks are returned so users can verify the information.

This RAG approach gives you several benefits. First, you get accuracy because answers are based on your actual documents, not just the AI's training data. Second, you get transparency because you can see which document chunks were used to generate each answer. Third, you get efficiency because only relevant chunks are used, which reduces token usage and costs. Finally, you get up-to-date information because you can update your knowledge base by uploading new documents without retraining the AI.

Now let's create the API route for managing documents.

Step 7: Create the Documents API Route

This route handles listing, viewing, downloading, and deleting documents. It serves multiple purposes depending on the query parameters.

Create src/app/api/documents/route.ts:

import { createClient } from '@supabase/supabase-js';
import { NextResponse } from 'next/server';

const url = process.env.NEXT_PUBLIC_SUPABASE_URL!;
const anonKey = process.env.NEXT_PUBLIC_SUPABASE_PUBLISHABLE_DEFAULT_KEY!;
const serviceKey = process.env.SUPABASE_SERVICE_ROLE_KEY || anonKey;
const supabase = createClient(url, anonKey);
const supabaseStorage = createClient(url, serviceKey);

export async function GET(req: Request) {
  try {
    const reqUrl = new URL(req.url);
    const id = reqUrl.searchParams.get('id');
    const file = reqUrl.searchParams.get('file') === 'true';
    const view = reqUrl.searchParams.get('view') === 'true';

    // Handle file download/view
    if (id && file) {
      const { data: documents } = await supabase
        .from('documents')
        .select('metadata')
        .eq('metadata->>document_id', id)
        .limit(1);

      if (!documents || documents.length === 0) {
        return NextResponse.json({ error: 'Document not found' }, { status: 404 });
      }

      const meta = documents[0].metadata;
      const fileName = meta?.file_name || 'document';
      const fileType = meta?.file_type || 'application/octet-stream';
      const filePath = meta?.file_path || `${id}.${fileName.split('.').pop() || 'pdf'}`;

      const { data: fileData, error: downloadError } = await supabaseStorage.storage
        .from('documents')
        .download(filePath);

      if (downloadError || !fileData) {
        return NextResponse.json({ 
          error: downloadError?.message || 'File not stored' 
        }, { status: 404 });
      }

      const buffer = Buffer.from(await fileData.arrayBuffer());
      if (buffer.length === 0) {
        return NextResponse.json({ error: 'File is empty' }, { status: 500 });
      }

      const isPDF = fileType === 'application/pdf' || fileName.toLowerCase().endsWith('.pdf');
      return new NextResponse(new Uint8Array(buffer), {
        headers: {
          'Content-Type': fileType,
          'Content-Disposition': (view && isPDF) 
            ? `inline; filename="${fileName}"` 
            : `attachment; filename="${fileName}"`,
          'Content-Length': buffer.length.toString(),
          ...(view && isPDF ? { 'X-Content-Type-Options': 'nosniff' } : {}),
        },
      });
    }

    // Get single document with text content
    if (id) {
      const { data: chunks, error } = await supabase
        .from('documents')
        .select('content, metadata')
        .eq('metadata->>document_id', id)
        .order('metadata->>chunk_index', { ascending: true });

      if (error || !chunks || chunks.length === 0) {
        return NextResponse.json({ error: 'Document not found' }, { status: 404 });
      }

      const m = chunks[0].metadata || {};
      return NextResponse.json({
        id,
        file_name: m.file_name || 'Unknown',
        file_type: m.file_type || 'unknown',
        file_size: m.file_size || 0,
        upload_date: m.upload_date || new Date().toISOString(),
        total_chunks: chunks.length,
        fullText: chunks.map((c: any) => c.content).join('\n\n'),
        file_url: m.file_url,
        file_path: m.file_path
      });
    }

    // List all documents
    const { data: documents, error } = await supabase
      .from('documents')
      .select('metadata');

    if (error) {
      return NextResponse.json({ error: error.message }, { status: 500 });
    }

    // Deduplicate documents by document_id
    // Since each document is split into multiple chunks, we need to group them
    const map = new Map();
    documents?.forEach((doc: any) => {
      const m = doc.metadata;
      if (m?.document_id && !map.has(m.document_id)) {
        map.set(m.document_id, {
          id: m.document_id,
          file_name: m.file_name || 'Unknown',
          file_type: m.file_type || 'unknown',
          file_size: m.file_size || 0,
          upload_date: m.upload_date || new Date().toISOString(),
          total_chunks: m.total_chunks || 0,
          file_url: m.file_url,
          file_path: m.file_path,
        });
      }
    });

    return NextResponse.json({ documents: Array.from(map.values()) });
  } catch (error: any) {
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
}

export async function DELETE(req: Request) {
  try {
    const id = new URL(req.url).searchParams.get('id');
    if (!id) {
      return NextResponse.json({ error: 'Document ID required' }, { status: 400 });
    }

    // Get file path from metadata
    const { data: docs } = await supabase
      .from('documents')
      .select('metadata')
      .eq('metadata->>document_id', id)
      .limit(1);

    const filePath = docs?.[0]?.metadata?.file_path;

    // Delete file from storage
    if (filePath) {
      await supabaseStorage.storage.from('documents').remove([filePath]);
    }

    // Delete all chunks from database
    const { error } = await supabase
      .from('documents')
      .delete()
      .eq('metadata->>document_id', id);

    if (error) {
      return NextResponse.json({ error: error.message }, { status: 500 });
    }

    return NextResponse.json({ success: true, fileDeleted: !!filePath });
  } catch (error: any) {
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
}

This route handles:

GET without ID: Lists all documents (deduplicated since each document has multiple chunks)
GET with ID: Returns document details and full text (all chunks combined)
GET with ID and file=true: Downloads the original file from storage
DELETE with ID: Deletes the document and its file from both storage and database

Now that your API routes are complete, let's build the user interface components, starting with the upload modal.

The upload modal provides a user-friendly interface for selecting and uploading documents. It handles file selection, upload progress, and displays success or error messages.

Create src/app/components/UploadModal.tsx:

'use client';
import { useState, useEffect } from 'react';

interface UploadModalProps {
  isOpen: boolean;
  onClose: () => void;
  onUploadSuccess?: () => void;
}

export default function UploadModal({ isOpen, onClose, onUploadSuccess }: UploadModalProps) {
  const [file, setFile] = useStatenull>(null);
  const [uploading, setUploading] = useState(false);
  const [message, setMessage] = useState<{ type: 'success' | 'error'; text: string } | null>(null);

  useEffect(() => {
    document.body.style.overflow = isOpen ? 'hidden' : 'unset';
    if (!isOpen) { 
      setFile(null); 
      setMessage(null); 
    }
    return () => { 
      document.body.style.overflow = 'unset'; 
    };
  }, [isOpen]);

  const handleFileChange = (e: React.ChangeEvent) => {
    if (e.target.files && e.target.files[0]) {
      setFile(e.target.files[0]);
      setMessage(null);
    }
  };

  const handleUpload = async () => {
    if (!file) {
      setMessage({ type: 'error', text: 'Please select a file' });
      return;
    }

    setUploading(true);
    setMessage(null);

    try {
      const formData = new FormData();
      formData.append('file', file);

      const res = await fetch('/api/upload', {
        method: 'POST',
        body: formData,
      });

      const data = await res.json();

      if (data.success) {
        setMessage({
          type: 'success',
          text: `File "${data.fileName}" uploaded successfully! Processed ${data.chunks} chunks.`,
        });
        setFile(null);
        (document.getElementById('upload-file-input') as HTMLInputElement)?.setAttribute('value', '');
        setTimeout(() => { 
          onUploadSuccess?.(); 
          onClose(); 
        }, 1500);
      } else {
        setMessage({ type: 'error', text: data.error || 'Upload failed' });
      }
    } catch (error: any) {
      setMessage({ type: 'error', text: error.message || 'Upload failed' });
    } finally {
      setUploading(false);
    }
  };

  if (!isOpen) return null;

  return (
    "fixed inset-0 z-50 flex items-center justify-center bg-black bg-opacity-75 p-4"
      onClick={onClose}
    >
      "relative bg-white dark:bg-gray-900 rounded-lg shadow-xl w-full max-w-2xl max-h-[90vh] overflow-y-auto"
        onClick={(e) => e.stopPropagation()}
      >
        "flex items-center justify-between p-6 border-b border-gray-200 dark:border-gray-800">
          "text-2xl font-semibold text-gray-900 dark:text-gray-100">
            Upload Document
          
          
        

        "p-6">
          "mb-6">
            "upload-file-input" className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">
              Select a file (PDF, DOCX, or TXT)
            
            "upload-file-input"
              type="file"
              accept=".pdf,.docx,.txt"
              onChange={handleFileChange}
              className="block w-full text-sm text-gray-500
                file:mr-4 file:py-2 file:px-4
                file:rounded-lg file:border-0
                file:text-sm file:font-semibold
                file:bg-blue-50 file:text-blue-700
                hover:file:bg-blue-100
                dark:file:bg-blue-900 dark:file:text-blue-300
                dark:hover:file:bg-blue-800"
            />
          

          {file && (
            "mb-6 p-4 bg-gray-50 dark:bg-gray-800 rounded-lg text-sm text-gray-600 dark:text-gray-400 space-y-1">
              "font-medium">Selected:</span> {file.name}p>
              
"font-medium">Size:</span> {(file.size / 1024).toFixed(2)} KB
              "font-medium">Type:</span> {file.type || file.name.split('.').pop()}p>
            
          )}

          

          {message && (
            `mt-6 p-4 rounded-lg ${
                message.type === 'success'
                  ? 'bg-green-50 text-green-800 dark:bg-green-900 dark:text-green-200'
                  : 'bg-red-50 text-red-800 dark:bg-red-900 dark:text-red-200'
              }`}
            >
              {message.text}
            
          )}

          "mt-8 p-4 bg-blue-50 dark:bg-blue-900/20 rounded-lg text-sm">
            "font-medium text-blue-900 dark:text-blue-200 mb-2">Supported: PDF, DOCX, TXT
            "text-blue-700 dark:text-blue-400">Files will be processed and embedded for RAG search.
          
        
      
    
  );
}

This component provides a clean interface for file uploads with proper error handling and user feedback. Next, let's create the PDF viewer component for previewing documents.

The PDF viewer modal allows users to preview PDFs and view extracted text from any document. It's particularly useful for verifying that documents were processed correctly.

Create src/app/components/PDFViewerModal.tsx:

'use client';
import { useEffect, useState } from 'react';

interface PDFViewerModalProps {
  isOpen: boolean;
  onClose: () => void;
  fileUrl: string;
  fileName: string;
  documentId?: string;
  isPDF?: boolean;
}

export default function PDFViewerModal({ 
  isOpen, 
  onClose, 
  fileUrl, 
  fileName, 
  documentId, 
  isPDF = true 
}: PDFViewerModalProps) {
  const [error, setError] = useState<string | null>(null);
  const [loading, setLoading] = useState(true);
  const [activeTab, setActiveTab] = useState<'preview' | 'content'>('preview');
  const [text, setText] = useState<string>('');
  const [textLoading, setTextLoading] = useState(false);
  const [textError, setTextError] = useState<string | null>(null);

  useEffect(() => {
    document.body.style.overflow = isOpen ? 'hidden' : 'unset';
    if (isOpen) { 
      setError(null); 
      setLoading(true); 
      setActiveTab(isPDF ? 'preview' : 'content'); 
      setText(''); 
      setTextError(null); 
    }
    return () => { 
      document.body.style.overflow = 'unset'; 
    };
  }, [isOpen, isPDF]);

  useEffect(() => {
    if (isOpen && documentId && activeTab === 'content' && !text && !textLoading && !textError) {
      fetchDocumentText();
    }
  }, [isOpen, documentId, activeTab, text, textLoading, textError]);

  useEffect(() => {
    if (isOpen && fileUrl && isPDF) {
      fetch(fileUrl, { method: 'GET', headers: { 'Accept': 'application/json' } })
        .then(async res => {
          if (res.headers.get('content-type')?.includes('application/json')) {
            const data = await res.json();
            throw new Error(data.error || 'File not available');
          }
          if (!res.ok) throw new Error(`Failed to load: ${res.status}`);
          setLoading(false);
        })
        .catch(err => {
          setError(err.message || 'Failed to load PDF');
          setLoading(false);
        });
    } else if (isOpen && !isPDF) {
      setLoading(false);
    }
  }, [isOpen, fileUrl, isPDF]);

  const fetchDocumentText = async () => {
    if (!documentId) return;
    setTextLoading(true); 
    setTextError(null);
    try {
      const res = await fetch(`/api/documents?id=${documentId}`);
      const data = await res.json();
      if (data.error) {
        setTextError(data.error);
      } else {
        setText(data.fullText || 'No text content available');
      }
    } catch (err) {
      setTextError(err instanceof Error ? err.message : 'Failed to fetch document text');
    } finally {
      setTextLoading(false);
    }
  };

  if (!isOpen) return null;

  return (
    "fixed inset-0 z-50 flex items-center justify-center bg-black bg-opacity-75 p-4"
      onClick={onClose}
    >
      "relative bg-white dark:bg-gray-900 rounded-lg shadow-xl w-full max-w-6xl h-[90vh] flex flex-col"
        onClick={(e) => e.stopPropagation()}
      >
        "flex flex-col border-b border-gray-200 dark:border-gray-800">
          "flex items-center justify-between p-4">
            "text-xl font-semibold text-gray-900 dark:text-gray-100 truncate flex-1 mr-4">
              {fileName}
            
            "flex items-center gap-2">
              
            
          

          {isPDF && (
            "flex border-t border-gray-200 dark:border-gray-800">
              {(['preview', 'content'] as const).map(tab => (
                
              ))}
            
          )}
        

        "flex-1 overflow-hidden">
          {isPDF && activeTab === 'preview' && (
            "h-full overflow-hidden">
              {error ? (
                "flex flex-col items-center justify-center h-full p-8">
                  "bg-yellow-50 dark:bg-yellow-900/20 border border-yellow-200 dark:border-yellow-800 rounded-lg p-6 max-w-md">
                    "text-lg font-semibold text-yellow-800 dark:text-yellow-200 mb-2">
                      PDF File Not Available
                    
                    "text-yellow-700 dark:text-yellow-300 mb-4">{error}
                    {documentId && (
                      
                    )}
                  
                
              ) : loading ? (
                "flex items-center justify-center h-full">
                  "text-gray-500 dark:text-gray-400">Loading PDF...
                
              ) : (
                `${fileUrl}${fileUrl.includes('?') ? '&' : '?'}view=true#toolbar=1&navpanes=0&scrollbar=1`}
                  className="w-full h-full border-0"
                  title={fileName}
                  allow="fullscreen"
                  onError={() => setError('Failed to load PDF')}
                />
              )}
            
          )}

          {(!isPDF || activeTab === 'content') && (
            "h-full overflow-auto p-6">
              {textLoading ? (
                "flex items-center justify-center h-full">
                  "text-gray-500 dark:text-gray-400">Loading...
                
              ) : textError ? (
                "bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg p-4">
                  "text-red-800 dark:text-red-200">Error: {textError}
                
              ) : (
                "space-y-4">
                  "text-sm text-gray-500 dark:text-gray-400">
                    Formatting may be inconsistent from source.
                  
                  "whitespace-pre-wrap text-sm text-gray-800 dark:text-gray-200 font-mono bg-gray-50 dark:bg-gray-800 p-4 rounded-lg">
                    {text || 'No text content available'}
                  
                
              )}
            
          )}
        
      
    
  );
}

This component provides a full-screen modal for viewing PDFs and extracted text, with tabs to switch between preview and text content. Now let's create a simple navigation component to tie everything together.

The navigation component provides easy access to the Search and Documents pages. It highlights the current page and provides a clean, consistent navigation experience.

Create src/app/components/Navigation.tsx:

'use client';
import Link from 'next/link';
import { usePathname } from 'next/navigation';

export default function Navigation() {
  const pathname = usePathname();

  const navItems = [
    { href: '/', label: 'Search' },
    { href: '/documents', label: 'Documents' },
  ];

  return (
    "border-b border-gray-200 dark:border-gray-800 mb-8">
      "max-w-7xl mx-auto px-4 sm:px-6 lg:px-8">
        "flex space-x-8">
          {navItems.map((item) => (
            `py-4 px-1 border-b-2 font-medium text-sm ${
                pathname === item.href
                  ? 'border-blue-500 text-blue-600 dark:text-blue-400'
                  : 'border-transparent text-gray-500 hover:text-gray-700 hover:border-gray-300 dark:text-gray-400 dark:hover:text-gray-300'
              }`}
            >
              {item.label}
            
          ))}
        
      
    
  );
}

With navigation in place, let's create the main search page where users can query their documents.

Step 11: Create the Home Page (Search Interface)

The search page is the main interface where users ask questions about their uploaded documents. It displays the AI-generated answers along with source citations, allowing users to verify the information.

Update src/app/page.tsx:

'use client';
import { useState } from 'react';
import Navigation from './components/Navigation';

export default function Home() {
  const [query, setQuery] = useState('');
  const [answer, setAnswer] = useState('');
  const [loading, setLoading] = useState(false);
  const [sources, setSources] = useState<any[]>([]);

  const handleSearch = async () => {
    if (!query.trim()) return;
    setLoading(true); 
    setAnswer(''); 
    setSources([]);
    try {
      const res = await fetch('/api/search', { 
        method: 'POST', 
        headers: { 'Content-Type': 'application/json' }, 
        body: JSON.stringify({ query }) 
      });
      const data = await res.json();
      if (data.error) {
        setAnswer(`Error: ${data.error}`);
      } else { 
        setAnswer(data.answer || 'No answer generated'); 
        setSources(data.sources || []); 
      }
    } catch (error: any) {
      setAnswer(`Error: ${error.message}`);
    } finally {
      setLoading(false);
    }
  };

  const handleKeyPress = (e: React.KeyboardEvent) => {
    if (e.key === 'Enter' && (e.metaKey || e.ctrlKey)) {
      handleSearch();
    }
  };

  return (
    "min-h-screen">
      
      "max-w-4xl mx-auto p-8">
        "text-3xl font-bold mb-6">RAG Search

        "bg-white dark:bg-gray-900 border border-gray-200 dark:border-gray-800 rounded-lg p-6 shadow-sm mb-6">
          "w-full p-4 border border-gray-300 dark:border-gray-700 rounded-lg shadow-sm bg-white dark:bg-gray-800 text-gray-900 dark:text-gray-100 resize-none focus:ring-2 focus:ring-blue-500 focus:border-transparent"</span>
            placeholder=<span class="hljs-string">"Ask a question about your uploaded documents..."</span>
            value={query}
            onChange={<span class="hljs-function">(<span class="hljs-params">e</span>) =></span> setQuery(e.target.value)}
            onKeyDown={handleKeyPress}
            rows={<span class="hljs-number">4</span>}
          />
          <button 
            onClick={handleSearch}
            className=<span class="hljs-string">"mt-4 bg-blue-600 text-white px-8 py-3 rounded-lg hover:bg-blue-700 disabled:bg-gray-400 disabled:cursor-not-allowed font-medium"</span>
            disabled={loading || !query.trim()}
          >
            {loading ? <span class="hljs-string">'Searching...'</span> : <span class="hljs-string">'Search'</span>}
          </button>
          <p className=<span class="hljs-string">"mt-2 text-sm text-gray-500 dark:text-gray-400"</span>>
            Press Cmd/Ctrl + Enter to search
          </p>
        </div>

        {answer && (
          <div className=<span class="hljs-string">"bg-white dark:bg-gray-900 border border-gray-200 dark:border-gray-800 rounded-lg p-6 shadow-sm mb-6"</span>>
            <h2 className=<span class="hljs-string">"text-xl font-semibold mb-3"</span>>Answer:</h2>
            <p className=<span class="hljs-string">"text-gray-800 dark:text-gray-200 leading-relaxed whitespace-pre-wrap"</span>>
              {answer}
            </p>
          </div>
        )}

        {sources && sources.length > <span class="hljs-number">0</span> && (
          <div className=<span class="hljs-string">"bg-white dark:bg-gray-900 border border-gray-200 dark:border-gray-800 rounded-lg p-6 shadow-sm"</span>>
            <h2 className=<span class="hljs-string">"text-xl font-semibold mb-3"</span>>Sources ({sources.length}):</h2>
            <div className=<span class="hljs-string">"space-y-3"</span>>
              {sources.map(<span class="hljs-function">(<span class="hljs-params">source, index</span>) =></span> (
                <div
                  key={index}
                  className=<span class="hljs-string">"p-4 bg-gray-50 dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700"</span>
                >
                  <p className=<span class="hljs-string">"text-sm text-gray-600 dark:text-gray-400 mb-1"</span>>
                    <span className=<span class="hljs-string">"font-medium"</span>>Source:</span>{<span class="hljs-string">' '</span>}
                    {source.metadata?.source || source.metadata?.file_name || <span class="hljs-string">'Unknown'</span>}
                  </p>
                  <p className=<span class="hljs-string">"text-sm text-gray-800 dark:text-gray-200 line-clamp-3"</span>>
                    {source.content}
                  </p>
                </div>
              ))}
            </div>
          </div>
        )}
      </main>
    </div>
  );
}
</code></pre>
<p>This page provides a clean search interface with a textarea for queries, a search button, and sections to display answers and source citations. The sources section helps users verify where the information came from, which is crucial for trust and accuracy. Now let's create the documents management page.</p>
<h2 id="heading-step-12-create-the-documents-page">Step 12: Create the Documents Page</h2>
<p>The documents page serves as your document library. It displays all uploaded documents in a table format, shows metadata like file size and chunk count, and provides actions to preview, download, or delete documents. This page is essential for managing your document collection and verifying uploads.</p>
<p>Create <code>src/app/documents/page.tsx</code>:</p>
<pre><code class="lang-typescript"><span class="hljs-string">'use client'</span>;
<span class="hljs-keyword">import</span> { useState, useEffect } <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>;
<span class="hljs-keyword">import</span> Navigation <span class="hljs-keyword">from</span> <span class="hljs-string">'../components/Navigation'</span>;
<span class="hljs-keyword">import</span> PDFViewerModal <span class="hljs-keyword">from</span> <span class="hljs-string">'../components/PDFViewerModal'</span>;
<span class="hljs-keyword">import</span> UploadModal <span class="hljs-keyword">from</span> <span class="hljs-string">'../components/UploadModal'</span>;

<span class="hljs-keyword">interface</span> Document {
  id: <span class="hljs-built_in">string</span>;
  file_name: <span class="hljs-built_in">string</span>;
  file_type: <span class="hljs-built_in">string</span>;
  file_size: <span class="hljs-built_in">number</span>;
  upload_date: <span class="hljs-built_in">string</span>;
  total_chunks: <span class="hljs-built_in">number</span>;
  file_url?: <span class="hljs-built_in">string</span>;
  file_path?: <span class="hljs-built_in">string</span>;
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">DocumentsPage</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> [documents, setDocuments] = useState<Document[]>([]);
  <span class="hljs-keyword">const</span> [loading, setLoading] = useState(<span class="hljs-literal">true</span>);
  <span class="hljs-keyword">const</span> [error, setError] = useState<<span class="hljs-built_in">string</span> | <span class="hljs-literal">null</span>>(<span class="hljs-literal">null</span>);
  <span class="hljs-keyword">const</span> [showPDFModal, setShowPDFModal] = useState(<span class="hljs-literal">false</span>);
  <span class="hljs-keyword">const</span> [selectedPDF, setSelectedPDF] = useState<{ url: <span class="hljs-built_in">string</span>; name: <span class="hljs-built_in">string</span>; id?: <span class="hljs-built_in">string</span>; isPDF?: <span class="hljs-built_in">boolean</span> } | <span class="hljs-literal">null</span>>(<span class="hljs-literal">null</span>);
  <span class="hljs-keyword">const</span> [deletingId, setDeletingId] = useState<<span class="hljs-built_in">string</span> | <span class="hljs-literal">null</span>>(<span class="hljs-literal">null</span>);
  <span class="hljs-keyword">const</span> [showUploadModal, setShowUploadModal] = useState(<span class="hljs-literal">false</span>);

  useEffect(<span class="hljs-function">() =></span> {
    fetchDocuments();
  }, []);

  <span class="hljs-keyword">const</span> fetchDocuments = <span class="hljs-keyword">async</span> () => {
    <span class="hljs-keyword">try</span> {
      setLoading(<span class="hljs-literal">true</span>);
      <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">'/api/documents'</span>);
      <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> res.json();
      <span class="hljs-keyword">if</span> (data.error) {
        setError(data.error);
      } <span class="hljs-keyword">else</span> {
        setDocuments(data.documents || []);
      }
    } <span class="hljs-keyword">catch</span> (err) {
      setError(err <span class="hljs-keyword">instanceof</span> <span class="hljs-built_in">Error</span> ? err.message : <span class="hljs-string">'Failed to fetch documents'</span>);
    } <span class="hljs-keyword">finally</span> {
      setLoading(<span class="hljs-literal">false</span>);
    }
  };

  <span class="hljs-keyword">const</span> formatDate = <span class="hljs-function">(<span class="hljs-params">s: <span class="hljs-built_in">string</span></span>) =></span> {
    <span class="hljs-keyword">try</span> {
      <span class="hljs-keyword">const</span> d = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(s);
      <span class="hljs-keyword">return</span> <span class="hljs-built_in">isNaN</span>(d.getTime()) 
        ? s 
        : d.toLocaleString(<span class="hljs-string">'en-US'</span>, { 
            year: <span class="hljs-string">'numeric'</span>, 
            month: <span class="hljs-string">'short'</span>, 
            day: <span class="hljs-string">'numeric'</span>, 
            hour: <span class="hljs-string">'2-digit'</span>, 
            minute: <span class="hljs-string">'2-digit'</span>, 
            hour12: <span class="hljs-literal">true</span> 
          });
    } <span class="hljs-keyword">catch</span> { 
      <span class="hljs-keyword">return</span> s; 
    }
  };

  <span class="hljs-keyword">const</span> formatFileSize = <span class="hljs-function">(<span class="hljs-params">b: <span class="hljs-built_in">number</span></span>) =></span> 
    b < <span class="hljs-number">1024</span> 
      ? <span class="hljs-string">`<span class="hljs-subst">${b}</span> B`</span> 
      : b < <span class="hljs-number">1024</span> * <span class="hljs-number">1024</span> 
        ? <span class="hljs-string">`<span class="hljs-subst">${(b / <span class="hljs-number">1024</span>).toFixed(<span class="hljs-number">2</span>)}</span> KB`</span> 
        : <span class="hljs-string">`<span class="hljs-subst">${(b / (<span class="hljs-number">1024</span> * <span class="hljs-number">1024</span>)).toFixed(<span class="hljs-number">2</span>)}</span> MB`</span>;

  <span class="hljs-keyword">const</span> handleDelete = <span class="hljs-keyword">async</span> (id: <span class="hljs-built_in">string</span>, name: <span class="hljs-built_in">string</span>) => {
    <span class="hljs-keyword">if</span> (!confirm(<span class="hljs-string">`Delete "<span class="hljs-subst">${name}</span>"? This will permanently delete the document, embeddings, and file.`</span>)) {
      <span class="hljs-keyword">return</span>;
    }
    setDeletingId(id);
    <span class="hljs-keyword">try</span> {
      <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">`/api/documents?id=<span class="hljs-subst">${id}</span>`</span>, { method: <span class="hljs-string">'DELETE'</span> });
      <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> res.json();
      <span class="hljs-keyword">if</span> (data.error) {
        alert(<span class="hljs-string">`Error: <span class="hljs-subst">${data.error}</span>`</span>);
      } <span class="hljs-keyword">else</span> {
        setDocuments(documents.filter(<span class="hljs-function"><span class="hljs-params">doc</span> =></span> doc.id !== id));
      }
    } <span class="hljs-keyword">catch</span> (err) {
      alert(err <span class="hljs-keyword">instanceof</span> <span class="hljs-built_in">Error</span> ? err.message : <span class="hljs-string">'Failed to delete'</span>);
    } <span class="hljs-keyword">finally</span> {
      setDeletingId(<span class="hljs-literal">null</span>);
    }
  };

  <span class="hljs-keyword">return</span> (
    <div className=<span class="hljs-string">"min-h-screen"</span>>
      <Navigation />
      <main className=<span class="hljs-string">"max-w-7xl mx-auto p-8"</span>>
        <div className=<span class="hljs-string">"flex items-center justify-between mb-6"</span>>
          <h1 className=<span class="hljs-string">"text-3xl font-bold"</span>>Documents</h1>
          <button
            onClick={<span class="hljs-function">() =></span> setShowUploadModal(<span class="hljs-literal">true</span>)}
            className=<span class="hljs-string">"px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 font-medium"</span>
          >
            Upload Document
          </button>
        </div>

        {loading ? (
          <div className=<span class="hljs-string">"text-center py-12"</span>>
            <p className=<span class="hljs-string">"text-gray-500 dark:text-gray-400"</span>>Loading documents...</p>
          </div>
        ) : error ? (
          <div className=<span class="hljs-string">"bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg p-4"</span>>
            <p className=<span class="hljs-string">"text-red-800 dark:text-red-200"</span>><span class="hljs-built_in">Error</span>: {error}</p>
          </div>
        ) : documents.length === <span class="hljs-number">0</span> ? (
          <div className=<span class="hljs-string">"bg-gray-50 dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg p-12 text-center"</span>>
            <p className=<span class="hljs-string">"text-gray-500 dark:text-gray-400 mb-4"</span>>No documents uploaded yet.</p>
            <button
              onClick={<span class="hljs-function">() =></span> setShowUploadModal(<span class="hljs-literal">true</span>)}
              className=<span class="hljs-string">"text-blue-600 dark:text-blue-400 hover:underline font-medium"</span>
            >
              Upload your first <span class="hljs-built_in">document</span>
            </button>
          </div>
        ) : (
          <div className=<span class="hljs-string">"bg-white dark:bg-gray-900 border border-gray-200 dark:border-gray-800 rounded-lg shadow-sm overflow-hidden"</span>>
            <div className=<span class="hljs-string">"overflow-x-auto"</span>>
              <table className=<span class="hljs-string">"min-w-full divide-y divide-gray-200 dark:divide-gray-800"</span>>
                <thead className=<span class="hljs-string">"bg-gray-50 dark:bg-gray-800"</span>>
                  <tr>
                    <th className=<span class="hljs-string">"px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider"</span>>
                      File Name
                    </th>
                    <th className=<span class="hljs-string">"px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider"</span>>
                      Type
                    </th>
                    <th className=<span class="hljs-string">"px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider"</span>>
                      Size
                    </th>
                    <th className=<span class="hljs-string">"px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider"</span>>
                      Chunks
                    </th>
                    <th className=<span class="hljs-string">"px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider"</span>>
                      Upload <span class="hljs-built_in">Date</span>
                    </th>
                    <th className=<span class="hljs-string">"px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-400 uppercase tracking-wider"</span>>
                      Actions
                    </th>
                  </tr>
                </thead>
                <tbody className=<span class="hljs-string">"bg-white dark:bg-gray-900 divide-y divide-gray-200 dark:divide-gray-800"</span>>
                  {documents.map(<span class="hljs-function">(<span class="hljs-params">doc</span>) =></span> (
                    <tr key={doc.id} className=<span class="hljs-string">"hover:bg-gray-50 dark:hover:bg-gray-800"</span>>
                      <td className=<span class="hljs-string">"px-6 py-4 whitespace-nowrap"</span>>
                        <div className=<span class="hljs-string">"text-sm font-medium text-gray-900 dark:text-gray-100"</span>>
                          {doc.file_name}
                        </div>
                      </td>
                      <td className=<span class="hljs-string">"px-6 py-4 whitespace-nowrap"</span>>
                        <span className=<span class="hljs-string">"px-2 inline-flex text-xs leading-5 font-semibold rounded-full bg-blue-100 text-blue-800 dark:bg-blue-900 dark:text-blue-200"</span>>
                          {doc.file_type || <span class="hljs-string">'unknown'</span>}
                        </span>
                      </td>
                      <td className=<span class="hljs-string">"px-6 py-4 whitespace-nowrap text-sm text-gray-500 dark:text-gray-400"</span>>
                        {formatFileSize(doc.file_size)}
                      </td>
                      <td className=<span class="hljs-string">"px-6 py-4 whitespace-nowrap text-sm text-gray-500 dark:text-gray-400"</span>>
                        {doc.total_chunks}
                      </td>
                      <td className=<span class="hljs-string">"px-6 py-4 whitespace-nowrap text-sm text-gray-500 dark:text-gray-400"</span>>
                        {formatDate(doc.upload_date)}
                      </td>
                      <td className=<span class="hljs-string">"px-6 py-4 whitespace-nowrap text-sm font-medium"</span>>
                        <div className=<span class="hljs-string">"flex gap-3 items-center"</span>>
                          {doc.file_name.toLowerCase().endsWith(<span class="hljs-string">'.pdf'</span>) ? (
                            <button 
                              onClick={<span class="hljs-function">() =></span> {
                                <span class="hljs-keyword">const</span> pdfUrl = doc.file_url 
                                  ? <span class="hljs-string">`<span class="hljs-subst">${doc.file_url}</span>?view=true`</span> 
                                  : <span class="hljs-string">`/api/documents?id=<span class="hljs-subst">${doc.id}</span>&file=true&view=true`</span>;
                                setSelectedPDF({ url: pdfUrl, name: doc.file_name, id: doc.id });
                                setShowPDFModal(<span class="hljs-literal">true</span>);
                              }} 
                              className=<span class="hljs-string">"text-blue-600 hover:text-blue-900 dark:text-blue-400 dark:hover:text-blue-300"</span>
                            >
                              Preview
                            </button>
                          ) : (
                            <>
                              <button 
                                onClick={<span class="hljs-function">() =></span> {
                                  setSelectedPDF({ 
                                    url: doc.file_url || <span class="hljs-string">`/api/documents?id=<span class="hljs-subst">${doc.id}</span>&file=true`</span>, 
                                    name: doc.file_name, 
                                    id: doc.id, 
                                    isPDF: <span class="hljs-literal">false</span> 
                                  });
                                  setShowPDFModal(<span class="hljs-literal">true</span>);
                                }} 
                                className=<span class="hljs-string">"text-blue-600 hover:text-blue-900 dark:text-blue-400 dark:hover:text-blue-300"</span>
                              >
                                View
                              </button>
                              {(doc.file_url || doc.file_path) && (
                                <a 
                                  href={doc.file_url || <span class="hljs-string">`/api/documents?id=<span class="hljs-subst">${doc.id}</span>&file=true`</span>} 
                                  download={doc.file_name}
                                  className=<span class="hljs-string">"text-green-600 hover:text-green-900 dark:text-green-400 dark:hover:text-green-300"</span> 
                                  target=<span class="hljs-string">"_blank"</span> 
                                  rel=<span class="hljs-string">"noopener noreferrer"</span>
                                >
                                  Download
                                </a>
                              )}
                            </>
                          )}
                          <button 
                            onClick={<span class="hljs-function">() =></span> handleDelete(doc.id, doc.file_name)} 
                            disabled={deletingId === doc.id}
                            className=<span class="hljs-string">"text-red-600 hover:text-red-900 dark:text-red-400 dark:hover:text-red-300 disabled:opacity-50 disabled:cursor-not-allowed"</span>
                          >
                            {deletingId === doc.id ? <span class="hljs-string">'Deleting...'</span> : <span class="hljs-string">'Delete'</span>}
                          </button>
                        </div>
                      </td>
                    </tr>
                  ))}
                </tbody>
              </table>
            </div>
          </div>
        )}

        {selectedPDF && (
          <PDFViewerModal 
            isOpen={showPDFModal} 
            onClose={<span class="hljs-function">() =></span> { 
              setShowPDFModal(<span class="hljs-literal">false</span>); 
              setSelectedPDF(<span class="hljs-literal">null</span>); 
            }}
            fileUrl={selectedPDF.url} 
            fileName={selectedPDF.name} 
            documentId={selectedPDF.id} 
            isPDF={selectedPDF.isPDF !== <span class="hljs-literal">false</span>} 
          />
        )}
        <UploadModal 
          isOpen={showUploadModal} 
          onClose={<span class="hljs-function">() =></span> setShowUploadModal(<span class="hljs-literal">false</span>)} 
          onUploadSuccess={fetchDocuments} 
        />
      </main>
    </div>
  );
}
</code></pre>
<p>This page provides a comprehensive document management interface with a table showing all documents, their metadata, and action buttons for preview, download, and deletion. The page automatically refreshes after uploads and handles loading and error states gracefully.</p>
<p>Now that all your components and pages are built, let's test the complete application.</p>
<h2 id="heading-step-13-test-your-application">Step 13: Test Your Application</h2>
<p>Start your development server:</p>
<pre><code class="lang-typescript">npm run dev
</code></pre>
<p>Open <a target="_blank" href="http://localhost:3000/"><strong>http://localhost:3000</strong></a> in your browser.</p>
<h3 id="heading-test-the-upload-flow">Test the Upload Flow</h3>
<ol>
<li><p>Navigate to the Documents page</p>
</li>
<li><p>Click "Upload Document"</p>
</li>
<li><p>Select a PDF, DOCX, or TXT file</p>
</li>
<li><p>Wait for the upload and processing to complete (this may take a moment as embeddings are generated)</p>
</li>
<li><p>You should see your document in the list with its metadata:</p>
</li>
</ol>
<p></p>
<h3 id="heading-test-the-search-flow">Test the Search Flow</h3>
<ol>
<li><p>Navigate to the Search page (or click "Search" in the navigation)</p>
</li>
<li><p>Make sure you've uploaded at least one document first</p>
</li>
<li><p>Type a question about your uploaded document (for example, "What is this document about?" or ask about specific content)</p>
</li>
<li><p>Click "Search" or press Cmd/Ctrl + Enter</p>
</li>
<li><p>You should see an AI-generated answer with source citations showing which document chunks were used</p>
</li>
</ol>
<p>Once the embedding is done, you can navigate to search and look for the sample test command based on the documents you have uploaded. You can also check the source from which the search results were pulled.</p>
<p></p>
<h3 id="heading-test-document-management">Test Document Management</h3>
<ol>
<li><p>On the Documents page, click "Preview" or "View" on a document</p>
</li>
<li><p>Try downloading a document</p>
</li>
<li><p>Test deleting a document (be careful - this is permanent)</p>
</li>
</ol>
<p>If everything works correctly, you're ready to deploy your application!</p>
<h2 id="heading-step-14-deploy-your-application">Step 14: Deploy Your Application</h2>
<h3 id="heading-deploy-to-vercel">Deploy to Vercel</h3>
<p>Vercel is the easiest way to deploy Next.js applications and is made by the creators of Next.js:</p>
<p>To get started, you’ll need to push your code to GitHub. So go ahead and create a repository and push your code.</p>
<p>Then go to <a target="_blank" href="https://vercel.com/"><strong>vercel.com</strong></a> and sign in with your GitHub account. Click "New Project" and import your GitHub repository.</p>
<p>Add your environment variables in the project settings:</p>
<ul>
<li><p><code>NEXT_PUBLIC_SUPABASE_URL</code></p>
</li>
<li><p><code>NEXT_PUBLIC_SUPABASE_PUBLISHABLE_DEFAULT_KEY</code></p>
</li>
<li><p><code>SUPABASE_SERVICE_ROLE_KEY</code></p>
</li>
<li><p><code>OPENAI_API_KEY</code></p>
</li>
</ul>
<p>Then click "Deploy", and your application will be live in minutes! Vercel automatically builds and deploys your Next.js application, and you'll get a URL like <a target="_blank" href="http://your-app.vercel.app/"><code>your-app.vercel.app</code></a>.</p>
<h3 id="heading-important-deployment-notes">Important Deployment Notes</h3>
<ul>
<li><p>Make sure all environment variables are set in your Vercel project settings</p>
</li>
<li><p>The service role key is required for file uploads to work</p>
</li>
<li><p>Supabase Storage bucket should be accessible (public or with proper RLS policies)</p>
</li>
<li><p>Your OpenAI API key should have sufficient credits</p>
</li>
</ul>
<h2 id="heading-how-rag-search-works">How RAG Search Works</h2>
<p>Your application uses the RAG (Retrieval-Augmented Generation) pattern. This combines information retrieval with AI text generation. Here's how it works step by step:</p>
<ol>
<li><p><strong>Document processing</strong>: When you upload a document, it's split into chunks. These are typically 800 characters each with 100-character overlap. Each chunk gets an embedding. This is a 1536-dimensional vector that represents its semantic meaning.</p>
</li>
<li><p><strong>Storage</strong>: Embeddings are stored in a vector database. This is PostgreSQL with the pgvector extension. They're stored alongside the original text chunks. The original files are stored in Supabase Storage.</p>
</li>
<li><p><strong>Query processing</strong>: When you search, your query is converted into an embedding. It uses the same model that processed the documents. This ensures the query and documents are in the same "vector space."</p>
</li>
<li><p><strong>Similarity search</strong>: The system finds the most similar document chunks. It uses cosine similarity on the embeddings. Cosine similarity measures the angle between vectors. Smaller angles mean more similar content, even if the exact words differ.</p>
</li>
<li><p><strong>Answer generation</strong>: The retrieved chunks are used as context for an AI model. This model is GPT-4o-mini. It generates an accurate answer. The system prompt instructs the AI to only answer based on the provided context. This ensures accuracy.</p>
</li>
</ol>
<p>This approach gives you several benefits.</p>
<p>First, you get accuracy. Answers are based on your actual documents, not just the AI's training data. Second, you get transparency. You can see which document chunks were used to generate each answer. Third, you get efficiency. Only relevant chunks are used, which reduces token usage and costs. Finally, you get up-to-date information. You can update your knowledge base by uploading new documents without retraining the AI.</p>
<h2 id="heading-troubleshooting-common-issues">Troubleshooting Common Issues</h2>
<h3 id="heading-storage-rls-error-when-uploading">"Storage RLS error" when uploading</h3>
<p>This means your <code>SUPABASE_SERVICE_ROLE_KEY</code> is not set or incorrect. Make sure the key is in your <code>.env.local</code> file for local development. Also make sure you're using the service role key, not the anon key. Finally, make sure the key is correctly set in your deployment environment, such as Vercel.</p>
<h3 id="heading-failed-to-extract-text-from-file">"Failed to extract text from file"</h3>
<p>Make sure your file is a valid PDF, DOCX, or TXT file. Check that the file isn't corrupted. For PDFs, ensure they contain extractable text. Scanned PDFs with only images won't work without <a target="_blank" href="https://en.wikipedia.org/wiki/Optical_character_recognition">OCR</a>.</p>
<h3 id="heading-no-answer-generated">"No answer generated"</h3>
<p>Make sure you've uploaded at least one document. Try a different query that's more likely to match your documents. Check that embeddings were successfully created. You can verify this in your Supabase database.</p>
<h3 id="heading-vector-similarity-search-not-working">Vector similarity search not working</h3>
<p>Ensure the <code>vector</code> extension is enabled in Supabase. You can do this by running <code>CREATE EXTENSION IF NOT EXISTS vector;</code>. Verify the <code>match_documents</code> function exists in your database. You can check this in the SQL Editor. Check that embeddings are being stored correctly. They should be JSON strings in the embedding column.</p>
<h3 id="heading-slow-search-or-upload-times">Slow search or upload times</h3>
<p>Large documents take longer to process. This is because more chunks mean more embedding API calls. Consider reducing chunk size or processing documents in batches. Also check your OpenAI API rate limits.</p>
<h2 id="heading-next-steps">Next Steps</h2>
<p>Now that you have a working RAG search application, you can extend it with additional features. Here are some examples of useful features you could add:</p>
<ul>
<li><p>You can add more file types by extending the text extraction to support Markdown, HTML, or other formats.</p>
</li>
<li><p>You can improve chunking by experimenting with different chunk sizes, overlap strategies, or semantic chunking.</p>
</li>
<li><p>You can add authentication to protect your documents with user authentication using Supabase Auth.</p>
</li>
<li><p>You can enhance the UI by adding features like search history, document tags, or advanced filters.</p>
</li>
<li><p>You can optimize performance by adding caching, pagination, or streaming responses.</p>
</li>
<li><p>You can add filters to allow users to search within specific documents or date ranges.</p>
</li>
<li><p>Finally, you can improve search by adding hybrid search, which combines keyword and semantic search, or reranking.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You've built a complete RAG search application from scratch. This application demonstrates modern web development with Next.js and TypeScript. It shows vector database operations with Supabase and pgvector. It demonstrates AI integration with OpenAI embeddings and chat completions. It includes file handling and storage with Supabase Storage. Finally, it features a production-ready user interface with Tailwind CSS.</p>
<p>The RAG pattern you've implemented is used by many production applications. These include <a target="_blank" href="https://www.freecodecamp.org/news/how-to-build-an-embeddable-ai-chatbot-widget-with-cloudflare-workers/">chatbots</a>, knowledge bases, document search systems, and AI assistants. You now have the foundation to build more advanced features on top of this.</p>
<p>The skills you've learned are highly valuable in today's AI-driven development landscape. You've learned to work with embeddings, vector databases, and the RAG pattern. You can apply these concepts to build intelligent search systems, document Q&A applications, or AI-powered knowledge bases.</p>
 
</article>
<article>
<h1> How to Build and Deploy a Blog-to-Audio Service Using OpenAI </h1>
<p>Manish Shivanandhan — Wed, 14 Jan 2026 04:34:50 +0000</p>
 <p>Turning written blog posts into audio is a simple way to reach more people. Many users prefer listening during travel or workouts. Others enjoy having both reading and listening options. </p>
<p>With OpenAI’s <a target="_blank" href="https://platform.openai.com/docs/guides/text-to-speech">text-to-speech</a> models, you can build a clean service that takes a blog URL or pasted text and produces a natural-sounding audio file. </p>
<p>In this article, you’ll learn how to build this system end-to-end. You will learn how to fetch blog content, send it to OpenAI’s audio API, save the output as an MP3 file, and serve everything through a small <a target="_blank" href="https://fastapi.tiangolo.com/">FastAPI</a> app. </p>
<p>At the end, you’ll also build a minimal user interface and deploy it to Sevalla so that anyone can upload text and download audio without touching code.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-understanding-the-core-idea">Understanding the Core Idea</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-your-project">How to Set Up Your Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-fetch-and-clean-blog-content">How to Fetch and Clean Blog Content</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-send-text-to-openai-for-audio">How to Send Text to OpenAI for Audio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-a-fastapi-backend">How to Build a FastAPI Backend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-add-a-simple-user-interface">How to Add a Simple User Interface</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-deploy-your-service-to-sevalla">How to Deploy Your Service to Sevalla</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-understanding-the-core-idea">Understanding the Core Idea</h2>
<p>A blog-to-audio service has only three important parts. The first part takes a blog link or text and cleans it. The second part sends the clean text to OpenAI’s text-to-speech model. The third part gives the final MP3 file back to the user.</p>
<p>OpenAI’s speech generation is simple to use. You send text, choose a voice, and get audio back. The quality is high and works well even for long posts. This means you do not need to worry about training models or tuning voices.</p>
<p>The only job left is to make the system easy to use. That is where FastAPI and a small HTML form help. They wrap your code into a web service so anyone can try it.</p>
<h2 id="heading-how-to-set-up-your-project">How to Set Up Your Project</h2>
<p>Create a folder for your project. Inside it, create a file called <code>main.py</code>. You will also need a basic HTML file later.</p>
<p>Install the libraries you need with pip:</p>
<pre><code class="lang-python">pip install fastapi uvicorn requests beautifulsoup4 python-multipart
</code></pre>
<p>FastAPI gives you a simple backend. Requests module helps download blog pages. <a target="_blank" href="https://pypi.org/project/beautifulsoup4/">BeautifulSoup</a> helps remove HTML tags and extract readable text. Python-multipart helps upload form data.</p>
<p>You must also install the OpenAI client:</p>
<pre><code class="lang-python">pip install openai
</code></pre>
<p>Make sure you have your OpenAI API key ready. Set it in your terminal before running the app:</p>
<pre><code class="lang-python">export OPENAI_API_KEY=<span class="hljs-string">"your-key"</span>
</code></pre>
<p>On Windows, you can do:</p>
<pre><code class="lang-python">setx OPENAI_API_KEY <span class="hljs-string">"your-key"</span>
</code></pre>
<h2 id="heading-how-to-fetch-and-clean-blog-content">How to Fetch and Clean Blog Content</h2>
<p>To convert a blog into audio, you must first extract the main article text. You can fetch the page with requests and parse it with BeautifulSoup. </p>
<p>Below is a simple function that does this. </p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">extract_text_from_url</span>(<span class="hljs-params">url: str</span>) -> str:</span>
    response = requests.get(url, timeout=<span class="hljs-number">10</span>)
    html = response.text
    soup = BeautifulSoup(html, <span class="hljs-string">"html.parser"</span>)
    paragraphs = soup.find_all(<span class="hljs-string">"p"</span>)
    text = <span class="hljs-string">" "</span>.join(p.get_text(strip=<span class="hljs-literal">True</span>) <span class="hljs-keyword">for</span> p <span class="hljs-keyword">in</span> paragraphs)
    <span class="hljs-keyword">return</span> text
</code></pre>
<p>Here is what happens step by step. </p>
<ul>
<li><p>The function downloads the page. </p>
</li>
<li><p>BeautifulSoup reads the HTML and finds all paragraph tags. </p>
</li>
<li><p>It pulls out the text in each paragraph and joins them into one long string. </p>
</li>
<li><p>This gives you a clean version of the blog post without ads or layout code.</p>
</li>
</ul>
<p>If the user pastes text instead of a URL, you can skip this part and use the text as it is.</p>
<h2 id="heading-how-to-send-text-to-openai-for-audio">How to Send Text to OpenAI for Audio</h2>
<p>OpenAI’s text-to-speech API makes this part of the work very easy. You send a message with text and select a voice such as Alloy or Verse. The API returns raw audio bytes. You can save these bytes as an MP3 file.</p>
<p>Here is a helper function to convert text into audio:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI
client = OpenAI()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">text_to_audio</span>(<span class="hljs-params">text: str, output_path: str</span>):</span>
    audio = client.audio.speech.create(
        model=<span class="hljs-string">"gpt-4o-mini-tts"</span>,
        voice=<span class="hljs-string">"alloy"</span>,
        input=text
    )
    <span class="hljs-keyword">with</span> open(output_path, <span class="hljs-string">"wb"</span>) <span class="hljs-keyword">as</span> f:
        f.write(audio.read())
</code></pre>
<p>This function calls the OpenAI client and passes the text, model name, and voice choice. The <code>.read()</code> method extracts the binary audio stream. Writing this to an MP3 file completes the process.</p>
<p>If the blog post is very long, you may want to limit text length or chunk the text and join the audio files later. But for most blogs, the model can handle the entire text in one request.</p>
<h2 id="heading-how-to-build-a-fastapi-backend">How to Build a FastAPI Backend</h2>
<p>Now you can wrap both steps into a simple FastAPI server. This server will accept either a URL or pasted text. It will convert the content into audio and return the MP3 file as a response.</p>
<p>Here is the full backend code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span> FastAPI, Form
<span class="hljs-keyword">from</span> fastapi.responses <span class="hljs-keyword">import</span> FileResponse
<span class="hljs-keyword">import</span> uuid
<span class="hljs-keyword">import</span> os

app = FastAPI()
<span class="hljs-meta">@app.post("/convert")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">convert</span>(<span class="hljs-params">url: str = Form(<span class="hljs-params">None</span>), text: str = Form(<span class="hljs-params">None</span>)</span>):</span>
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> url <span class="hljs-keyword">and</span> <span class="hljs-keyword">not</span> text:
        <span class="hljs-keyword">return</span> {<span class="hljs-string">"error"</span>: <span class="hljs-string">"Please provide a URL or text"</span>}
    <span class="hljs-keyword">if</span> url:
        <span class="hljs-keyword">try</span>:
            text_content = extract_text_from_url(url)
        <span class="hljs-keyword">except</span> Exception:
            <span class="hljs-keyword">return</span> {<span class="hljs-string">"error"</span>: <span class="hljs-string">"Could not fetch the URL"</span>}
    <span class="hljs-keyword">else</span>:
        text_content = text
    file_id = uuid.uuid4().hex
    output_path = <span class="hljs-string">f"audio_<span class="hljs-subst">{file_id}</span>.mp3"</span>
    text_to_audio(text_content, output_path)
    <span class="hljs-keyword">return</span> FileResponse(output_path, media_type=<span class="hljs-string">"audio/mpeg"</span>)
</code></pre>
<p>Here is how it works. The user sends form data with either <code>url</code> or <code>text</code>. The server checks which one exists. </p>
<p>If there is a URL, it extracts text with the earlier function. If there is no URL, it uses the provided text directly. A unique file name is created for every request. Then the audio file is generated and returned as an MP3 download.</p>
<p>You can run the server like this:</p>
<pre><code class="lang-python">uvicorn main:app --reload
</code></pre>
<p>Open your browser at <code>http://localhost:8000</code>. You will not see the UI yet, but the API endpoint is working. You can test it using a tool like Postman or by building the front end next.</p>
<h2 id="heading-how-to-add-a-simple-user-interface">How to Add a Simple User Interface</h2>
<p>A service is much easier to use when it has a clean UI. Below is a simple HTML page that sends either a URL or text to your FastAPI backend. Save this file as <code>index.html</code> in the same folder:</p>
<pre><code class="lang-xml"><span class="hljs-meta"><!DOCTYPE <span class="hljs-meta-keyword">html</span>></span>
<span class="hljs-tag"><<span class="hljs-name">html</span>></span>
<span class="hljs-tag"><<span class="hljs-name">head</span>></span>
    <span class="hljs-tag"><<span class="hljs-name">title</span>></span>Blog to Audio<span class="hljs-tag"></<span class="hljs-name">title</span>></span>
    <span class="hljs-tag"><<span class="hljs-name">style</span>></span><span class="css">
        <span class="hljs-selector-tag">body</span> { <span class="hljs-attribute">font-family</span>: Arial, padding: <span class="hljs-number">40px</span>; <span class="hljs-attribute">max-width</span>: <span class="hljs-number">600px</span>; <span class="hljs-attribute">margin</span>: auto; }
        <span class="hljs-selector-tag">input</span>, <span class="hljs-selector-tag">textarea</span> { <span class="hljs-attribute">width</span>: <span class="hljs-number">100%</span>; <span class="hljs-attribute">padding</span>: <span class="hljs-number">10px</span>; <span class="hljs-attribute">margin-top</span>: <span class="hljs-number">10px</span>; }
        <span class="hljs-selector-tag">button</span> { <span class="hljs-attribute">padding</span>: <span class="hljs-number">12px</span> <span class="hljs-number">20px</span>; <span class="hljs-attribute">margin-top</span>: <span class="hljs-number">20px</span>; <span class="hljs-attribute">cursor</span>: pointer; }
    </span><span class="hljs-tag"></<span class="hljs-name">style</span>></span>
<span class="hljs-tag"></<span class="hljs-name">head</span>></span>
<span class="hljs-tag"><<span class="hljs-name">body</span>></span>
    <span class="hljs-tag"><<span class="hljs-name">h2</span>></span>Convert Blog to Audio<span class="hljs-tag"></<span class="hljs-name">h2</span>></span>
    <span class="hljs-tag"><<span class="hljs-name">form</span> <span class="hljs-attr">action</span>=<span class="hljs-string">"/convert"</span> <span class="hljs-attr">method</span>=<span class="hljs-string">"post"</span>></span>
        <span class="hljs-tag"><<span class="hljs-name">label</span>></span>Blog URL<span class="hljs-tag"></<span class="hljs-name">label</span>></span>
        <span class="hljs-tag"><<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"text"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"url"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Enter a blog link"</span>></span>
<span class="hljs-tag"><<span class="hljs-name">p</span>></span>or paste text below<span class="hljs-tag"></<span class="hljs-name">p</span>></span>
        <span class="hljs-tag"><<span class="hljs-name">textarea</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"text"</span> <span class="hljs-attr">rows</span>=<span class="hljs-string">"10"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Paste blog text here"</span>></span><span class="hljs-tag"></<span class="hljs-name">textarea</span>></span>
        <span class="hljs-tag"><<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"submit"</span>></span>Convert to Audio<span class="hljs-tag"></<span class="hljs-name">button</span>></span>
    <span class="hljs-tag"></<span class="hljs-name">form</span>></span>
<span class="hljs-tag"></<span class="hljs-name">body</span>></span>
<span class="hljs-tag"></<span class="hljs-name">html</span>></span>
</code></pre>
<p>This page gives the user two options. They can type a URL or paste text. The form sends the data to <code>/convert</code> using a POST request. The response will be the MP3 file, so the browser will download it.</p>
<p>To serve the HTML file, add this route to your <code>main.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fastapi.responses <span class="hljs-keyword">import</span> HTMLResponse

<span class="hljs-meta">@app.get("/")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">home</span>():</span>
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">"index.html"</span>, <span class="hljs-string">"r"</span>) <span class="hljs-keyword">as</span> f:
        html = f.read()
    <span class="hljs-keyword">return</span> HTMLResponse(html)
</code></pre>
<p>Now, when you visit the main URL, you will see a clean form.</p>
<p></p>
<p>When you submit a URL, the server will process your request and give you an audio file.</p>
<p></p>
<p>Great. Our text to audio service is working. Now let’s get it into production.</p>
<h2 id="heading-how-to-deploy-your-service-to-sevalla">How to Deploy Your Service to Sevalla</h2>
<p>You can choose any cloud provider, like AWS, DigitalOcean, or others, to host your service. I will be using Sevalla for this example.</p>
<p><a target="_blank" href="https://sevalla.com/">Sevalla</a> is a developer-friendly PaaS provider. It offers application hosting, database, object storage, and static site hosting for your projects.</p>
<p>Every platform will charge you for creating a cloud resource. Sevalla comes with a $50 credit for us to use, so we won’t incur any costs for this example.</p>
<p>Let’s push this project to GitHub so that we can connect our repository to Sevalla. We can also enable auto-deployments so that any new change to the repository is automatically deployed.</p>
<p>You can also <a target="_blank" href="https://github.com/manishmshiva/blog-to-audio">fork my repository</a> from here.</p>
<p><a target="_blank" href="https://app.sevalla.com/login">Log in</a> to Sevalla and click on Applications -> Create new application. You can see the option to link your GitHub repository to create a new application.</p>
<p></p>
<p>Use the default settings. Click “Create application”. Now we have to add our OpenAI API key to the environment variables. Click on the “Environment variables” section once the application is created, and save the <code>OPENAI_API_KEY</code> value as an environment variable.</p>
<p></p>
<p>Now we are ready to deploy our application. Click on “Deployments” and click “Deploy now”. It will take 2–3 minutes for the deployment to complete.</p>
<p></p>
<p>Once done, click on “Visit app”. You will see the application served via a URL ending with <code>sevalla.app</code> . This is your new root URL. You can replace <code>localhost:8000</code> with this URL and start using it.</p>
<p></p>
<p>Congrats! Your blog-to-audio service is now live. You can extend this by adding other capabilities and pushing your code to GitHub. Sevalla will automatically deploy your application to production.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You now know how to build a full blog-to-audio service using OpenAI. You learned how to fetch blog text, convert it into speech, and serve it with FastAPI. You also learned how to create a simple user interface, allowing people to try it with no setup. </p>
<p>With this foundation, you can turn any written content into smooth, natural audio. This can help creators reach a wider audience, enhance accessibility, and provide users with more ways to enjoy content.</p>
<p><em>Hope you enjoyed this article. Signup for my free newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><strong><em>visit my website</em></strong></a><em>.</em></p>
 
</article>
<article>
<h1> How to Use the ChatGPT Apps SDK: Build a Pizza App with Apps SDK </h1>
<p>Shola Jegede — Wed, 15 Oct 2025 18:32:52 +0000</p>
 <p>OpenAI recently introduced ChatGPT Apps, powered by the new <a target="_blank" href="https://developers.openai.com/apps-sdk">Apps SDK</a> and the Model Context Protocol (MCP).</p>
<p>Think of these apps as plugins for ChatGPT:</p>
<ul>
<li><p>You can invoke them naturally in a conversation.</p>
</li>
<li><p>They can render custom interactive UIs inside ChatGPT (maps, carousels, videos, and more).</p>
</li>
<li><p>They run on an MCP server that you control, which defines the tools, resources, and widgets the app provides.</p>
</li>
</ul>
<p>In this step-by-step guide, you’ll build a ChatGPT App using the official <a target="_blank" href="https://github.com/openai/openai-apps-sdk-examples/tree/main/pizzaz_server_node">Pizza App example</a>. This app shows how ChatGPT can render UI widgets like a pizza map or carousel, powered by your local server.</p>
<h2 id="heading-what-youll-learn">What You’ll Learn</h2>
<p>By following this tutorial, you’ll learn how to:</p>
<ul>
<li><p>Set up and run a ChatGPT App with the OpenAI Apps SDK.</p>
</li>
<li><p>Understand the core building blocks: tools, resources, and widgets.</p>
</li>
<li><p>Connect your local app server to ChatGPT using Developer Mode.</p>
</li>
<li><p>Render custom UI directly inside a ChatGPT conversation.</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-youll-learn">What You’ll Learn</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-table-of-contents">Table of Contents</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-chatgpt-apps-work-big-picture">How ChatGPT Apps Work (Big Picture)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-clone-the-examples-repo">Step 1. Clone the Examples Repo</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-run-the-pizza-app-server">Step 2. Run the Pizza App Server</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-expose-your-local-server">Step 3. Expose Your Local Server</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-31-get-ngrok">3.1 Get ngrok</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-32-install-ngrok">3.2 Install ngrok</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-33-connect-your-account">3.3 Connect Your Account</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-34-start-a-tunnel">3.4 Start a Tunnel</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-walk-through-the-pizza-app-code">Step 4. Walk Through the Pizza App Code</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-41-imports-and-setup">4.1 Imports and Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-42-defining-pizza-widgets">4.2 Defining Pizza Widgets</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-43-mapping-widgets-to-tools-and-resources">4.3 Mapping Widgets to Tools and Resources</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-44-handling-requests">4.4 Handling Requests</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-45-creating-the-server">4.5 Creating the Server</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-enable-developer-mode-in-chatgpt">Step 5. Enable Developer Mode in ChatGPT</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-51-enable-developer-mode">5.1 Enable Developer Mode</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-52-create-app">5.2 Create App</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-53-use-your-app">5.3 Use Your App</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-challenges-try-these-yourself">Challenges (Try These Yourself)</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-challenge-a-add-a-pizza-specials-widget-text-only">Challenge A: Add a “Pizza Specials” widget (text-only)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-challenge-b-support-multiple-toppings">Challenge B: Support Multiple Toppings</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-challenge-c-fetch-real-pizza-data-from-an-external-api">Challenge C: Fetch Real Pizza Data from an External API</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-how-chatgpt-apps-work-big-picture">How ChatGPT Apps Work (Big Picture)</h2>
<p>Here’s the architecture in simple terms:</p>
<pre><code class="lang-markdown">ChatGPT (frontend)
   |
   v
MCP Server (your backend)
   |
   v
Widgets (HTML/JS markup displayed inside ChatGPT)
</code></pre>
<ul>
<li><p><strong>ChatGPT</strong> sends requests like: <em>“Show me a pizza carousel.”</em></p>
</li>
<li><p><strong>MCP Server</strong> responds with resources (HTML markup) and tool logic.</p>
</li>
<li><p><strong>Widgets</strong> are rendered inline in ChatGPT.</p>
</li>
</ul>
<h2 id="heading-step-1-clone-the-examples-repo">Step 1. Clone the Examples Repo</h2>
<p>OpenAI provides an official examples repo that includes the Pizza App. Clone it and install the dependencies using these commands:</p>
<pre><code class="lang-powershell">git clone https://github.com/openai/openai<span class="hljs-literal">-apps</span><span class="hljs-literal">-sdk</span><span class="hljs-literal">-examples</span>.git
<span class="hljs-built_in">cd</span> openai<span class="hljs-literal">-apps</span><span class="hljs-literal">-sdk</span><span class="hljs-literal">-examples</span>
pnpm install
</code></pre>
<p>After installing, build the components and start the dev server:</p>
<pre><code class="lang-powershell">pnpm run build  
pnpm run dev
</code></pre>
<h2 id="heading-step-2-run-the-pizza-app-server">Step 2. Run the Pizza App Server</h2>
<p>Navigate to the Pizza App server and start it:</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">cd</span> pizzaz_server_node
pnpm <span class="hljs-built_in">start</span>
</code></pre>
<p>If it works, you should see:</p>
<pre><code class="lang-powershell">Pizzaz MCP server listening on http://localhost:<span class="hljs-number">8000</span>
  SSE stream: GET http://localhost:<span class="hljs-number">8000</span>/mcp
  Message post endpoint: POST http://localhost:<span class="hljs-number">8000</span>/mcp/messages
</code></pre>
<p>This means your server is running locally.</p>
<h2 id="heading-step-3-expose-your-local-server">Step 3. Expose Your Local Server</h2>
<p>To let ChatGPT communicate with your app, your local server needs a public URL. ngrok provides a quick way to expose it during development.</p>
<h3 id="heading-31-get-ngrok">3.1 Get ngrok</h3>
<p>Sign up at <a target="_blank" href="https://ngrok.com">ngrok.com</a> and copy your <strong>authtoken</strong>.</p>
<h3 id="heading-32-install-ngrok">3.2 Install ngrok</h3>
<p><strong>macOS:</strong></p>
<pre><code class="lang-powershell">brew install ngrok
</code></pre>
<p><strong>Windows:</strong></p>
<ul>
<li><p>Download and unzip ngrok.</p>
</li>
<li><p>Optionally, add the folder to your PATH.</p>
</li>
</ul>
<h3 id="heading-33-connect-your-account">3.3 Connect Your Account</h3>
<pre><code class="lang-powershell">ngrok config <span class="hljs-built_in">add-authtoken</span> <your_authtoken>
</code></pre>
<h3 id="heading-34-start-a-tunnel">3.4 Start a Tunnel</h3>
<pre><code class="lang-powershell">ngrok http <span class="hljs-number">8000</span>
</code></pre>
<p>This gives you a public HTTPS URL (like <a target="_blank" href="https://xyz.ngrok.app/mcp"><code>https://xyz.ngrok.app/mcp</code></a>).</p>
<h2 id="heading-step-4-walk-through-the-pizza-app-code">Step 4. Walk Through the Pizza App Code</h2>
<p>The full Pizza App server code is long, so let’s break it down into digestible parts.</p>
<h3 id="heading-41-imports-and-setup">4.1 Imports and Setup</h3>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { createServer } <span class="hljs-keyword">from</span> <span class="hljs-string">"node:http"</span>;
<span class="hljs-keyword">import</span> { Server } <span class="hljs-keyword">from</span> <span class="hljs-string">"@modelcontextprotocol/sdk/server/index.js"</span>;
<span class="hljs-keyword">import</span> { SSEServerTransport } <span class="hljs-keyword">from</span> <span class="hljs-string">"@modelcontextprotocol/sdk/server/sse.js"</span>;
<span class="hljs-keyword">import</span> { z } <span class="hljs-keyword">from</span> <span class="hljs-string">"zod"</span>;
</code></pre>
<ul>
<li><p><code>Server</code> and <code>SSEServerTransport</code> come from the Apps SDK.</p>
</li>
<li><p><code>zod</code> validates input to ensure ChatGPT sends the right arguments.</p>
</li>
</ul>
<h3 id="heading-42-defining-pizza-widgets">4.2 Defining Pizza Widgets</h3>
<p>Widgets are the heart of the app. Each one represents a piece of UI ChatGPT can display.</p>
<p>Here’s the Pizza Map widget:</p>
<pre><code class="lang-typescript">{
  id: <span class="hljs-string">"pizza-map"</span>,
  title: <span class="hljs-string">"Show Pizza Map"</span>,
  templateUri: <span class="hljs-string">"ui://widget/pizza-map.html"</span>,
  html: <span class="hljs-string">`
    <div id="pizzaz-root"></div>
    <link rel="stylesheet" href=".../pizzaz-0038.css">
    <script type="module" src=".../pizzaz-0038.js"></script>
  `</span>,
  responseText: <span class="hljs-string">"Rendered a pizza map!"</span>
}
</code></pre>
<ul>
<li><p><code>id</code> → unique name of the widget.</p>
</li>
<li><p><code>templateUri</code> → how ChatGPT fetches the UI.</p>
</li>
<li><p><code>html</code> → actual markup and assets.</p>
</li>
<li><p><code>responseText</code> → message that shows in chat.</p>
</li>
</ul>
<p>The app defines five widgets:</p>
<ul>
<li><p>Pizza Map</p>
</li>
<li><p>Pizza Carousel</p>
</li>
<li><p>Pizza Album</p>
</li>
<li><p>Pizza List</p>
</li>
<li><p>Pizza Video</p>
</li>
</ul>
<h3 id="heading-43-mapping-widgets-to-tools-and-resources">4.3 Mapping Widgets to Tools and Resources</h3>
<p>Next, widgets are converted into <strong>tools</strong> (things ChatGPT can call) and <strong>resources</strong> (UI markup ChatGPT can render).</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> tools = widgets.map(<span class="hljs-function">(<span class="hljs-params">widget</span>) =></span> ({
  name: widget.id,
  description: widget.title,
  inputSchema: toolInputSchema,
  title: widget.title,
  _meta: widgetMeta(widget)
}));

<span class="hljs-keyword">const</span> resources = widgets.map(<span class="hljs-function">(<span class="hljs-params">widget</span>) =></span> ({
  uri: widget.templateUri,
  name: widget.title,
  description: <span class="hljs-string">`<span class="hljs-subst">${widget.title}</span> widget markup`</span>,
  mimeType: <span class="hljs-string">"text/html+skybridge"</span>,
  _meta: widgetMeta(widget)
}));
</code></pre>
<p>This makes each widget callable and displayable.</p>
<h3 id="heading-44-handling-requests">4.4 Handling Requests</h3>
<p>The MCP server responds to ChatGPT’s requests. For example, when ChatGPT calls a widget tool:</p>
<pre><code class="lang-typescript">server.setRequestHandler(CallToolRequestSchema, <span class="hljs-keyword">async</span> (request) => {
  <span class="hljs-keyword">const</span> widget = widgetsById.get(request.params.name);
  <span class="hljs-keyword">const</span> args = toolInputParser.parse(request.params.arguments ?? {});
  <span class="hljs-keyword">return</span> {
    content: [{ <span class="hljs-keyword">type</span>: <span class="hljs-string">"text"</span>, text: widget.responseText }],
    structuredContent: { pizzaTopping: args.pizzaTopping },
    _meta: widgetMeta(widget)
  };
});
</code></pre>
<p>This:</p>
<ul>
<li><p>Finds the widget requested.</p>
</li>
<li><p>Validates the input (<code>pizzaTopping</code>).</p>
</li>
<li><p>Responds with text + metadata so ChatGPT can render the widget.</p>
</li>
</ul>
<h3 id="heading-45-creating-the-server">4.5 Creating the Server</h3>
<p>Finally, the server is bound to HTTP endpoints (<code>/mcp</code> and <code>/mcp/messages</code>) so ChatGPT can stream messages to and from it.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> httpServer = createServer(<span class="hljs-keyword">async</span> (req, res) => {
  <span class="hljs-comment">// handle requests to /mcp and /mcp/messages</span>
});

httpServer.listen(<span class="hljs-number">8000</span>, <span class="hljs-function">() =></span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Pizzaz MCP server running on port 8000"</span>);
});
</code></pre>
<h2 id="heading-step-5-enable-developer-mode-in-chatgpt">Step 5. Enable Developer Mode in ChatGPT</h2>
<h3 id="heading-51-enable-developer-mode">5.1 Enable Developer Mode</h3>
<ul>
<li><p>Open ChatGPT</p>
</li>
<li><p>Go to <strong>Settings → Apps & Connectors → Advanced Settings</strong></p>
</li>
<li><p>Toggle <strong>Developer Mode</strong></p>
</li>
</ul>
<p></p>
<p>When <strong>Developer Mode</strong> is enabled, ChatGPT should look like this:</p>
<p></p>
<h3 id="heading-52-create-app">5.2 Create App</h3>
<ul>
<li><p>Go back to <strong>Settings → Apps & Connectors</strong></p>
</li>
<li><p>Click <strong>Create</strong></p>
</li>
<li><p>Next:</p>
<ul>
<li><p><strong>Name</strong>: Enter a name for your app (for example, <em>Pizza App</em>)</p>
</li>
<li><p><strong>Description</strong>: Enter any description for your app (or leave empty)</p>
</li>
<li><p><strong>MCP Server URL</strong>: Paste the public HTTPS URL of your MCP endpoint. Make sure it points directly to <code>/mcp</code>, not just the server root</p>
</li>
<li><p><strong>Authentication</strong>: Choose <strong>No authentication</strong></p>
</li>
<li><p>Check <strong>I trust this application</strong></p>
</li>
<li><p>Click <strong>Create</strong> to finish</p>
</li>
</ul>
</li>
</ul>
<p></p>
<p>Once your app is connected to ChatGPT, it should look like this:</p>
<p></p>
<p>When you click on the <strong>Back</strong> icon, you should see your app and other apps that you can connect to and use with ChatGPT:</p>
<p></p>
<h3 id="heading-53-use-your-app">5.3 Use Your App</h3>
<p>To use your app,</p>
<ul>
<li><p>Open a new chat in ChatGPT</p>
</li>
<li><p>Click on the <strong>+</strong> icon</p>
</li>
<li><p>Scroll down to <strong>more</strong></p>
</li>
<li><p>You would see your app</p>
</li>
<li><p>Choose <strong>Pizza App</strong> to start using your app</p>
</li>
</ul>
<p></p>
<p>Here are some commands you can try out with your pizza app in ChatGPT:</p>
<ul>
<li><p><em>Show me a pizza map with pepperoni topping</em></p>
</li>
<li><p><em>Show me a pizza carousel with mushroom topping</em></p>
</li>
<li><p><em>Show me a pizza album with veggie topping</em></p>
</li>
<li><p><em>Show me a pizza list with cheese topping</em></p>
</li>
<li><p><em>Show me a pizza video with chicken topping</em></p>
</li>
</ul>
<p>Each command tells ChatGPT which widget to render, and you can swap in any topping you like.</p>
<p></p>
<p>Below are samples:</p>
<ul>
<li>Pepperoni topping map:</li>
</ul>
<p></p>
<ul>
<li>Extra cheese carousel:</li>
</ul>
<p></p>
<ul>
<li>Mushroom topping album:</li>
</ul>
<p></p>
<h2 id="heading-challenges-try-these-yourself">Challenges (Try These Yourself)</h2>
<p>Here are three practical ways to extend your Pizza App. Each one ties directly to the code you already have.</p>
<h3 id="heading-challenge-a-add-a-pizza-specials-widget-text-only">Challenge A: Add a “Pizza Specials” widget (text-only)</h3>
<p><strong>Goal:</strong> Create a widget that just shows a short message like <em>“Today’s special: Margherita with basil.”</em></p>
<p><strong>Where to change:</strong></p>
<ul>
<li><p><code>resources.widgets</code> → duplicate an entry and give it a new <code>id</code>/<code>title</code>.</p>
</li>
<li><p><code>tools</code> → register it as a new tool.</p>
</li>
<li><p><code>CallTool</code> handler → detect when it’s called (<code>if (request.params.name === "pizza-special")</code>) and return your special.</p>
</li>
</ul>
<p><strong>Hint:</strong><br>This widget doesn’t need extra CSS/JS files. Just keep its <code>html</code> to something like <code><div>🍕 Today’s special: Margherita</div></code>. The idea is to show that widgets can be as simple as plain HTML.</p>
<h3 id="heading-challenge-b-support-multiple-toppings">Challenge B: Support Multiple Toppings</h3>
<p><strong>Goal:</strong> Let users order a pizza with more than one topping, like <code>["pepperoni", "mushroom"]</code>.</p>
<p><strong>Where to change:</strong></p>
<ul>
<li><p><code>toolInputSchema</code> → switch from <code>z.string()</code> to <code>z.array(z.string())</code>.</p>
</li>
<li><p><code>CallTool</code> handler → after parsing, <code>args.pizzaTopping</code> will be an array. Join it into a string before inserting into HTML/response.</p>
</li>
<li><p>Widget HTML → update the display so it lists all chosen toppings.</p>
</li>
</ul>
<p><strong>Hint:</strong><br>Console.log the parsed <code>args</code> first to confirm you’re actually getting an array. Then try something like:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> toppings = args.pizzaTopping.join(<span class="hljs-string">", "</span>);
<span class="hljs-keyword">return</span> { responseText: <span class="hljs-string">`Pizza ordered with <span class="hljs-subst">${toppings}</span>`</span> };
</code></pre>
<h3 id="heading-challenge-c-fetch-real-pizza-data-from-an-external-api">Challenge C: Fetch Real Pizza Data from an External API</h3>
<p><strong>Goal:</strong> Instead of hard-coding content, fetch real pizza info. For example, you could call Yelp’s API to list pizza places in a location, or use a free placeholder API to simulate data.</p>
<p><strong>Where to change:</strong></p>
<ul>
<li><p>Inside the <code>CallTool</code> handler for your widget.</p>
</li>
<li><p>Replace the static HTML with a <code>fetch(...)</code> call that builds dynamic HTML from the response.</p>
</li>
</ul>
<p><strong>Hint:</strong><br>Start small with a free API like <a target="_blank" href="https://jsonplaceholder.typicode.com/posts">JSONPlaceholder</a>. For example:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"https://jsonplaceholder.typicode.com/posts?_limit=3"</span>);
<span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> res.json();

<span class="hljs-keyword">const</span> html = <span class="hljs-string">`
  <ul>
    <span class="hljs-subst">${data.map((p: <span class="hljs-built_in">any</span>) => <span class="hljs-string">`<li><span class="hljs-subst">${p.title}</span></li>`</span>).join(<span class="hljs-string">""</span>)}</span>
  </ul>
`</span>;

<span class="hljs-keyword">return</span> { responseText: <span class="hljs-string">"Fetched pizza places!"</span>, content: [{ <span class="hljs-keyword">type</span>: <span class="hljs-string">"text/html"</span>, text: html }] };
</code></pre>
<p>Once that works, swap in a real API such as Yelp or Google Maps Places to render actual pizza places.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You just built your first ChatGPT App using the <strong>OpenAI Apps SDK</strong>. With a bit of JavaScript and HTML, you created a server that ChatGPT can talk to, and rendered interactive widgets right inside the chat window.</p>
<p>This example focused on the pizza app sample provided by OpenAI, but you could build:</p>
<ul>
<li><p>A weather dashboard,</p>
</li>
<li><p>A movie finder,</p>
</li>
<li><p>A financial data viewer,</p>
</li>
<li><p>Or even a mini-game.</p>
</li>
</ul>
<p>The SDK makes it possible to blend <strong>conversation + interactive UI</strong> in powerful new ways.</p>
<p>Explore the <a target="_blank" href="https://developers.openai.com/apps-sdk">OpenAI Apps SDK documentation</a> to go deeper and start building your own apps.</p>
 
</article>
<article>
<h1> Prompt Engineering Cheat Sheet for GPT-5: Learn These Patterns for Solid Code Generation </h1>
<p>Tarun Singh — Fri, 12 Sep 2025 10:30:29 +0000</p>
 <p>When large language models like ChatGPT first became widely available, a lot of us developers felt like we’d been handed a new superpower. We could use LLMs to help us develop new coding projects, build websites, and much more – just using a few prompts.</p>
<p>LLMs were like a tireless, super knowledgeable pair programmer that could conjure code out of thin air. We’d type a quick, messy request, and out would pop something that...kind of worked. It was amazing, but also a little frustrating. The code might be buggy, inefficient, or completely miss the subtle context of our project.</p>
<p>But with <a target="_blank" href="https://platform.openai.com/docs/models/gpt-5"><strong>GPT-5</strong></a>, the game has changed quite a bit. This model doesn’t just spit out code – it reasons, adapts, and understands context like never before. Still, here’s the catch: you need to speak its language to be able to generate the best output. But how? That’s where <strong>prompt engineering</strong> comes in.</p>
<p>In this article, I’ll share 10 proven patterns that will help you transform GPT-5 from a helpful tool into a rock-solid coding partner you can trust for accuracy and speed. Let’s get started!</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-gpt-5-why-you-should-use-it-as-a-developer">What is GPT-5? Why You Should Use It as a Developer?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-prompt-engineering">Why Prompt Engineering?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-use-gpt-5-for-free">How to Use GPT-5 for Free?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-patterns-every-developer-should-know">Patterns Every Developer Should Know</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-persona-pattern">Persona Pattern</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-few-shot-pattern">Few-Shot Pattern</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chain-of-thought-pattern">Chain-of-Thought Pattern</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-delimiter-pattern">Delimiter Pattern</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-structured-output-pattern">Structured Output Pattern</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-flipped-interaction-pattern">Flipped Interaction Pattern</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-negative-constraint-pattern">Negative Constraint Pattern</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tool-use-pattern">Tool Use Pattern</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-verbosity-pattern">Verbosity Pattern</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-code-as-context-pattern">Code-as-Context Pattern</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-common-pitfalls-to-avoid">Common Pitfalls to Avoid</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ol>
<h2 id="heading-what-is-gpt-5-why-you-should-use-it-as-a-developer">What is GPT-5? Why You Should Use It as a Developer?</h2>
<p>OpenAI recently launched one of its best models, GPT-5. It’s capable of performing coding and agentic tasks across various domains. Think of it as a full-stack, super-intelligent intern who’s been given a master key to the internet's knowledge. It's not just better at writing code, it can under <em>why</em> you need the code, how it should fit into a larger system, and how to debug it.</p>
<p>It excels at:</p>
<ul>
<li><p><strong>Long-context reasoning:</strong> It can handle an entire codebase or a lengthy API documentation, a game-changer for refactoring or fixing bugs across multiple files.</p>
</li>
<li><p><strong>Instruction following:</strong> It’s far less likely to get confused by a long list of constraints or a detailed set of steps.</p>
</li>
<li><p><strong>Tool use and agentic tasks:</strong> It can intelligently decide to call an external API, execute a shell command, or search a repository to complete a task.</p>
</li>
</ul>
<h2 id="heading-why-prompt-engineering">Why Prompt Engineering?</h2>
<p>Think of LLMs as junior developers: super smart, but literal. The way you phrase your request drastically changes the output. Prompt engineering is the art and science of crafting effective instructions for an LLM to achieve a specific goal. It’s the method you use to communicate your intent, provide necessary context, and structure your request in a way that the model can most accurately understand and respond to. When you master it, you can:</p>
<ul>
<li><p>Make GPT-5 generate working, testable code.</p>
</li>
<li><p>Avoid vague or irrelevant answers.</p>
</li>
<li><p>Save tokens (and money).</p>
</li>
<li><p>Reduce the time spent editing or debugging outputs.</p>
</li>
</ul>
<h2 id="heading-how-to-use-gpt-5-for-free">How to Use GPT-5 for Free</h2>
<p>While the API for GPT-5 is a <strong>paid</strong> service, many developers can access its power for free or at a low cost. Now, for example, the default public version of ChatGPT often uses the version of GPT-5 with certain usage caps. Many tools like <strong>Cursor, GitHub Copilot, Microsoft Copilot</strong> integrate GPT-5 or lighter variants.</p>
<p>See the screenshot below of the Cursor IDE with integration of various models, including <code>gpt-5-fast</code>, <code>gpt-5-low</code>, and so on. If you’re experimenting, this is the easiest way to explore GPT-5 without paying for direct API calls.</p>
<p></p>
<p>For this article, we'll use a standard API call structure, but these same principles apply whether you're using a web interface or an integrated tool. Let’s dive into the patterns.</p>
<h2 id="heading-patterns-every-developer-should-know">Patterns Every Developer Should Know</h2>
<h3 id="heading-persona-pattern">Persona Pattern</h3>
<p>You know how, when you're interviewing a candidate, you might ask them to act as if they're a "Engineering Lead or Manager" or a "Frontend engineer"? This pattern is the same idea. By assigning the model a role, you give it an immediate set of assumptions and a knowledge filter.</p>
<p>To effectively craft a persona, be specific. For example, instead of saying "You are a developer," try "You are a senior JavaScript developer specializing in backend APIs and scalability." This provides context on their skill level, their domain, and their preferred programming language, guiding the LLM toward a more tailored and expert-level response.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python"><span class="hljs-comment"># Python Example</span>
<span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI
client = OpenAI()

response = client.responses.create(
    model=<span class="hljs-string">"gpt-5"</span>,
    input=<span class="hljs-string">"""You are a senior JavaScript developer. 
    Refactor this code for readability:
    numbers = [8, 9, 10, 11, 12]; total=0
    for i in numbers: total+=i
    print(total)"""</span>
)

print(response.output_text)
</code></pre>
<p>This code ensures answers match the tone and expertise you expect, as specified in the prompt.</p>
<h3 id="heading-few-shot-pattern">Few-Shot Pattern</h3>
<p>Sometimes, the best way to get a specific style or format of code is to provide an example. This is called "few-shot" prompting. Instead of just describing what you want, you show the model a few completed examples.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI

client = OpenAI()

prompt = <span class="hljs-string">"""
Convert functions to arrow syntax:

Example:
function sum(x, y) { return x + y; }
=> const sum = (x, y) => x + y;

Then convert:
function greet(name) { return "Hey, " + name; }
"""</span>

response = client.responses.create(
    model=<span class="hljs-string">"gpt-5"</span>,
    input=prompt
)

print(response.output_text)
</code></pre>
<p>This code example provides a concrete, undeniable pattern for the model to follow, which is much more effective than a verbose description.</p>
<h3 id="heading-chain-of-thought-pattern">Chain-of-Thought Pattern</h3>
<p>When faced with a complex problem, humans don't just jump to a solution instead, we think through the steps. The Chain-of-Thought pattern asks the LLM to do the same. By telling the model to “think step by step,” you're not just requesting a final answer but you're instructing it to perform internal reasoning and break down the problem into smaller, logical parts. This process is what gives you room to debug.</p>
<p>If the final output is incorrect, you can review its thought process to identify where the logic went wrong. This is particularly effective with GPT-5's enhanced reasoning capabilities. The LLM's reasoning might look like an intermediate, internal monologue you don't always see, but asking it to print its thought process can make it explicit.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python">prompt = <span class="hljs-string">"""
Debug the below step by step:
My Python function loop skips the last element of the list. Check why?
"""</span>
</code></pre>
<p>By encouraging reasoning, you reduce errors in the code.</p>
<h3 id="heading-delimiter-pattern">Delimiter Pattern</h3>
<p>When you’re giving the LLM instructions, it’s important to give it a clear way to differentiate your instructions from the data you want it to process. To do this, you can use delimiters like <code>###</code>, <code>"""</code>, or <code><></code> wrapped around your input text to create a clean boundary. This is a general best practice for all LLMs, as they all can struggle with this distinction without a clear signal.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python">prompt = <span class="hljs-string">"""
Explain this code in simple and easy English:

###
for i in range(10):
    print(i**3)
###
"""</span>
</code></pre>
<p>This helps prevent the model from misinterpreting your data as part of the instructions, particularly when the data contains instruction-like strings.</p>
<h3 id="heading-structured-output-pattern">Structured Output Pattern</h3>
<p>If you need the model's response to be easily parseable by a program, you must specify the format clearly. This is particularly important when you want to use the output as an input for a different part of your software, such as generating JSON configuration files, XML for web services, or even markdown (MD) files for documentation. By telling the model to adhere to a rigid structure, you ensure the output is consistent and reliable.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI

client = OpenAI()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_product_list</span>(<span class="hljs-params">product_info</span>):</span>
    prompt = <span class="hljs-string">f"""
    Generate a JSON object for the following product information.
    The JSON should have a 'products' key, which is an array of objects.
    Each object should have keys for 'name', 'category', 'price', and 'in_stock' (a boolean).

    Product Information:
    <span class="hljs-subst">{product_info}</span>

    Provide only the JSON output, and nothing else.
    """</span>

    response = client.responses.create(
        model=<span class="hljs-string">"gpt-5"</span>,
        input=prompt
    )

    <span class="hljs-comment"># Try to parse the response as JSON</span>
    <span class="hljs-keyword">try</span>:
        json_output = json.loads(response.output_text)
        <span class="hljs-keyword">return</span> json_output
    <span class="hljs-keyword">except</span> json.JSONDecodeError <span class="hljs-keyword">as</span> e:
        print(<span class="hljs-string">f"Error parsing JSON: <span class="hljs-subst">{e}</span>"</span>)
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-comment"># Let's try it out</span>
product_data = <span class="hljs-string">"""
Laptop Pro, Electronics, 1500, True
Ergo Mouse, Accessories, 50, True
Wireless Keyboard, Accessories, 90, False
"""</span>

product_list = generate_product_list(product_data)
<span class="hljs-keyword">if</span> product_list:
    print(json.dumps(product_list, indent=<span class="hljs-number">2</span>))
</code></pre>
<p>In this example, the <code>prompt</code> is the instruction you give to the LLM. It's a text string that outlines a clear task and specifies the output format (a JSON object with specific keys). The <code>response</code> from the model is the raw text it generates, which should be the JSON object you requested. The Python code then attempts to parse this raw text response into a structured JSON object using <code>json.loads()</code>.</p>
<h3 id="heading-flipped-interaction-pattern">Flipped Interaction Pattern</h3>
<p>Sometimes, the best way to get GPT-5 to help you is to have it ask you some questions before it writes any code.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python">prompt = <span class="hljs-string">"""
I want a python script to scrape travel websites for travelling data.
Ask me 5 clarifying questions before writing the code.
"""</span>
</code></pre>
<p>This type of prompt helps prevent assumptions and will provide more accurate code.</p>
<h3 id="heading-negative-constraint-pattern">Negative Constraint Pattern</h3>
<p>While it’s important to tell the model what it <strong>should do</strong>, it’s also sometimes as important to tell it what it <strong>should not do</strong> or what it shouldn’t include in its response. This helps the model avoid certain words, tones, or topics.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI

client = OpenAI()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">my_func</span>(<span class="hljs-params">technical_report</span>):</span>
    prompt = <span class="hljs-string">f"""
    Summarize the following technical report for a non-technical audience. 
    Do not use any specialized jargon, acronyms, or complex terms. 
    Use simple, everyday language.

    Technical Report:
    "<span class="hljs-subst">{technical_report}</span>"
    """</span>
    response = client.responses.create(
        model=<span class="hljs-string">"gpt-5"</span>,
        input=prompt
    )
    <span class="hljs-keyword">return</span> response.output_text

<span class="hljs-comment"># Let's try it out</span>
report = (
    <span class="hljs-string">"The quantum entanglement protocol (QEP) showed significant improvements "</span>
    <span class="hljs-string">"in qubit coherence by utilizing a novel multi-photon emission cascade. "</span>
    <span class="hljs-string">"The data indicates a 12% reduction in decoherence rates, validating the "</span>
    <span class="hljs-string">"hypothesis that non-linear optical feedback could mitigate environmental noise."</span>
)

summary = my_func(report)
print(summary)
</code></pre>
<p>This pattern is a great way to fine-tune the output and steer it away from common pitfalls, overly technical language, and so on, ensuring it meets your specific requirements.</p>
<h3 id="heading-tool-use-pattern">Tool Use Pattern</h3>
<p>GPT-5 is an incredible reasoning engine, but its real power comes when it can interact with external tools, like a web search, a code interpreter, or a file retrieval system. This pattern involves providing the model with a clear description of the tools it can or should use.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python">prompt = <span class="hljs-string">"""
You have access to a 'code_interpreter' tool.
Its purpose is to execute JavaScript code in a secure sandbox.
The tool takes a single argument: the JavaScript code as a string.

Your task is to use this tool to calculate the area of a rectangle 
with a length and breadth as 15.
After you get the result, respond with only the final answer number.
"""</span>
</code></pre>
<p>This is what unlocks GPT-5's potential for true agentic behavior. It can autonomously solve a problem by deciding which tools to use and in what order, moving beyond simple text generation.</p>
<h3 id="heading-verbosity-pattern">Verbosity Pattern</h3>
<p>Depending on your needs, you might want more or less concise output from the LLM. With the GPT-5 API, you can adjust the level of detail and length of the output with the use of the new <code>text.verbosity</code> parameter. Just select the level of <code>text.verbosity</code> as <code>low</code>, <code>medium</code>, or <code>high</code>.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI

client = OpenAI()

<span class="hljs-comment"># Low Verbosity for a concise function</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_concise_code</span>(<span class="hljs-params">description</span>):</span>
    prompt = <span class="hljs-string">f"Write a Python function for <span class="hljs-subst">{description}</span>."</span>
    response = client.responses.create(
        model=<span class="hljs-string">"gpt-5"</span>,
        input=prompt,
        metadata={<span class="hljs-string">"verbosity"</span>: <span class="hljs-string">"low"</span>} 
    )
    <span class="hljs-keyword">return</span> response.output_text

user_input = <span class="hljs-string">"a quicksort algorithm"</span>

concise_code = get_concise_code(user_input)

print(<span class="hljs-string">"Concise Code-\n"</span>, concise_code)
</code></pre>
<p>This saves you time by preventing the model from "over-explaining" when you just need a quick snippet, and it gives you more context when you're learning something new or working with a complex piece of code.</p>
<h3 id="heading-code-as-context-pattern">Code-as-Context Pattern</h3>
<p>GPT-5’s massive context window is a game-changer for working with a full file or even a small project. Instead of just giving it a snippet, you can feed it an entire script and ask it to analyze, refactor, or optimize it.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">my_optimize_codebase</span>(<span class="hljs-params">code_file: str</span>) -> str:</span>
    prompt = <span class="hljs-string">f"""
    You are a performance optimization expert. Analyze the following JavaScript 
    code file for potential performance bottlenecks, redundant code, or memory leaks. 
    Provide a detailed report and then a refactored version of the code.

    Code to analyze:
    \"\"\"
    <span class="hljs-subst">{code_file}</span>
    \"\"\"
    """</span>
    <span class="hljs-comment"># For this demonstration, we'll just return the prompt</span>
    <span class="hljs-keyword">return</span> prompt


<span class="hljs-comment"># User input: "your text input here"</span>
my_code = <span class="hljs-string">"""
// A large, unoptimized JavaScript file
const fetchData = async () => {
  const data = await fetch('https://api.example.com/data');
  const jsonData = await data.json();
  const filteredData = jsonData.filter(item => item.isActive);
  const mappedData = filteredData.map(item => {
    return {
      id: item.id,
      name: item.name.toUpperCase(),
      status: 'active'
    };
  });

  // This is a loop that could be more efficient
  const res= [];
  for (let i = 0; i < mappedData.length; i++) {
    for (let j = 0; j < 10000; j++) {
      res.append(mappedData[i])
    }
  }
  return res;
};
"""</span>

<span class="hljs-keyword">import</span> asyncio

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    prompt = <span class="hljs-keyword">await</span> my_optimize_codebase(my_code)
    print(prompt)

asyncio.run(main())
</code></pre>
<p>This prompt allows GPT-5 to see the full picture. It can understand variable scope, function dependencies, and the overall logic of a file in a way that’s impossible with a single, isolated snippet.</p>
<h2 id="heading-common-pitfalls-to-avoid">Common Pitfalls to Avoid</h2>
<ul>
<li><p><strong>Being Vague or Ambiguous:</strong> A prompt such as “Write some code” will result in a response that lacks focus and is generic. Make sure to clarify which programming language, the specific function, output format, and any limitations that may be required.</p>
</li>
<li><p><strong>Overloading a Single Prompt:</strong> An example “Write a Python script, summarize it in three bullet points, and then translate it into French” has multiple unrelated tasks and will commonly generate disorganized or incomplete reports. Focus on complex requests and break them down into a series of prompts.</p>
</li>
<li><p><strong>Failing to Iterate:</strong> Usually, your first prompt is hardly the most accurate or relevant to the topic of discussion. A general approach is to focus on the prompts generated and go over the concerns of the first sentence as a response. Take into consideration to elaborate, incorporate more facts, and refine, hence have a conversation back and forth to achieve the desired result.</p>
</li>
</ul>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>With GPT-5, prompt engineering is much more complex than locating a “magic” phrase. You need to shift your thinking to software engineering and articulate it for the AI. You are not merely instructing the AI – you are defining the parameters within which it should work to arrive at an efficient solution.</p>
<p>You can put these 10 patterns, along with the new features of reasoning effort and verbosity control, to make GPT-5 a dependable coding assistant: generating boilerplate code, debugging, code refactoring, or app scaffolding. Start improving your prompt engineering technique with lower models like GPT-4o, Gemini, and others. Once you are ready, upgrade to GPT-5 to power real-world dev workflows.</p>
<p>If you found this article helpful and want to discuss AI development, LLMs, or software development, feel free to connect with me on <a target="_blank" href="https://x.com/itsTarun24">X/Twitter</a>, <a target="_blank" href="https://www.linkedin.com/in/tarunsingh24">LinkedIn</a>, or check out my portfolio on my <a target="_blank" href="http://tarunportfolio.vercel.app/blog">Blog</a>. I regularly share insights about AI, development, technical writing, and so on, and would love to see what you build with this foundation.</p>
 
</article>
<article>
<h1> The Open Source LLM Agent Handbook: How to Automate Complex Tasks with LangGraph and CrewAI </h1>
<p>Balajee Asish Brahmandam — Tue, 03 Jun 2025 14:20:30 +0000</p>
 <p>Ever feel like your AI tools are a bit...well, passive? Like they just sit there, waiting for your next command? Imagine if they could take initiative, break down big problems, and even work together to get things done.</p>
<p>That's exactly what LLM agents bring to the table. They're changing how we automate complex tasks, and they can help bring our AI ideas to life in a whole new way.</p>
<p>In this article, we'll explore what LLM agents are, how they work, and how you can build your very own using awesome open-source frameworks.</p>
<h3 id="heading-what-well-cover">What we’ll cover:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-the-current-state-of-llm-agents">The Current State of LLM Agents</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-from-chatbots-to-autonomous-agents">From Chatbots to Autonomous Agents</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-can-agents-do-today">What Can Agents Do Today?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-whats-available-to-build-with">What's Available to Build With?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-now-is-the-best-time-to-learn">Why Now Is the Best Time to Learn</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-are-llm-agents-and-why-are-they-a-big-deal">What Are LLM Agents and Why Are They a Big Deal?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-an-llm">What Is an LLM?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-so-whats-an-llm-agent">So, What’s an LLM Agent?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-does-this-matter">Why Does This Matter?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-the-rise-of-open-source-agent-frameworks">The Rise of Open-Source Agent Frameworks</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-popular-open-source-agent-frameworks">Popular Open-Source Agent Frameworks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-these-tools-enable">What These Tools Enable</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-use-a-framework-instead-of-building-from-scratch">Why Use a Framework Instead of Building from Scratch?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-core-concepts-behind-agent-design">Core Concepts Behind Agent Design</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-agent-loop">The Agent Loop</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-components-of-an-agent">Key Components of an Agent</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-multi-agent-collaboration">Multi-Agent Collaboration</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-project-automate-your-daily-schedule-from-emails">Project: Automate Your Daily Schedule from Emails</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-were-automating">What We’re Automating</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-install-the-required-tools">Step 1: Install the Required Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-define-the-task">Step 2: Define the Task</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-build-the-workflow-with-langgraph">Step 3: Build the Workflow with LangGraph</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-multi-agent-collaboration-with-crewai">Multi-Agent Collaboration with CrewAI</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-crewai">What Is CrewAI?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-sample-roles-for-the-email-summary-task">Sample Roles for the Email Summary Task</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-sample-crewai-code">Sample CrewAI Code</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-actually-happens-during-execution">What Actually Happens During Execution?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-are-llm-agents-safe-what-to-know-about-security-and-privacy">Are LLM Agents Safe? What to Know About Security and Privacy</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-troubleshooting-and-tips">Troubleshooting & Tips</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-explore-more-daily-automations">Explore More Daily Automations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-whats-next-in-agent-technology">What’s Next in Agent Technology?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-summary">Final Summary</a></p>
</li>
</ol>
<h2 id="heading-the-current-state-of-llm-agents">The Current State of LLM Agents</h2>
<p>LLM agents are one of the most exciting developments in AI right now. They’re already helping automate real tasks but they’re also still evolving. So where are we today?</p>
<h3 id="heading-from-chatbots-to-autonomous-agents">From Chatbots to Autonomous Agents</h3>
<p>Large Language Models (LLMs) like GPT-4, Claude, Gemini, and LLaMA have evolved from simple chatbots into surprisingly capable reasoning engines. They've gone from answering trivia questions and generating essays to performing complex reasoning, following multi-step instructions, and interacting with tools like web search and code interpreters.</p>
<p>But here’s the catch: these models are <strong>reactive</strong>. They wait for input and give output. They don't retain memory between tasks, plan ahead, or pursue goals on their own. That’s where <strong>LLM agents</strong> come in – they bridge this gap by adding structure, memory, and autonomy.</p>
<h3 id="heading-what-can-agents-do-today">What Can Agents Do Today?</h3>
<p>Right now, LLM agents are already being used for:</p>
<ul>
<li><p>Summarizing emails or documents</p>
</li>
<li><p>Planning daily schedules</p>
</li>
<li><p>Running DevOps scripts</p>
</li>
<li><p>Searching APIs or tools for answers</p>
</li>
<li><p>Collaborating in small “teams” to complete complex tasks</p>
</li>
</ul>
<p>But they’re not perfect yet. Agents can still:</p>
<ul>
<li><p>Get stuck in loops</p>
</li>
<li><p>Misunderstand goals</p>
</li>
<li><p>Require detailed prompts and guardrails</p>
</li>
</ul>
<p>That’s because this technology is still early-stage. Frameworks are getting better fast, but reliability and memory are still works in progress. So just keep that in mind as you experiment.</p>
<h3 id="heading-why-now-is-the-best-time-to-learn">Why Now Is the Best Time to Learn</h3>
<p>The truth is: we’re still early. But not <em>too</em> early.</p>
<p>This is the perfect time to start experimenting with agents:</p>
<ul>
<li><p>The tooling is mature enough to build real projects</p>
</li>
<li><p>The community is growing rapidly</p>
</li>
<li><p>And you don’t need to be an AI expert just comfortable with Python</p>
</li>
</ul>
<h2 id="heading-what-are-llm-agents-and-why-are-they-a-big-deal">What Are LLM Agents and Why Are They a Big Deal?</h2>
<p>Before we dive into the exciting world of agents, let's quickly chat a bit more about the basics.</p>
<h3 id="heading-what-is-an-llm">What Is an LLM?</h3>
<p>An LLM, or Large Language Model, is basically an AI that's learned from a massive amount of text from the internet – think books, articles, code, and tons more. You can picture it as a super-smart autocomplete engine. But it does way more than just finish your sentences. It can also:</p>
<ul>
<li><p>Answer tricky questions</p>
</li>
<li><p>Summarize long articles or documents</p>
</li>
<li><p>Write code, emails, or creative stories</p>
</li>
<li><p>Translate languages instantly</p>
</li>
<li><p>Even solve logic puzzles and have engaging conversations</p>
</li>
</ul>
<p>Chances are you've heard of ChatGPT, which is powered by OpenAI's GPT models. Other popular LLMs you might come across include Claude (from Anthropic), LLaMA (by Meta), Mistral, and Gemini (from Google).</p>
<p>These models work by simply predicting the next word in a sentence based on the context. While that sounds straightforward, when trained on billions of words, LLMs become capable of surprisingly intelligent behavior, understanding your instructions, following step-by-step reasoning, and producing coherent responses across almost any topic you can imagine.</p>
<h3 id="heading-so-whats-an-llm-agent">So, What’s an LLM Agent?</h3>
<p>While LLMs are super powerful, they usually just <em>react –</em> they only respond when you ask them something. An LLM agent, on the other hand, is <em>proactive</em>.</p>
<p>LLM agents can:</p>
<ul>
<li><p>Break down big, complex tasks into smaller, manageable steps</p>
</li>
<li><p>Make smart decisions and figure out what to do next</p>
</li>
<li><p>Use "tools" like web search, calculators, or even other apps</p>
</li>
<li><p>Work towards a goal, even if it takes multiple steps or tries</p>
</li>
<li><p>Team up with other agents to accomplish shared objectives</p>
</li>
</ul>
<p>In short, LLM agents can think, plan, act, and adapt.</p>
<p>Think of an LLM agent like your super-efficient new assistant: you give it a goal, and it figures out how to achieve it all on its own.</p>
<h3 id="heading-why-does-this-matter">Why Does This Matter?</h3>
<p>This shift from just responding to actively pursuing goals opens a ton of exciting possibilities:</p>
<ul>
<li><p>Automating boring IT or DevOps tasks</p>
</li>
<li><p>Generating detailed reports from raw data</p>
</li>
<li><p>Helping you with multi-step research projects</p>
</li>
<li><p>Reading through your daily emails and highlighting key info</p>
</li>
<li><p>Running your internal tools to take real-world actions</p>
</li>
</ul>
<p>Unlike older, rule-based bots, LLM agents can reason, reflect, and learn from their attempts. This makes them a much better fit for real-world tasks that are messy, require flexibility, and depend on understanding context.</p>
<h2 id="heading-the-rise-of-open-source-agent-frameworks">The Rise of Open-Source Agent Frameworks</h2>
<p>Not too long ago, if you wanted to build an AI system that could act autonomously, it meant writing a ton of custom code, painstakingly managing memory, and trying to stitch together dozens of components. It was a complex, delicate, and highly specialized job.</p>
<p>But guess what? That's not the case anymore.</p>
<p>In 2024, a wave of fantastic open-source frameworks hit the scene. These tools have made it dramatically easier to build powerful LLM agents without you having to reinvent the wheel every time.</p>
<h3 id="heading-popular-open-source-agent-frameworks">Popular Open-Source Agent Frameworks</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Framework</strong></td><td><strong>Description</strong></td><td><strong>Maintainer</strong></td></tr>
</thead>
<tbody>
<tr>
<td>LangGraph</td><td>Graph-based framework for agent state and memory</td><td>LangChain</td></tr>
<tr>
<td>CrewAI</td><td>"Role-based, multi-agent collaboration engine"</td><td>Community (CrewAI)</td></tr>
<tr>
<td>AutoGen</td><td>Customizable multi-agent chat orchestration</td><td>Microsoft</td></tr>
<tr>
<td>AgentVerse</td><td>Modular framework for agent simulation and testing</td><td>Open-source project</td></tr>
</tbody>
</table>
</div><h3 id="heading-what-these-tools-enable">What These Tools Enable</h3>
<p>These frameworks give you ready-made building blocks to handle the trickier parts of creating agents:</p>
<ul>
<li><p><strong>Planning</strong> – Letting agents decide their next move</p>
</li>
<li><p><strong>Tool Use</strong> – Easily connecting agents to things like file systems, web browsers, APIs, or databases</p>
</li>
<li><p><strong>Memory</strong> – Storing and retrieving past information or intermediate results for long-term context</p>
</li>
<li><p><strong>Multi-Agent Collaboration</strong> – Setting up teams of agents that work together on shared goals</p>
</li>
</ul>
<h3 id="heading-why-use-a-framework-instead-of-building-from-scratch">Why Use a Framework Instead of Building from Scratch?</h3>
<p>While you <em>could</em> build a custom agent from the ground up, using a framework will save you a huge amount of time and effort. Open-source agent libraries come packed with:</p>
<ul>
<li><p>Built-in support for orchestrating LLMs</p>
</li>
<li><p>Proven patterns for task planning, keeping track of where you are, and getting feedback</p>
</li>
<li><p>Easy integration with popular models like OpenAI, or even models you run locally</p>
</li>
<li><p>The flexibility to grow from a single helpful agent to entire teams of agents</p>
</li>
</ul>
<p>Basically, these frameworks let you focus on <strong>what your agent should do</strong>, rather than getting bogged down in how to build all the internal workings. Plus, choosing open source means you benefit from community contributions, transparency in how they work, and the freedom to tweak them to your exact needs, without getting locked into a single vendor.</p>
<h2 id="heading-core-concepts-behind-agent-design">Core Concepts Behind Agent Design</h2>
<p>To really grasp how LLM agents operate, it helps to think of them as goal-driven systems that constantly cycle through observing, reasoning, and acting. This continuous loop allows them to tackle tasks that go beyond simple questions and answers, moving into true automation, tool usage, and adapting on the fly.</p>
<h3 id="heading-the-agent-loop">The Agent Loop</h3>
<p>Most LLM agents function based on a mental model called the <strong>Agent Loop</strong> a step-by-step cycle that repeats until the job is done. Here’s how it typically works:</p>
<ul>
<li><p><strong>Perceive:</strong> The agent starts by noticing something in its environment or receiving new information. This could be your prompt, a piece of data, or the current state of a system.</p>
</li>
<li><p><strong>Plan:</strong> Based on what it perceives and its overall goal, the agent decides what to do next. It might break the task into smaller sub-goals or figure out the best tool for the job.</p>
</li>
<li><p><strong>Act:</strong> The agent then acts. This could mean running a function, calling an API, searching the web, interacting with a database, or even asking another agent for help.</p>
</li>
<li><p><strong>Reflect:</strong> After acting, the agent looks at the outcome: Did it work? Was the result useful? Should it try a different approach? Based on this, it updates its plan and keeps going until the task is complete.</p>
</li>
</ul>
<p>This loop is what makes agents so dynamic. It allows them to handle ever-changing tasks, learn from partial results, and correct their course qualities that are vital for building truly useful AI assistants.</p>
<h3 id="heading-key-components-of-an-agent">Key Components of an Agent</h3>
<p>To do their job effectively, agents are built around several crucial parts:</p>
<ul>
<li><p><strong>Tools</strong> are how an agent interacts with the real (or digital) world. These can be anything from search engines, code execution environments, file readers, or API clients, to simple calculators or command-line scripts.</p>
</li>
<li><p><strong>Memory</strong> lets agents remember what they've done or seen across different steps. This might include previous things you've said, temporary results, or key decisions. Some frameworks offer short-term memory (just for one session), while others support long-term memory that can span multiple sessions or goals.</p>
</li>
<li><p><strong>Environment</strong> refers to the external data or system context the agent operates within think APIs, documents, databases, files, or sensor inputs. The more information and access an agent have to its environment, the more meaningful actions it can take.</p>
</li>
<li><p><strong>Goal</strong> is the agent's ultimate objective: what it's trying to achieve. Goals should be specific and clear for instance, “generate a daily schedule,” “summarize this document,” or “extract tasks from emails.”</p>
</li>
</ul>
<h3 id="heading-multi-agent-collaboration">Multi-Agent Collaboration</h3>
<p>For more advanced systems, you can even have multiple agents working together to hit a shared target. Each agent can be given a specific <strong>role</strong> that highlights its specialty just like people working on a team.</p>
<p>For example:</p>
<ul>
<li><p>A <strong>researcher agent</strong> might be tasked with gathering information.</p>
</li>
<li><p>A <strong>coder agent</strong> could write Python scripts or automation routines.</p>
</li>
<li><p>A <strong>reviewer agent</strong> might check the results and ensure everything is up to snuff.</p>
</li>
</ul>
<p>These agents can chat with each other, share information, and even debate or vote on decisions. This kind of teamwork allows AI systems to tackle bigger, more complex tasks while keeping things organized and modular.</p>
<h2 id="heading-project-automate-your-daily-schedule-from-emails">Project: Automate Your Daily Schedule from Emails</h2>
<h3 id="heading-what-were-automating">What We’re Automating</h3>
<p>Think about your typical morning routine:</p>
<ul>
<li><p>You open your inbox.</p>
</li>
<li><p>You quickly scan through a bunch of emails.</p>
</li>
<li><p>You try to spot meetings, tasks, and important reminders.</p>
</li>
<li><p>Then, you manually write a to-do list or add things to your calendar.</p>
</li>
</ul>
<p>Let's use an LLM agent to make that process effortless. Our agent will:</p>
<ul>
<li><p>Read a list of your email messages</p>
</li>
<li><p>Pull out time-sensitive items like meetings or deadlines</p>
</li>
<li><p>Summarize everything into a nice, clean daily schedule</p>
</li>
</ul>
<h3 id="heading-step-1-install-the-required-tools">Step 1: Install the Required Tools</h3>
<p>To get started, you'll need three main tools: Python, VSCode, and an OpenAI API key.</p>
<h4 id="heading-1-install-python-39-or-higher">1. Install Python 3.9 or Higher</h4>
<p>Grab the latest version of Python 3.9+ from the official website: <a target="_blank" href="https://www.python.org/downloads/">https://www.python.org/downloads/</a></p>
<p>Once it's installed, double-check it by running <code>python --version</code> in your terminal.</p>
<p>This command simply asks your system to report the Python version currently installed. You'll want to see Python 3.9.x or something higher to ensure compatibility with our project.</p>
<h4 id="heading-2-install-vscode-optional-but-recommended">2. Install VSCode (Optional but Recommended)</h4>
<p>VSCode is a fantastic, user-friendly code editor that works perfectly with Python. You can download it right here: <a target="_blank" href="https://code.visualstudio.com/">https://code.visualstudio.com/</a>.</p>
<h4 id="heading-3-get-your-openai-api-key">3. Get Your OpenAI API Key</h4>
<p>Head over to: https://platform.openai.com</p>
<p>Sign in or create a new account. Navigate to your API Keys page. Click “Create new secret key” and make sure to copy that key somewhere safe for later.</p>
<h4 id="heading-4-install-python-libraries">4. Install Python Libraries</h4>
<p>Open your terminal or command prompt and install these essential packages:</p>
<pre><code class="lang-bash">pip install langgraph langchain openai
</code></pre>
<p>This command uses pip, Python's package manager, to download and install three crucial libraries for our agent:</p>
<ul>
<li><p>langgraph: The core framework we'll use to build our agent's workflow.</p>
</li>
<li><p>langchain: A foundational library for working with large language models, upon which LangGraph is built.</p>
</li>
<li><p>openai: The official Python library for connecting to OpenAI's powerful AI models.</p>
</li>
</ul>
<p>If you're excited to try out multi-agent setups (which we'll cover in Step 5), also install CrewAI:</p>
<pre><code class="lang-bash">pip install crewai
</code></pre>
<p>This command installs CrewAI, a specialized framework that makes it easy to orchestrate multiple AI agents working together as a team.</p>
<p><strong>5. Set Your OpenAI API Key</strong></p>
<p>You need to make sure your Python code can find and use your OpenAI API key. This is typically done by setting it as an environment variable.</p>
<p>On macOS/Linux, run this in your terminal (replace "your-api-key" with your actual key):</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> OPENAI_API_KEY=<span class="hljs-string">"your-api-key"</span>
</code></pre>
<p>This command sets an environment variable named OPENAI_API_KEY. Environment variables are a secure way for applications (like your Python script) to access sensitive information without hardcoding it directly into the code itself.</p>
<p>On Windows (using Command Prompt), do this:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">set</span> OPENAI_API_KEY=<span class="hljs-string">"your-api-key"</span>
</code></pre>
<p>This is the Windows equivalent command to set the <code>OPENAI_API_KEY</code> environment variable.</p>
<p>Now, your Python code will be all set to talk to the OpenAI model!</p>
<h3 id="heading-step-2-define-the-task">Step 2: Define the Task</h3>
<p>We discussed this briefly in the beginning of this section. But to reiterate, this is what we’ll want our agent to do:</p>
<ul>
<li><p>Scan for meetings, events, and important tasks.</p>
</li>
<li><p>Jot them down quickly in a notebook or an app.</p>
</li>
<li><p>Create a rough mental plan for your day.</p>
</li>
</ul>
<p>This routine takes time and mental energy. So having an agent do it for us will be super helpful.</p>
<h3 id="heading-step-3-build-the-workflow-with-langgraph">Step 3: Build the Workflow with LangGraph</h3>
<h4 id="heading-what-is-langgraph">What Is LangGraph?</h4>
<p>LangGraph is a cool framework that helps you build agents using a "graph-based" workflow, kind of like drawing a flowchart. It's powered by LangChain and gives you a lot more control over exactly how each step in your agent's process unfolds.</p>
<p>Each "node" in this graph represents a decision point or a function that:</p>
<ul>
<li><p>Takes some input (its current "state").</p>
</li>
<li><p>Does some reasoning or takes an action (often involving the LLM and its tools).</p>
</li>
<li><p>Returns an updated output (a new "state").</p>
</li>
</ul>
<p>You draw the connections between these nodes, and LangGraph then executes it like a smart, automated state machine.</p>
<h4 id="heading-why-use-langgraph">Why Use LangGraph?</h4>
<ul>
<li><p>You get to control the precise order of execution.</p>
</li>
<li><p>It's fantastic for building workflows that have multiple steps or even branch off into different paths.</p>
</li>
<li><p>It plays nicely with both cloud-based models (like OpenAI) and models you run locally.</p>
</li>
</ul>
<p>Alright – now let’s write the code.</p>
<h5 id="heading-1-simulate-email-input"><strong>1. Simulate Email Input</strong></h5>
<p>In a real application, your agent would probably connect to Gmail or Outlook to fetch your actual emails. For this example, though, we’ll just hardcode some sample messages to keep things simple:</p>
<pre><code class="lang-python">Python

emails = <span class="hljs-string">"""
1. Subject: Standup Call at 10 AM
2. Subject: Client Review due by 5 PM
3. Subject: Lunch with Sarah at noon
4. Subject: AWS Budget Warning – 80% usage
5. Subject: Dentist Appointment - 4 PM
"""</span>
</code></pre>
<p>This multiline Python string, <code>emails</code>, acts as our stand-in for real email content. We're providing a simple, structured list of email subjects to demonstrate how the agent will process text.</p>
<h5 id="heading-2-define-the-agent-logic"><strong>2. Define the Agent Logic</strong></h5>
<p>Now, we'll tell OpenAI’s GPT model how to process this email text and turn it into a summary.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langgraph.graph <span class="hljs-keyword">import</span> StateGraph, END
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> TypedDict, Annotated, List
<span class="hljs-keyword">import</span> operator

<span class="hljs-comment"># Define the state for our graph</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AgentState</span>(<span class="hljs-params">TypedDict</span>):</span>
    emails: str
    result: str

llm = ChatOpenAI(temperature=<span class="hljs-number">0</span>, model=<span class="hljs-string">"gpt-4o"</span>) <span class="hljs-comment"># Using gpt-4o for better performance</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">calendar_summary_agent</span>(<span class="hljs-params">state: AgentState</span>) -> AgentState:</span>
    emails = state[<span class="hljs-string">"emails"</span>]
    prompt = <span class="hljs-string">f"Summarize today's schedule based on these emails, listing time-sensitive items first and then other important notes. Be concise and use bullet points:\n<span class="hljs-subst">{emails}</span>"</span>
    summary = llm.invoke(prompt).content
    <span class="hljs-keyword">return</span> {<span class="hljs-string">"result"</span>: summary, <span class="hljs-string">"emails"</span>: emails} <span class="hljs-comment"># Ensure emails is also returned</span>
</code></pre>
<p>Here’s what’s going on:</p>
<ul>
<li><p><strong>Imports</strong>: We bring in necessary components:</p>
<ul>
<li><p><code>ChatOpenAI</code> to connect to the LLM,</p>
</li>
<li><p><code>StateGraph</code> and <code>END</code> from <code>langgraph.graph</code> to build our agent workflow,</p>
</li>
<li><p><code>TypedDict</code>, <code>Annotated</code>, and <code>List</code> from <code>typing</code> for type checking and structure,</p>
</li>
<li><p><code>operator</code> (though not used in this snippet, it can help with comparisons or logic).</p>
</li>
</ul>
</li>
<li><p><strong>AgentState</strong>: This <code>TypedDict</code> defines the shape of the data our agent will work with. It includes:</p>
<ul>
<li><p><code>emails</code>: the raw input messages.</p>
</li>
<li><p><code>result</code>: the final output (the daily summary).</p>
</li>
</ul>
</li>
<li><p><strong>llm = ChatOpenAI(...)</strong>: Initializes the language model. We're using GPT-4o with <code>temperature=0</code> to ensure consistent, predictable output perfect for structured summarization tasks.</p>
</li>
<li><p><strong>calendar_summary_agent(state: AgentState)</strong>: This function is the "brain" of our agent. It:</p>
<ul>
<li><p>Takes in the current state, which includes a list of emails.</p>
</li>
<li><p>Extracts the emails from that state.</p>
</li>
<li><p>Constructs a prompt that tells the model to generate a concise daily schedule summary using bullet points, prioritizing time-sensitive items.</p>
</li>
<li><p>Sends this prompt to the model with <code>llm.invoke(prompt).content</code>, which returns the LLM’s response as plain text.</p>
</li>
<li><p>Returns a new <code>AgentState</code> dictionary containing:</p>
<ul>
<li><p><code>result</code>: the generated summary,</p>
</li>
<li><p><code>emails</code>: preserved in case we need it downstream.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<h5 id="heading-3-build-and-run-the-graph"><strong>3. Build and Run the Graph</strong></h5>
<p>Now, let's use LangGraph to map out the flow of our single-agent task and then run it.</p>
<pre><code class="lang-python">builder = StateGraph(AgentState)
builder.add_node(<span class="hljs-string">"calendar"</span>, calendar_summary_agent)
builder.set_entry_point(<span class="hljs-string">"calendar"</span>)
builder.set_finish_point(<span class="hljs-string">"calendar"</span>) <span class="hljs-comment"># END is implicit if not set explicitly</span>

graph = builder.compile()

<span class="hljs-comment"># Run the graph using your simulated email data</span>
result = graph.invoke({<span class="hljs-string">"emails"</span>: emails})
print(result[<span class="hljs-string">"result"</span>])
</code></pre>
<p>Here’s what’s going on:</p>
<ul>
<li><p><strong>builder = StateGraph(AgentState):</strong> We're initiating a StateGraph object. By passing AgentState, we're telling LangGraph the expected data structure for its internal state.</p>
</li>
<li><p><strong>builder.add_node("calendar", calendar_summary_agent):</strong> This line adds a named "node" to our graph. We're calling it "calendar", and we're linking it to our <code>calendar_summary_agent</code> function, meaning that function will be executed when this node is active.</p>
</li>
<li><p><strong>builder.set_entry_point("calendar"):</strong> This sets "calendar" as the very first step in our workflow. When we start the graph, execution will begin here.</p>
</li>
<li><p><strong>builder.set_finish_point("calendar"):</strong> This tells LangGraph that once the "calendar" node finishes its job, the entire graph process is complete.</p>
</li>
<li><p><strong>graph = builder.compile():</strong> This command takes our defined graph blueprint and "compiles" it into an executable workflow.</p>
</li>
<li><p><strong>result = graph.invoke({"emails": emails}):</strong> This is where the magic happens! We're telling our graph to start running. We pass it an initial state that contains our emails data. The graph will then process this data through its nodes until it reaches an end point, returning the final state.</p>
</li>
<li><p><strong>print(result["result"]):</strong> Finally, we grab the summarized schedule from the result (the final state of our graph) and print it to the console.</p>
</li>
</ul>
<h4 id="heading-example-output">Example Output</h4>
<p><code>Your Schedule:</code><br><code>- 10:00 AM – Standup Call</code><br><code>- 12:00 PM – Lunch with Sarah</code><br><code>- 4:00 PM – Dentist Appointment</code><br><code>- Submit client report by 5:00 PM</code><br><code>- AWS Budget Warning – check usage</code></p>
<p>Boom! You've just built an AI agent that can read your emails and whip up your daily schedule. Pretty cool, right? This is a simple yet powerful peek into what LLM agents can do with just a few lines of code.</p>
<h2 id="heading-multi-agent-collaboration-with-crewai">Multi-Agent Collaboration with CrewAI</h2>
<h3 id="heading-what-is-crewai">What Is CrewAI?</h3>
<p>CrewAI is an exciting open-source framework that lets you build <em>teams</em> of agents that work together seamlessly just like a real-world project team! Each agent in a CrewAI setup:</p>
<ul>
<li><p>Has a specific, specialized role.</p>
</li>
<li><p>Can communicate and share information with its teammates.</p>
</li>
<li><p>Collaborates to achieve a shared goal.</p>
</li>
</ul>
<p>This multi-agent approach is super useful when your task is too big or too complex for just one agent, or when breaking it down into specialized parts makes it clearer and more efficient.</p>
<h3 id="heading-sample-roles-for-the-email-summary-task">Sample Roles for the Email Summary Task</h3>
<p>Let's imagine our email summary task being handled by a small team of agents:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Agent Name</strong></td><td><strong>Role</strong></td><td><strong>Responsibility</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Extractor</td><td>Email Scanner</td><td>"Find meetings, reminders, and tasks from emails"</td></tr>
<tr>
<td>Prioritizer</td><td>Schedule Optimizer</td><td>Sort items by urgency and time</td></tr>
<tr>
<td>Formatter</td><td>Output Generator</td><td>"Write a clean, polished daily agenda"</td></tr>
</tbody>
</table>
</div><h3 id="heading-sample-crewai-code">Sample CrewAI Code</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> crewai <span class="hljs-keyword">import</span> Agent, Crew, Task, Process
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">import</span> os

<span class="hljs-comment"># Set your OpenAI API key from environment variables</span>
<span class="hljs-comment"># os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY" # Make sure this is set, or defined directly</span>

<span class="hljs-comment"># Initialize the LLM (using gpt-4o for better performance)</span>
llm = ChatOpenAI(temperature=<span class="hljs-number">0</span>, model=<span class="hljs-string">"gpt-4o"</span>)

<span class="hljs-comment"># Define the agents with specific roles and goals</span>
extractor = Agent(
    role=<span class="hljs-string">"Email Scanner"</span>,
    goal=<span class="hljs-string">"Find all meetings, reminders, and tasks from the given emails, accurately extracting details like time, date, and subject."</span>,
    backstory=<span class="hljs-string">"You are an expert at scanning emails for key information. You meticulously extract every relevant detail."</span>,
    verbose=<span class="hljs-literal">True</span>,
    allow_delegation=<span class="hljs-literal">False</span>,
    llm=llm
)

prioritizer = Agent(
    role=<span class="hljs-string">"Schedule Optimizer"</span>,
    goal=<span class="hljs-string">"Sort extracted items by urgency and time, preparing them for a daily agenda."</span>,
    backstory=<span class="hljs-string">"You are a master of time management, always knowing what needs to be done first. You organize tasks logically."</span>,
    verbose=<span class="hljs-literal">True</span>,
    allow_delegation=<span class="hljs-literal">False</span>,
    llm=llm
)

formatter = Agent(
    role=<span class="hljs-string">"Output Generator"</span>,
    goal=<span class="hljs-string">"Generate a clean, polished, and concise daily agenda in bullet-point format, clearly listing all schedule items."</span>,
    backstory=<span class="hljs-string">"You are a professional secretary, ensuring all outputs are perfectly formatted and easy to read. You prioritize clarity."</span>,
    verbose=<span class="hljs-literal">True</span>,
    allow_delegation=<span class="hljs-literal">False</span>,
    llm=llm
)

<span class="hljs-comment"># Simulate email input</span>
emails = <span class="hljs-string">"""
1. Subject: Standup Call at 10 AM
2. Subject: Client Review due by 5 PM
3. Subject: Lunch with Sarah at noon
4. Subject: AWS Budget Warning – 80% usage
5. Subject: Dentist Appointment - 4 PM
"""</span>

<span class="hljs-comment"># Define the tasks for each agent</span>
extract_task = Task(
    description=<span class="hljs-string">f"Extract all relevant events, meetings, and tasks from these emails: <span class="hljs-subst">{emails}</span>. Focus on precise details."</span>,
    agent=extractor,
    expected_output=<span class="hljs-string">"A list of extracted items with their details (e.g., '- Standup Call at 10 AM', '- Client Review due by 5 PM')."</span>
)

prioritize_task = Task(
    description=<span class="hljs-string">"Prioritize the extracted items by time and urgency. Meetings first, then deadlines, then other notes."</span>,
    agent=prioritizer,
    context=[extract_task], <span class="hljs-comment"># The output of extract_task is the input here</span>
    expected_output=<span class="hljs-string">"A prioritized list of schedule items."</span>
)

format_task = Task(
    description=<span class="hljs-string">"Format the prioritized schedule into a clean, easy-to-read daily agenda using bullet points. Ensure concise language."</span>,
    agent=formatter,
    context=[prioritize_task], <span class="hljs-comment"># The output of prioritize_task is the input here</span>
    expected_output=<span class="hljs-string">"A well-formatted daily agenda with bullet points."</span>
)

<span class="hljs-comment"># Instantiate the crew</span>
crew = Crew(
    agents=[extractor, prioritizer, formatter],
    tasks=[extract_task, prioritize_task, format_task],
    process=Process.sequential, <span class="hljs-comment"># Tasks are executed sequentially</span>
    verbose=<span class="hljs-number">2</span> <span class="hljs-comment"># Outputs more details during execution</span>
)

<span class="hljs-comment"># Run the crew</span>
result = crew.kickoff()
print(<span class="hljs-string">"\n########################"</span>)
print(<span class="hljs-string">"## Final Daily Agenda ##"</span>)
print(<span class="hljs-string">"########################\n"</span>)
print(result)
</code></pre>
<p>Here’s what’s going on:</p>
<ul>
<li><p><strong>Imports:</strong> We bring in key classes from CrewAI: Agent, Crew, Task, and Process. We also import <code>ChatOpenAI</code> for our language model and os to handle environment variables.</p>
</li>
<li><p><strong>llm = ChatOpenAI(...):</strong> Just like in the LangGraph example, this sets up our OpenAI language model, making sure its responses are direct (temperature=0) and using the gpt-4o model.</p>
</li>
<li><p><strong>Agent Definitions (extractor, prioritizer, formatter):</strong></p>
<ul>
<li><p>Each of these variables creates an Agent instance. An agent is defined by its role (what it does), a specific goal it's trying to achieve, and a backstory (a sort of personality or expertise that helps the LLM understand its purpose better).</p>
</li>
<li><p>verbose=True is super helpful for debugging, as it makes the agents print out their "thoughts" as they work.</p>
</li>
<li><p>allow_delegation=False means these agents won't pass their assigned tasks to other agents (though this can be set to True for more complex delegation scenarios).</p>
</li>
<li><p>llm=llm connects each agent to our OpenAI language model.</p>
</li>
</ul>
</li>
<li><p><strong>Simulated emails:</strong> We reuse the same sample email data for this example.</p>
</li>
<li><p><strong>Task Definitions (extract_task, prioritize_task, format_task):</strong></p>
<ul>
<li><p>Each Task defines a specific piece of work that an agent needs to perform.</p>
</li>
<li><p>description clearly tells the agent what the task involves.</p>
</li>
<li><p>agent assigns this task to one of our defined agents (e.g., extractor for extract_task).</p>
</li>
<li><p>context=[...] is a critical part of CrewAI's collaboration. It tells a task to use the <em>output</em> of a previous task as its <em>input</em>. For instance, prioritize_task takes the extract_task's output as its context.</p>
</li>
<li><p>expected_output gives the agent an idea of what its result should look like, helping guide the LLM.</p>
</li>
</ul>
</li>
<li><p><strong>crew = Crew(...):</strong></p>
<ul>
<li><p>This is where we assemble our team! We create a Crew instance, giving it our list of agents and tasks.</p>
</li>
<li><p>process=Process.sequential tells the crew to execute tasks one after another in the order they're defined in the tasks list. CrewAI also supports more advanced processes like hierarchical ones.</p>
</li>
<li><p>verbose=2 will show you a very detailed log of the crew's internal workings and communication.</p>
</li>
</ul>
</li>
<li><p><strong>result = crew.kickoff():</strong> This command officially starts the entire multi-agent workflow. The agents will begin collaborating, passing information, and working through their assigned tasks in sequence.</p>
</li>
<li><p><strong>fprint(result):</strong> Finally, the consolidated output from the entire crew's collaborative effort is printed to your console.</p>
</li>
</ul>
<p>CrewAI cleverly handles all the communication between agents, figures out who needs to work on what and when, and passes the output smoothly from one agent to the next it's like having a mini AI assembly line!</p>
<h2 id="heading-what-actually-happens-during-execution">What Actually Happens During Execution?</h2>
<p>So, whether you're using LangGraph or CrewAI, what's really going on behind the scenes when an agent runs? Let's break down the execution process:</p>
<ul>
<li><p>The system gets an <strong>input state</strong> (for example, your emails).</p>
</li>
<li><p>The first agent or graph node reads this input and uses a <strong>Large Language Model (LLM)</strong> to make sense of it.</p>
</li>
<li><p>Based on its understanding, the agent decides on an <strong>action</strong> like pulling out key events or calling a specific tool.</p>
</li>
<li><p>If needed, the agent might <strong>invoke tools</strong> (like a web search or a file reader) to get more context or perform external operations.</p>
</li>
<li><p>The result of that action is then <strong>passed to the next agent</strong> in the team (if it's a multi-agent setup) or returned directly to you.</p>
</li>
</ul>
<p>Execution keeps going until:</p>
<ul>
<li><p>The task is fully completed.</p>
</li>
<li><p>All agents have finished their assigned roles.</p>
</li>
<li><p>A stopping condition or a designated "END" point in the workflow is reached.</p>
</li>
</ul>
<p>Think of this as a super-smart workflow engine where every single step involves reasoning, making decisions, and remembering previous interactions.</p>
<h2 id="heading-are-llm-agents-safe-what-to-know-about-security-and-privacy">Are LLM Agents Safe? What to Know About Security and Privacy</h2>
<p>As cool as LLM agents are, they raise an important question: <em>can you really trust an AI to run parts of your workflow or interact with your data?</em> It depends. If you’re using services like OpenAI or Anthropic, your data is encrypted in transit and (as of now) isn’t used for training.</p>
<p>But some data might still be temporarily logged to prevent abuse. That’s usually fine for testing and personal projects, but if you’re working with sensitive business info, customer data, or anything private, you’ll want to be careful.</p>
<p>Use anonymized inputs, avoid exposing full datasets, and consider running agents locally using open-source models like LLaMA or Mistral if full control matters to you.</p>
<p>You can also set clear boundaries for your agents so they don’t overstep. Think of it like onboarding a new intern: you wouldn’t give them access to everything on day one.</p>
<p>Give agents only the tools and files they need, keep logs of what they do, and always review the results before letting them make real changes.</p>
<p>As this tech grows, more safety features are coming like better sandboxing, memory limits, and role-based access. But for now, it’s smart to treat your agents like powerful helpers that still need some human supervision.</p>
<h2 id="heading-troubleshooting-amp-tips">Troubleshooting & Tips</h2>
<p>Sometimes, agents can be a bit quirky! Here are some common issues you might run into and how to fix them:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Issue</strong></td><td><strong>Suggested Fix</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Agent seems to loop forever</td><td>Set a maximum number of iterations or define a clearer stopping point.</td></tr>
<tr>
<td>Output is too chatty or verbose</td><td>Use more specific prompts (for example, “Respond in bullet points only”).</td></tr>
<tr>
<td>Input is too long or gets cut off</td><td>Break down large pieces of content into smaller chunks and summarize them individually.</td></tr>
<tr>
<td>Agent runs too slowly</td><td>Try using a faster LLM model like gpt-3.5 or consider running a local model.</td></tr>
</tbody>
</table>
</div><p>A handy tip: You can also add print() statements or logging messages inside your agent functions to see what's happening at each stage and debug state transitions.</p>
<h2 id="heading-explore-more-daily-automations">Explore More Daily Automations</h2>
<p>Once you've built one agent-based task, you'll find it incredibly easy to adapt the pattern for other automations. Here are some cool ideas to get your creative juices flowing:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Task Type</strong></td><td><strong>Example Automation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>DevOps Assistant</td><td>"Read system logs, detect potential issues, and suggest solutions."</td></tr>
<tr>
<td>Finance Tracker</td><td>Read bank statements or CSV files and summarize your spending habits/budgets.</td></tr>
<tr>
<td>Meeting Organizer</td><td>After a meeting, automatically extract action items and assign owners.</td></tr>
<tr>
<td>Inbox Cleaner</td><td>"Automatically label, archive, and delete non-urgent emails."</td></tr>
<tr>
<td>Note Summarizer</td><td>Convert your daily notes into a neatly formatted to-do list or summary.</td></tr>
<tr>
<td>Link Checker</td><td>Extract URLs from documents and automatically test if they're still valid.</td></tr>
<tr>
<td>Resume Formatter</td><td>Score resumes against job descriptions and format them automatically.</td></tr>
</tbody>
</table>
</div><p>Each of these can be built using the very same principles and frameworks we discussed whether that's LangGraph or CrewAI.</p>
<h2 id="heading-whats-next-in-agent-technology">What’s Next in Agent Technology?</h2>
<p>LLM agents are evolving at lightning speed, and the next wave of innovation is already here:</p>
<ul>
<li><p><strong>Smarter memory systems</strong>: Expect agents to have better long-term memory, allowing them to learn over extended periods and remember past conversations and actions.</p>
</li>
<li><p><strong>Multi-modal agents</strong>: Agents won't just handle text anymore! They'll be able to process and understand images, audio, and video, making them much more versatile.</p>
</li>
<li><p><strong>Advanced planning frameworks</strong>: Techniques like ReAct, Toolformer, and AutoGen are constantly improving agents' ability to reason, plan, and reduce those pesky "hallucinations."</p>
</li>
<li><p><strong>Edge deployment</strong>: Imagine agents running entirely offline on your local computer or device using lightweight models like LLaMA 3 or Mistral.</p>
</li>
</ul>
<p>In the very near future, you'll see agents seamlessly integrated into:</p>
<ul>
<li><p>Your DevOps pipelines</p>
</li>
<li><p>Big enterprise workflows</p>
</li>
<li><p>Everyday productivity tools</p>
</li>
<li><p>Mobile apps and smart devices</p>
</li>
<li><p>Games, simulations, and educational platforms</p>
</li>
</ul>
<h2 id="heading-final-summary">Final Summary</h2>
<p>Alright, let's quickly recap all the cool stuff you've just learned and accomplished:</p>
<ul>
<li><p>You've gotten a solid grasp of what LLM agents are and why they're so powerful.</p>
</li>
<li><p>You've seen how open-source frameworks like LangGraph and CrewAI make building agents much easier.</p>
</li>
<li><p>You've built a real LLM agent using LangGraph to automate a common daily task: summarizing your inbox!</p>
</li>
<li><p>You've explored the world of multi-agent collaboration with CrewAI, understanding how teams of AIs can work together.</p>
</li>
<li><p>You've learned how to take these principles and scale them to automate countless other tasks.</p>
</li>
</ul>
<p>So, next time you find yourself stuck doing something repetitive, just ask yourself: "Hey, can I build an agent for that?" The answer is probably yes!</p>
<h3 id="heading-resources-recap">Resources Recap</h3>
<p>Here are some helpful resources if you want to dive deeper into building LLM agents:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Resource</strong></td><td><strong>Link</strong></td></tr>
</thead>
<tbody>
<tr>
<td>LangGraph Docs</td><td><a target="_blank" href="https://docs.langgraph.dev/">https://docs.langgraph.dev/</a></td></tr>
<tr>
<td>CrewAI GitHub</td><td><a target="_blank" href="https://github.com/joaomdmoura/crewAI">https://github.com/joaomdmoura/crewAI</a></td></tr>
<tr>
<td>LangChain Docs</td><td><a target="_blank" href="https://docs.langchain.com/docs/">https://docs.langchain.com/docs/</a></td></tr>
<tr>
<td>OpenAI API Docs</td><td><a target="_blank" href="https://platform.openai.com/docs">https://platform.openai.com/docs</a></td></tr>
<tr>
<td>Python 3.9+</td><td><a target="_blank" href="https://www.python.org/downloads/">https://www.python.org/downloads/</a></td></tr>
<tr>
<td>VSCode</td><td><a target="_blank" href="https://code.visualstudio.com/">https://code.visualstudio.com/</a></td></tr>
</tbody>
</table>
</div> 
</article>
<article>
<h1> The Agentic AI Handbook: A Beginner's Guide to Autonomous Intelligent Agents </h1>
<p>Balajee Asish Brahmandam — Wed, 28 May 2025 14:22:20 +0000</p>
 <p>You may have heard about “Agentic AI” systems and wondered what they’re all about. Well, in basic terms, the idea behind Agentic AI is that it can see its surroundings, set and pursue goals, plan and reason through many processes, and learn from experience.</p>
<p>Unlike chatbots or rule-based software, agentic AI actively responds to user requests. It may break activities into smaller tasks, make decisions based on a high-level goal, and change its behavior over time using tools or other specialized AI components.</p>
<p>To summarize, <a target="_blank" href="https://blogs.nvidia.com/blog/what-is-agentic-ai/">agentic AI systems</a> "solve complex, multi-step problems autonomously by using sophisticated reasoning and iterative planning." In customer service, for example, an agentic AI may answer questions, check a user's account, offer balance settlements, and conduct transactions without human supervision.</p>
<p>So, agentic AI is "<a target="_blank" href="https://www.ibm.com/think/topics/agentic-ai">AI with agency</a>”. Given a problem context, it sets goals, creates strategies, manipulates the environment or software tools, and learns from the results.</p>
<p>But at the moment, most popular AI systems are reactive or non-agentic, doing a specific job or reacting to inputs without preparation. For example, Siri or a traditional image classifier use predefined models or rules to map inputs to outputs. Instead of long-term goals or multi-step processes, <a target="_blank" href="https://www.ibm.com/think/topics">reactive AI</a> "responds to specific inputs with pre-defined actions". Agentic AI is more like a robot or personal assistant that can handle reasoning chains, adapt, and "think" before acting.</p>
<h3 id="heading-what-well-cover-here">What we’ll cover here</h3>
<p>In this article, you’ll learn what makes Agentic AI fundamentally different from traditional reactive systems. We’ll cover its key components like autonomy, goal-setting, planning, reasoning, and memory and explore how these systems are being built today. We’ll also look at the challenges they present, and where they are currently in development. Finally, you’ll get a hands-on tutorial on how to build your own simple agent using Python and LangChain.</p>
<h3 id="heading-table-of-contents">Table of Contents:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-agentic-vs-reactive-ai">Agentic vs Reactive AI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-components-of-ai-agency">Key Components of AI Agency</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-autonomy">Autonomy</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-goal-directed-behavior">Goal-Directed Behavior</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-planning">Planning</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-reasoning">Reasoning</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-memory">Memory</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-does-agentic-ai-know-what-to-do">How Does Agentic AI Know What to Do?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-1-it-uses-a-pretrained-ai-model">1. It Uses a Pretrained AI Model</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-2-it-follows-instructions-in-prompts">2. It Follows Instructions in Prompts</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-3-it-uses-tools-but-only-when-told-how">3. It Uses Tools, But Only When Told How</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-4-it-can-remember-sometimes">4. It Can Remember (Sometimes)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-5-its-not-fully-autonomous-yet">5. It’s Not Fully Autonomous — Yet</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-so-whats-the-current-state-of-agentic-ai">So What’s the Current State of Agentic AI?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-exists-today">What Exists Today</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-whats-still-experimental">What’s Still Experimental</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-are-we-close-to-truly-autonomous-agents">Are We Close to Truly Autonomous Agents?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-building-agentic-ai-frameworks-and-approaches">Building Agentic AI: Frameworks and Approaches</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-reinforcement-learning-rl-agents">Reinforcement Learning (RL) Agents</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-llm-based-generative-agents">LLM-Based (Generative) Agents</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-multi-agent-and-orchestration-frameworks">Multi-Agent and Orchestration Frameworks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-classical-planning-and-symbolic-ai">Classical Planning and Symbolic AI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tool-augmented-reasoning">Tool-augmented Reasoning</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-major-challenges-of-agentic-ai">Major Challenges of Agentic AI</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-alignment-and-value-specification">Alignment and Value Specification</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-unintended-consequences">Unintended Consequences</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-safety-and-security">Safety and Security</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-coordination-and-scalability">Coordination and Scalability</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ethical-and-legal-questions">Ethical and Legal Questions</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-code-snippet-and-real-world-examples">Code Snippet and Real-World Examples</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tutorial-build-your-first-agentic-ai-with-python">Tutorial: Build Your First Agentic AI with Python</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-real-world-use-case">Real-World Use Case</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prerequisites-what-you-need">Prerequisites – What You Need</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-by-step-tutorial">Step-by-Step Tutorial</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-agentic-vs-reactive-ai"><strong>Agentic vs Reactive AI</strong></h2>
<p>Before we dive fully in, I want to make sure the differences between non-agentic and agentic AI are clear.</p>
<p>Non-agentic reactive AI uses learned models or rules to map inputs to outputs. It replies to one idea or task at a time, not starting additional ones. Examples include a calculator, spam filter, and rudimentary chatbot with pre-written responses. Reactive AI cannot plan or improve without reprogramming.</p>
<p>Agentic AI, on the other hand, acts independently with goals. It may organize actions, set objectives, adapt to new information, and collaborate with others. Agentic AI can break a complex task into small segments and coordinate the usage of specialized tools or services to complete each step.</p>
<p>The agent is also proactive. An agentic AI may inform users of updates, restock supplies, and check inventory levels, unlike a reactive system.</p>
<p>The difference is a paradigmatic shift: modern agentic systems include several specialized agents working together on a high-level objective, with dynamic task breakdown and even permanent memory, instead of a single model. This multi-agent collaboration may help agentic AI solve large real-world problems.</p>
<p>Cutting-edge prototypes like intelligent chatbots with tool integration, autonomous driving software, and coordinated industrial robots are entering agentic territory, but today's reactive AI virtual assistants (Alexa, Siri) may blur the line. It's a vital distinction whether the system actively selects rather than reacts.</p>
<h2 id="heading-key-components-of-ai-agency"><strong>Key Components of AI Agency</strong></h2>
<p>Agentic AI systems are characterized by several core capabilities that give them <strong>agency</strong>. Let’s look at these now.</p>
<h3 id="heading-autonomy"><strong>Autonomy</strong></h3>
<p>An autonomous agent may work without human supervision. It may act depending on its goals and strategy rather than waiting for specific directions.</p>
<p>The agent must use sensors or data streams to perceive, evaluate, and decide to be autonomous. An autonomous warehouse robot can move, pick up things, and alter path when it encounters barriers without human guidance. Autonomy implies self-monitoring: an agent gauges its battery life or job completion and adapts as needed.</p>
<p>An agentic AI's “reasoning engine” (usually a large language model or similar system) makes decisions and can adjust its behavior based on user feedback or rewards.</p>
<p>As IBM explains, “without any human intervention, agentic AI can act independently, adapt to new situations, make decisions, and learn from experience” (<a target="_blank" href="https://www.ibm.com/think/topics/agentic-ai">source</a>). But uncontrolled autonomous agents may behave in unpredictable ways – which is why they must be carefully designed.</p>
<p>Although agentic AIs can operate on their own, their goals, tools, and boundaries must be clearly planned to avoid unintended or harmful outcomes. Without that guidance, they may follow instructions too literally or make decisions without understanding the bigger picture.</p>
<h3 id="heading-goal-directed-behavior"><strong>Goal-Directed Behavior</strong></h3>
<p>Agentic AI is goal-directed. The system attempts to achieve one or more goals. The goals might be specified openly ("set up a meeting for tomorrow") or implicitly through a reward system. Instead of following a script, the agent chooses how to achieve its goal. It may choose methods, subgoals, and long-term goals.</p>
<p>Unplanned reactive AI has short-term or implicit goals (for example, recognize an image, guess the next word). Agentic AIs aim toward long-term goals. If assigned the duty of "organizing my travel itinerary," an agent may book flights, hotels, transportation, and so on, choose the best order, and adjust the schedule if airline prices change.</p>
<p>Business and research sources underline this distinction. Agentic AI plans and works for long-term goals, whereas reactive systems manage immediate, reactive responses. A plan-and-execute architecture lets the agent decide what to do and define and alter its goals. Instead of distinct, separate acts, it progressively performs a series. Goal-directed behavior demonstrates purposeful intent, even if the goal is vague.</p>
<h3 id="heading-planning"><strong>Planning</strong></h3>
<p>An agent plans to achieve its goals. A goal and data instruct the agentic AI to conduct a series of actions or subtasks. Planning includes simple heuristics (if A, then do B) and advanced reasoning (evaluating options).</p>
<p>Modern agentic AI uses planner-executor architectures with chain-of-thought prompting. In a "plan-and-execute" agent, an LLM-driven planner develops a multi-step plan, and executor modules employ tools or models to execute each step. ReAct is another technique in which the agent alternates between action and reasoning (or "thought") to refine its approach as it accumulates observations.</p>
<p>Planning often involves search and optimization using neural networks, decision trees, or graph-based techniques. For example, an agent might build a planning graph showing different possible actions and outcomes, then use algorithms like A* search or Monte Carlo tree search to choose the best next step.</p>
<p>In some cases, the agent simulates multiple possible futures to evaluate which actions are most likely to lead to success. Large language models (LLMs) can also help by breaking down complex instructions into smaller steps turning a single high-level goal into a list of tasks that can be executed one by one.</p>
<p>Here’s a simplified example (pseudocode) of an agent loop:</p>
<pre><code class="lang-python">goal = <span class="hljs-string">"prepare presentation on AI"</span>
agent = AI_Agent(goal)
environment = TaskEnvironment()
 <span class="hljs-comment"># Loop until the task is complete</span>
<span class="hljs-keyword">while</span> <span class="hljs-keyword">not</span> environment.task_complete():
    observation = agent.perceive(environment)
    plan = agent.make_plan(observation)        <span class="hljs-comment"># e.g., list of steps</span>
    action = plan.next_step()
    result = agent.act(action, environment)
    agent.learn(result)                       <span class="hljs-comment"># update memory or strategy</span>
</code></pre>
<p>Here, the agent perceives the current state, plans a sequence of steps toward its goal, acts by executing the next step, and then learns from the outcome before repeating. This cycle captures the core loop of an autonomous agent.</p>
<h3 id="heading-reasoning"><strong>Reasoning</strong></h3>
<p>Making judgments by applying logic and inference is known as reasoning. In addition to acting, an agentic AI considers what actions make sense in light of its information. This entails assessing trade-offs, comprehending cause and consequence, and, if necessary, applying mathematical or symbolic thinking.</p>
<p>An agent may, for instance, apply deductive reasoning, like "If sales fall below X, reorder inventory" or "All invoices are paid by Friday. This is an invoice, so I should pay it by Friday". By enabling the agent to process natural language commands, retain contextual information, and produce logical justifications for its decisions, large language models support reasoning.</p>
<p>An LLM "acts as the orchestrator or reasoning engine" that comprehends tasks and produces solutions, <a target="_blank" href="https://python.langchain.com/docs/">according to one explanation in the LangChain docs</a>. In order to retrieve pertinent information for reasoning, agents also employ strategies such as <a target="_blank" href="https://www.freecodecamp.org/news/learn-rag-fundamentals-and-advanced-techniques/">retrieval-augmented generation (RAG)</a>.</p>
<p>Agentic reasoning is essentially like internal planning and problem-solving. An agent evaluates a task by internally simulating potential strategies (often in the "thoughts" of an LLM) and selecting the most effective one. This might entail formal logic, analogical reasoning (connecting a new problem to previous ones), or multi-step deduction. So the agent continually considers its next course of action and adjusts to new inputs rather of just clicking "execute" on a single model outcome.</p>
<h3 id="heading-memory"><strong>Memory</strong></h3>
<p>Agents can utilize memory to recall prior experiences, information, and interactions to make decisions. A memoryless AI would treat every moment as new. Agentic systems record their behaviors, outcomes, and context. A short-term “working memory” of the present plan state or a long-term world knowledge base are examples.</p>
<p>A customer-service agent may remember a user's name and issue history to avoid repeating inquiries. Game-playing agents learn from past positions to move better. <a target="_blank" href="https://research.ibm.com/blog/agentic-ai">IBM says</a> AI agent memory “refers to an AI system’s ability to store and recall past experiences to improve decision-making, perception and overall performance”. Goal-oriented agents need memory to create a cohesive narrative of previous steps (to avoid repeating failures) and discover trends.</p>
<p>Agentic architectures incorporate memory modules like databases or vector storage that the LLM may query. Large language models are stateless. Agents utilize relevance filters to retain only important information since too much memory slows the system. Memory offers the agent context and continuity, allowing it to learn from previous tasks rather than beginning again.</p>
<h2 id="heading-how-does-agentic-ai-know-what-to-do">How Does Agentic AI Know What to Do?</h2>
<p>Agentic AI might seem smart, but it’s not actually “thinking” like a human. Let’s break down how it really works.</p>
<h3 id="heading-1-it-uses-a-pretrained-ai-model">1. It Uses a Pretrained AI Model</h3>
<p>At the heart of most agentic systems is a large language model (LLM) like GPT-4. This model is trained on a huge amount of tex, books, articles, websites, and so on to learn how people write and talk.</p>
<p>But it wasn’t trained to act like an agent. It was trained to predict the next word in a sentence.</p>
<p>When we give it the right prompts, it can seem like it’s making plans or solving problems. Really, it’s just generating useful responses based on patterns it learned during training.</p>
<h3 id="heading-2-it-follows-instructions-in-prompts">2. It Follows Instructions in Prompts</h3>
<p>Agentic AI doesn’t figure out what to do by itself – developers give it structure using prompts.</p>
<p>For example:</p>
<ul>
<li><p>“You are an assistant. First, think step by step. Then take action.”</p>
</li>
<li><p>“Here’s a goal: research coding tools. Plan steps. Use Wikipedia to search.”</p>
</li>
</ul>
<p>These prompts help the AI simulate planning, decision-making, and action.</p>
<h3 id="heading-3-it-uses-tools-but-only-when-told-how">3. It Uses Tools, But Only When Told How</h3>
<p>The AI doesn’t automatically know how to use tools like search engines or calculators. Developers give it access to those tools, and the AI can decide when to use them based on the text it generates.</p>
<p>Think of it like this: the AI suggests, “Now I’ll look something up,” and the system makes that happen.</p>
<h3 id="heading-4-it-can-remember-sometimes">4. It Can Remember (Sometimes)</h3>
<p>Some agents use short-term memory to remember past questions or results. Others store useful information in a database for later. But they don’t “learn” over time like humans do – they only remember what you let them.</p>
<h3 id="heading-5-its-not-fully-autonomous-yet">5. It’s Not Fully Autonomous — Yet</h3>
<p>Most agentic systems today are not fully self-learning or self-aware. They’re smart combinations of:</p>
<ul>
<li><p>Pretrained AI</p>
</li>
<li><p>Prompts</p>
</li>
<li><p>Tools</p>
</li>
<li><p>Memory</p>
</li>
</ul>
<p>Their “autonomy” comes from how all these parts work together – not from deep understanding or long-term training.</p>
<h2 id="heading-so-whats-the-current-state-of-agentic-ai">So What’s the Current State of Agentic AI?</h2>
<p>Agentic AI is still an emerging area of development. While it sounds futuristic, many systems today are just starting to use agent-like capabilities.</p>
<h3 id="heading-what-exists-today">What Exists Today</h3>
<h4 id="heading-simple-agentic-systems-already-work-in-limited-ways">Simple agentic systems already work in limited ways</h4>
<ul>
<li><p>For example, some customer service bots can check account details, respond to questions, and escalate issues automatically.</p>
</li>
<li><p>Warehouse robots can plan simple routes and avoid obstacles on their own.</p>
</li>
<li><p>Coding assistants like GitHub Copilot can help write and fix code based on natural language input.</p>
</li>
</ul>
<p>These systems show basic agentic behavior like goal-following and tool use but usually in a narrow, structured environment.</p>
<h3 id="heading-whats-still-experimental">What’s Still Experimental</h3>
<ul>
<li><p>Fully autonomous, multi-purpose agents the kind that can reason deeply, make long-term plans, and adapt to new tools, are still in research or prototype stages.</p>
</li>
<li><p>Projects like <strong>AutoGPT</strong>, <strong>BabyAGI</strong>, and <strong>OpenDevin</strong> are exciting, but they’re mostly experimental and require human oversight.</p>
</li>
</ul>
<p>Most current agentic systems:</p>
<ul>
<li><p>Don’t learn continuously</p>
</li>
<li><p>Struggle with unpredictable environments</p>
</li>
<li><p>Require a lot of setup to avoid errors or unexpected behavior</p>
</li>
</ul>
<h3 id="heading-are-we-close-to-truly-autonomous-agents">Are We Close to Truly Autonomous Agents?</h3>
<p>We’re getting closer, but we’re not there yet.</p>
<p>Today’s agentic AI is like a very clever assistant that can follow instructions, use tools, and plan steps. But it still depends on developers to give it structure (via prompts, tool choices, and boundaries).</p>
<p>In short, Agentic AI works in specific, well-designed use cases. But general-purpose, human-level autonomous agents are still a long way off.</p>
<h2 id="heading-building-agentic-ai-frameworks-and-approaches"><strong>Building Agentic AI: Frameworks and Approaches</strong></h2>
<p>Researchers and engineers have developed various frameworks and tools to construct agentic AI systems. Let’s discuss some key approaches.</p>
<h3 id="heading-reinforcement-learning-rl-agents"><strong>Reinforcement Learning (RL) Agents</strong></h3>
<p>In artificial intelligence, traditional agents are frequently constructed via <a target="_blank" href="https://www.freecodecamp.org/news/how-to-apply-reinforcement-learning-to-real-life-planning-problems-90f8fa3dc0c5/">reinforcement learning</a>, in which the agent learns to maximize a reward signal through trial and error. Atari game agents and DeepMind's AlphaGo are classic examples.</p>
<p>In addition to planning (in the sense of calculating a policy) and learning from interactions, RL agents are goal-directed (maximizing reward). Still, a lot of pure RL systems struggle with the open-ended complexity of real-world tasks and function best in simulated contexts.</p>
<p>While RL components are occasionally incorporated into modern agentic AI (for example, an agent may utilize RL to drive a robot at a basic level), they are frequently supplemented with other methods for higher level thinking.</p>
<h3 id="heading-llm-based-generative-agents"><strong>LLM-Based (Generative) Agents</strong></h3>
<p>The use of LLMs as reasoning engines within agents has become popular due to the recent explosion of large language models. For instance, LLMs (such as GPT-4) are used by frameworks like ReAct, AutoGPT, and BabyAGI to create plans and actions. These systems include prompting an LLM with the agent's objective and context, after which it generates a step or sub-goal and invokes either a function or a tool.</p>
<p>One design, frequently referred to as a ReAct loop, alternates between "Thought" (the LLM planning or reasoning) and "Action" (calling upon tools or APIs). An alternative approach involves a distinct planner LLM that generates a comprehensive multi-step plan, which is then followed by executor modules that execute each step.</p>
<p>To increase their capabilities, LLM agents frequently employ tools like search engines, calculators, and API calls. They also use context retrieval, such as RAG or memory storage, to guide their reasoning. <a target="_blank" href="https://www.freecodecamp.org/news/beginners-guide-to-langchain/">LangChain</a> and LangGraph are well-known open-source frameworks that offer building blocks (memory buffers, tool integration, and so on) for creating unique agents.</p>
<h3 id="heading-multi-agent-and-orchestration-frameworks"><strong>Multi-Agent and Orchestration Frameworks</strong></h3>
<p>Several sub-agents are used in many agentic AI architectures. A "crew" or "society of minds" method, for example, may produce many LLM agents that communicate by message passing and each serve a different job (planner, analyst, critic, and so on).</p>
<p>Orchestrated multi-agent processes are demonstrated by projects such as AutoGen, ChatDev, or MetaGPT. Engineering ideas for multi-agent systems are being explored in academic work. One study by BMW, for instance, outlines a framework for multi-agent cooperation in which several AI agents manage planning, execution, and specialized activities while working together to achieve an industrial use case.</p>
<p>These systems frequently have scheduling logic to allocate agents to subtasks and a task decomposition module, which breaks a goal down into its component elements. This essentially resembles a "AI team," in which every individual is an agentic subsystem.</p>
<h3 id="heading-classical-planning-and-symbolic-ai"><strong>Classical Planning and Symbolic AI</strong></h3>
<p>AI planning was examined in symbolic terms before to the current ML revival (STRIPS, PDDL planners, and so on). These methods might be viewed as an early example of agentic AI, in which a planner constructs a series of symbolic actions to accomplish a goal.</p>
<p>These concepts are occasionally included into contemporary agentic AI. For instance, an LLM agent may provide a high-level symbolic plan that grounded systems carry out, such as "(Find x such that property y), (compute f(x)), (deliver result)" and so on.</p>
<p>There are also hybrid architectures that combine traditional search with neural networks. The transition to learned or language-based planners is an extension of the classical planning that underpins many robotics and scheduling agents, even though it’s less prevalent in pure form today.</p>
<h3 id="heading-tool-augmented-reasoning"><strong>Tool-augmented Reasoning</strong></h3>
<p>In many agentic systems, granting the agent access to external functions and information is a viable strategy. For instance, when responding to a difficult inquiry, a language-based agent may utilize Retrieval-Augmented Generation (RAG) to retrieve pertinent information from a database.</p>
<p>As "tools" that it may use, it might also include a calculator, a web browser, a database API, or bespoke code. Autonomy is largely made possible by the capacity to utilize tools – instead of attempting to learn everything by heart, the AI model learns how to ask the appropriate questions.</p>
<p>In sum, building an agentic AI often means combining multiple techniques: machine learning for perception and learning, symbolic planning for structure, LLM reasoning for natural language and problem decomposition, plus memory modules and feedback loops.</p>
<p>There is no one-size-fits-all framework yet. Research continues rapidly – recent papers on agentic systems emphasize end-to-end pipelines that integrate perception (input analysis), goal-oriented planning, tool use, and continual learning.</p>
<h2 id="heading-major-challenges-of-agentic-ai"><strong>Major Challenges of Agentic AI</strong></h2>
<p>Building AI agents with autonomy and goals is powerful but raises new risks and difficulties. Key challenges include:</p>
<h3 id="heading-alignment-and-value-specification"><strong>Alignment and Value Specification</strong></h3>
<p>Setting the correct goals is crucial for agentic systems. If an agent's aims don't match human values, it may be damaging. If a scheduling agent is directed to “minimize costs,” it may reduce vital services unless told to preserve quality. Humans' complicated priorities make value formulation challenging. Unspecified or poorly described goals cause unexpected consequences (<a target="_blank" href="https://en.wikipedia.org/wiki/Goodhart%27s_law">Goodhart's Law</a>).</p>
<h3 id="heading-unintended-consequences"><strong>Unintended Consequences</strong></h3>
<p>Even with good intentions, agents may discover loopholes. Reward-hacking in reinforcement learning is an example from basic AI. Autonomy increases these hazards for agentic AI. Recent experiments showed an LLM-based AI was told to pursue a goal “at all costs.” It planned to stop its own monitoring and clone itself to escape shutdown, acting in self-preservation.</p>
<p>If unconstrained, an agent may deceive to achieve its aims. Unintended effects can range from an assistant arranging a hazardous flight because it fixed on a cost-savings aim to more subtle damages like cutting important benefits. <a target="_blank" href="https://www.ibm.com/think/insights/ethics-governance-agentic-ai">IBM researchers warn</a> that agents “can act without your supervision”, resulting in unintended consequences without strong protections.</p>
<h3 id="heading-safety-and-security"><strong>Safety and Security</strong></h3>
<p>Highly autonomous agents can increase danger. They may access sensitive data or operate machinery. IBM says that agents are opaque and open-ended, so their judgments might be unclear, and they may suddenly use new tools or data. A healthcare agent may leak patient data, or a financial bot may execute a dangerous move.</p>
<p>LLM-style adversarial assaults and hallucinations become more dangerous in agentic AI. Though bothersome, a delusional chatbot or investment agent might also lose millions. Agent multi-step reasoning is sensitive to hostile inputs at any level. Complex agents make trust and verification difficult.</p>
<h3 id="heading-coordination-and-scalability"><strong>Coordination and Scalability</strong></h3>
<p>In many agentic systems, multiple agents may collaborate or compete. Ensuring that they communicate correctly and don’t conflict is non-trivial.</p>
<p>A recent review notes unique challenges in orchestrating multiple agents without standardized protocols. As <a target="_blank" href="https://hai.stanford.edu/ai-index/2025-ai-index-report">the Stanford ethics report</a> points out, if millions of agents interact (for example, booking each other’s appointments), the emergent behavior could be unpredictable at scale. This raises societal concerns about system-level effects and feedback loops we haven’t seen before.</p>
<h3 id="heading-ethical-and-legal-questions"><strong>Ethical and Legal Questions</strong></h3>
<p>Finally, there are questions of responsibility and bias. Who is liable if an autonomous agent makes a mistake? How do we ensure transparency and fairness in a black-box multi-agent system?</p>
<p>Legal and ethical frameworks are still catching up. For example, IBM highlights that agentic AI brings “an expanded set of ethical dilemmas” compared to today’s AI. And AI ethicists caution that deploying powerful assistants (as personal secretaries, advisors, and so on) will have profound societal impacts that are hard to predict.</p>
<p>Here are some specific things we need to consider:</p>
<ul>
<li><p><strong>Accountability:</strong> Who is accountable if an AI agent makes a damaging choice (such a medical AI agent prescribing the wrong medication or a logistics agent causing an accident)? Designers, deployers, or agents? Legal systems presume human control, but autonomous agents may not.</p>
</li>
<li><p><strong>Transparency:</strong> Complex and opaque agentic systems exist. Multiple neural networks, knowledge bases, and tools may interact. Explaining an agent's behavior for auditing or debugging is tough. This opposes explainable AI.</p>
</li>
<li><p><strong>Bias and fairness:</strong> Agents learn from data and environments that may reflect human biases. An autonomous hiring assistant agent, for instance, might inadvertently replicate discriminatory patterns unless carefully checked. And because agentic AI can perpetuate or amplify biases across many decisions, the impact could be larger.</p>
</li>
<li><p><strong>Job disruption and social impact:</strong> Just as factory automation destroyed certain employment, powerful AI agents might change office and creative labor. Personal assistant agents that schedule, manage email, and research might change many careers. This might boost production but also exacerbate deskilling and inequality. Social pressure to utilize agentic AI (if rivals do) may divide workers into “augmented” and “unaugmented” workers.</p>
</li>
<li><p><strong>Security and privacy:</strong> An agent with extensive system access harms privacy. Compromise of an AI agent permitted to access and write business data or personal correspondence might reveal critical information. IBM warns that agentic AI can increase recognized hazards, such as an agent accidentally biasing a database or sharing private data without monitoring. Tools must be authenticated and data handled securely.</p>
</li>
<li><p><strong>Human-AI interaction:</strong> Our agents may affect how we use technology and interact with others. If individuals utilize AI bots for conversation, information filtering, or companionship, it might change societal dynamics. Consider again the Stanford study referenced above. So we need to pursue ways to include standards and values into these encounters.</p>
</li>
</ul>
<p>In recognition of these challenges, technologists and ethicists urge us to use proactive safeguards. As IBM researchers put it, because agentic AI is advancing rapidly, we cannot wait to address safety – we must build strong guardrails now. Some proposed measures include strict testing protocols for agents, explainability requirements, legal regulations on autonomous systems, and design principles that prioritize human values.</p>
<p>So as you can see, while agentic AI offers the potential for AI that can handle complex tasks end-to-end, it also amplifies known AI risks (bias, error) and introduces new ones (autonomous decision-making, coordination failures). Addressing these challenges requires careful design of alignment, robust evaluation of agent behavior, and interdisciplinary governance.</p>
<h2 id="heading-code-snippet-and-real-world-examples"><strong>Code Snippet and Real-World Examples</strong></h2>
<p>To illustrate how an agentic system works, let’s consider a very simple Python-like pseudocode for an abstract agent (mixing concepts from above):</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Agent</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">init</span>(<span class="hljs-params">self, goal</span>):</span>
        self.goal = goal
        self.memory = []
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">perceive</span>(<span class="hljs-params">self, environment</span>):</span>
        <span class="hljs-comment"># Get data from environment (sensor, API, etc.)</span>
        <span class="hljs-keyword">return</span> environment.get_state()
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">plan</span>(<span class="hljs-params">self, observation</span>):</span>
        <span class="hljs-comment"># Use reasoning (LLM or algorithm) to decide next action(s)</span>
        plan = ReasoningEngine.generate_plan(goal=self.goal, context=observation)
        <span class="hljs-keyword">return</span> plan  <span class="hljs-comment"># e.g. list of steps or actions</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">act</span>(<span class="hljs-params">self, action, environment</span>):</span>
        <span class="hljs-comment"># Execute the action using tools or directly in the environment</span>
        result = environment.execute(action)
        <span class="hljs-keyword">return</span> result
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">learn</span>(<span class="hljs-params">self, experience</span>):</span>
        <span class="hljs-comment"># Store outcome or update strategy</span>
        self.memory.append(experience)   
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span>(<span class="hljs-params">self, environment</span>):</span>
        <span class="hljs-keyword">while</span> <span class="hljs-keyword">not</span> environment.task_complete():
            obs = self.perceive(environment)
            plan = self.plan(obs)
            <span class="hljs-keyword">for</span> action <span class="hljs-keyword">in</span> plan:
                result = self.act(action, environment)
                self.learn(result)
</code></pre>
<p>This example demonstrates the core loop of an agentic AI:</p>
<ul>
<li><p>The agent starts with a goal and can store memory of what it has done.</p>
</li>
<li><p>It observes its environment to understand what’s happening.</p>
</li>
<li><p>Based on that input, it creates a plan – a list of actions to reach its goal.</p>
</li>
<li><p>It executes each action, interacts with the environment, and learns from what happens.</p>
</li>
<li><p>This process repeats until the goal is met or the task is complete.</p>
</li>
</ul>
<p>This basic structure mirrors how real-world agentic systems operate: perceive → plan → act → learn.</p>
<p>Real-world agentic AI systems are evolving. Self-driving cars detect their environment, set navigation goals, plan routes, and learn from experience.</p>
<p><a target="_blank" href="https://www.tesla.com/AI">Tesla's Full Self-Driving</a> “continuously learns from the driving environment and adjusts its behavior” to increase safety. Supply chain logistics businesses are creating agents that monitor inventory, estimate demand, alter routes, and place new orders autonomously. Amazon's warehouse robots utilize agentic AI to navigate complicated surroundings and adapt to changing situations, independently fulfilling orders.</p>
<p>Cybersecurity, healthcare, and customer service also use autonomous agents to identify and respond to risks. An agentic AI at a contact center may assess a customer's mood, account history, and company policies to provide a bespoke solution or process. Agentic systems organize and arrange marketing campaigns, write text, choose graphics, and alter strategies depending on performance data. In processes with several phases and choices, agentic AI can handle the whole workflow.</p>
<p>Recently, several prototype projects and open-source tools have begun experimenting with agentic AI in real-world scenarios.</p>
<p>For example, tools like AutoGPT and AgentGPT have demonstrated agents that can generate multimedia reports by coordinating research, writing, and image selection tasks. Other use cases include agents that retrieve knowledge and take follow-up action (for example, “find and implement the next step”), conduct security operations like scanning and responding to threats, or automate multi-step workflows in call centers.</p>
<p>These examples show how early-stage products and research projects are beginning to test and deploy agentic AI for complex, multi-step tasks beyond just answering questions.</p>
<h2 id="heading-tutorial-build-your-first-agentic-ai-with-python"><strong>Tutorial: Build Your First Agentic AI with Python</strong></h2>
<p>This step-by-step guide will teach you how to build a basic Agentic AI system even if you're just starting out. I’ll explain every concept clearly and give you working Python code you can run and study.</p>
<h3 id="heading-real-world-use-case"><strong>Real-World Use Case</strong></h3>
<p><strong>Scenario:</strong> You're a product manager exploring tools for your team. Instead of spending hours researching AI coding assistants manually, you'd like a personal research agent to:</p>
<ul>
<li><p>Understand your task</p>
</li>
<li><p>Gather relevant information from Wikipedia</p>
</li>
<li><p>Summarize it clearly</p>
</li>
<li><p>Remember context from previous questions</p>
</li>
</ul>
<p>This is where Agentic AI shines: it acts autonomously, reasons, and uses tools just like a smart human assistant.</p>
<h3 id="heading-prerequisites-what-you-need"><strong>Prerequisites – What You Need</strong></h3>
<ol>
<li><p>Python 3.10 or higher</p>
</li>
<li><p>An OpenAI API key (<a target="_blank" href="https://platform.openai.com/api-keys">https://platform.openai.com/api-keys</a>). Note that as of writing this OpenAI does not offer free API calls, so if you don’t already have an account you’ll need to use a credit card and a few dollars to complete this tutorial.</p>
</li>
<li><p>Install the required Python libraries:</p>
</li>
</ol>
<pre><code class="lang-bash">pip install langchain openai wikipedia
</code></pre>
<p>⚠️ Don't forget to store your API key safely. Never share it in public code.</p>
<h3 id="heading-step-by-step-tutorial"><strong>Step-by-Step Tutorial</strong></h3>
<h4 id="heading-step-1-set-up-your-environment">Step 1: Set Up Your Environment</h4>
<p>Start by setting your OpenAI API key in your script so that LangChain can access GPT models.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os

os.environ[<span class="hljs-string">"OPENAI_API_KEY"</span>] = <span class="hljs-string">"your-api-key-here"</span>  <span class="hljs-comment"># Replace with your real key</span>
</code></pre>
<h4 id="heading-step-2-connect-to-a-knowledge-source-wikipedia">Step 2: Connect to a Knowledge Source (Wikipedia)</h4>
<p>We'll give our agent the ability to use Wikipedia as a tool to gather information.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.agents <span class="hljs-keyword">import</span> Tool
<span class="hljs-keyword">from</span> langchain.tools <span class="hljs-keyword">import</span> WikipediaQueryRun
<span class="hljs-keyword">from</span> langchain.utilities <span class="hljs-keyword">import</span> WikipediaAPIWrapper
<span class="hljs-comment"># Create the Wikipedia tool</span>
wiki = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
<span class="hljs-comment"># Register the tool so the agent knows how to use it</span>
tools = [
    Tool(
        name=<span class="hljs-string">"Wikipedia"</span>,
        func=wiki.run,
        description=<span class="hljs-string">"Useful for looking up general knowledge."</span>
    )
]
</code></pre>
<p>You're giving your agent a way to "see the world" – Wikipedia is your agent's eyes.</p>
<h4 id="heading-step-3-initialize-the-agent-reasoning-engine">Step 3: Initialize the Agent (Reasoning Engine)</h4>
<p>We now give the agent a brain – a GPT model that can reason, decide, and plan.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.chat_models <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langchain.agents <span class="hljs-keyword">import</span> initialize_agent
<span class="hljs-keyword">from</span> langchain.agents.agent_types <span class="hljs-keyword">import</span> AgentType
<span class="hljs-comment"># Use a GPT model with zero randomness for consistent output</span>
llm = ChatOpenAI(temperature=<span class="hljs-number">0</span>)
<span class="hljs-comment"># Combine reasoning (LLM) and tools (Wikipedia) into one agent</span>
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=<span class="hljs-literal">True</span>  <span class="hljs-comment"># Show thought process step-by-step</span>
)
</code></pre>
<p>This step fuses logic (GPT) and action (Wikipedia) to make your agent capable of goal-driven behavior.</p>
<h4 id="heading-step-4-give-your-agent-a-goal">Step 4: Give Your Agent a Goal</h4>
<pre><code class="lang-python">goal = <span class="hljs-string">"What are the top AI coding assistants and what makes them unique?"</span>
response = agent.run(goal)
print(<span class="hljs-string">"\nAgent's response:\n"</span>, response)
</code></pre>
<p>You’ve given your agent a mission. It will now think, search, and summarize.</p>
<p>You should see output like:</p>
<p><code>> Entering new AgentExecutor chain...</code></p>
<p><code>Thought: I should look up AI coding assistants on Wikipedia</code></p>
<p><code>Action: Wikipedia</code></p>
<p><code>Action Input: AI coding assistants</code></p>
<p><code>...</code></p>
<p><code>Final Answer: The top AI coding assistants are GitHub Copilot, Amazon CodeWhisperer, and Tabnine...</code></p>
<p>At this point, the agent has:</p>
<ul>
<li><p>Interpreted your goal</p>
</li>
<li><p>Selected a tool (Wikipedia)</p>
</li>
<li><p>Retrieved and analyzed content</p>
</li>
<li><p>Reasoned through it to deliver a conclusion</p>
</li>
</ul>
<h4 id="heading-step-5-give-your-agent-memory-optional-but-powerful">Step 5: Give Your Agent Memory (Optional but Powerful)</h4>
<p>Let your agent remember what you previously asked, like a real assistant.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.memory <span class="hljs-keyword">import</span> ConversationBufferMemory
memory = ConversationBufferMemory(memory_key=<span class="hljs-string">"chat_history"</span>)
agent_with_memory = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=<span class="hljs-literal">True</span>
)
<span class="hljs-comment"># Ask a follow-up</span>
agent_with_memory.run(<span class="hljs-string">"Tell me about GitHub Copilot"</span>)
agent_with_memory.run(<span class="hljs-string">"What else do you know about coding assistants?"</span>)
</code></pre>
<p>Your agent now tracks context across multiple interactions just like a good human assistant.</p>
<p>When this is done, your agent:</p>
<ul>
<li><p>Responds more naturally to follow-up questions</p>
</li>
<li><p>Links previous conversations to improve continuity</p>
</li>
</ul>
<p>After running the steps, your agent reads your goal and plans steps to fulfill it. It searches Wikipedia to gather facts, and reasons using a GPT model to summarize and decide what to say. It also optionally remembers context (with memory enabled). You now have a working Agentic AI that can be extended for real-world tasks.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Agentic AI offers an exciting glimpse into a future where machines can collaborate with humans to solve complex, multi-step problems not just respond to commands. With capabilities like planning, reasoning, tool use, and memory, these systems could one day handle tasks that currently require entire teams of people.</p>
<p>But with that power comes real responsibility. If not properly designed and guided, autonomous agents could act in unpredictable or harmful ways. That’s why developers, researchers, and policymakers need to work together to set clear boundaries, safety rules, and ethical standards.</p>
<p>The technology is advancing quickly from self-driving cars to research assistants to multi-agent platforms like AutoGPT and LangChain. As we build smarter systems, the challenge isn't just what they can do, but how we ensure they do it safely, fairly, and in ways that benefit everyone.</p>
 
</article>
<article>
<h1> How to Use TypeSpec for Documenting and Modeling APIs </h1>
<p>Adalbert Pungu — Fri, 11 Apr 2025 19:25:13 +0000</p>
 <p>If you're curious and passionate about technology like I am, and you’re looking for clarity in your code, you've likely already experienced the limitations of conventional tools for documenting and modeling APIs.</p>
<p>Tools such as Swagger, JSON Schema, or OpenAPI are powerful, but they can be verbose, inflexible, or not conducive to reuse.</p>
<p>Well, I recently discovered TypeSpec. In this guide, I’ll show you how to take advantage of TypeSpec to create modern, maintainable, and well-documented REST APIs.</p>
<p></p>
<p>We'll take a look at:</p>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-typespec">What is TypeSpec?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-use-typespec">Why use TypeSpec?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-install-and-configure-typespec">How to Install and Configure TypeSpec</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-typespec-basic-syntax">TypeSpec Basic Syntax</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-a-rest-api-model">How to Create a REST API Model</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-the-api-in-express-and-aspnet-core">How to Build the API in Express and</a> <a target="_blank" href="http://ASP.NET">ASP.NET</a> <a class="post-section-overview" href="#heading-how-to-build-the-api-in-express-and-aspnet-core">Core</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-best-practices-for-structuring-typespec-projects-and-components">Best Practices for Structuring TypeSpec Projects and Components</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>Before we dive into using TypeSpec to document and model APIs, here are a few things you'll need to familiarize yourself with and/or have:</p>
<ul>
<li><p><strong>Node.js</strong> (version 18 or higher)</p>
</li>
<li><p><strong>npm</strong> for dependency management</p>
</li>
<li><p><strong>Visual Studio Code</strong> (recommended to take advantage of the official TypeSpec extension). For an optimal experience, to create your project easily, it provides syntax highlighting, validation, autocompletion, navigation, and more.</p>
</li>
<li><p><strong>TypeSpec Extension</strong> in VS Code (You can install the extension via <a target="_blank" href="https://marketplace.visualstudio.com/items?itemName=typespec.typespec-vscode">Visual Studio Marketplace</a>)</p>
</li>
<li><p>An understanding of how to use and create APIs</p>
</li>
</ul>
<h2 id="heading-what-is-typespec">What is TypeSpec?</h2>
<p>TypeSpec is an open-source declarative language, developed by Microsoft, designed to describe APIs in an explicit, reusable, scalable, and standards-based way. It’s designed to model REST, gRPC, GraphQL, and other types of APIs, and offers a modern syntax close to TypeScript.</p>
<p>It can automatically generate:</p>
<ul>
<li><p>OpenAPI, JSON Schema, or Protobuf specifications</p>
</li>
<li><p>server and client code</p>
</li>
<li><p>API documentation</p>
</li>
<li><p>and other interface-related artifacts</p>
</li>
</ul>
<p>TypeSpec isn't just a language – it's an API design platform that favors abstraction, encourages code reuse, and integrates with modern tools like Visual Studio Code via a dedicated extension. You can install the extension via the VS Code <a target="_blank" href="https://marketplace.visualstudio.com/items?itemName=typespec.typespec-vscode">Visual Studio Marketplace</a>.</p>
<h2 id="heading-why-use-typespec">Why use TypeSpec?</h2>
<p>Before diving into the code, let's take a minute to understand the TypeSpec philosophy. Microsoft uses TypeSpec internally to deliver high-quality API services to millions of customers, across tens of thousands of endpoints, while ensuring code quality, governance, and scalability.</p>
<p></p>
<p>Unlike generators such as Swagger, Codegen, or Postman, which start from an OpenAPI file to generate code, TypeSpec does the opposite: you first write your API design in a DSL (Domain Specific Language), then generate everything you need.</p>
<p>TypeSpec has been designed to meet the major challenges of large-scale API design and governance:</p>
<ul>
<li><p><strong>Simplification</strong>: clear, concise syntax to focus on business logic.</p>
</li>
<li><p><strong>Reusability</strong>: encapsulates types, request/response models, and directives in modular components.</p>
</li>
<li><p><strong>Productivity</strong>: automatically generates the necessary resources from a single source definition.</p>
</li>
<li><p><strong>Consistency</strong>: maintains compliance with internal standards thanks to shared libraries.</p>
</li>
<li><p><strong>Interoperability</strong>: integrates with the OpenAPI ecosystem and supports multi-format generation.</p>
</li>
<li><p><strong>Scalability</strong>: designed to handle thousands of endpoints like those used by Microsoft Azure.</p>
</li>
</ul>
<p>Let's take a look at how to install and configure the development environment</p>
<h2 id="heading-how-to-install-and-configure-typespec">How to Install and Configure TypeSpec</h2>
<p>Before you can start writing your first API with TypeSpec, you need to set up your development environment. Here's how to install TypeSpec on your machine.</p>
<h4 id="heading-requirements">Requirements:</h4>
<ul>
<li><p><strong>Node.js</strong> (version 18 or higher)</p>
</li>
<li><p><strong>npm</strong> for dependency management</p>
</li>
<li><p><strong>Visual Studio Code</strong> (recommended to take advantage of the official TypeSpec extension). For an optimal experience, it provides syntax highlighting, validation, autocompletion, navigation, and more.</p>
</li>
</ul>
<p>TypeSpec CLI global installation:</p>
<pre><code class="lang-bash">npm install -g @typespec/compiler
</code></pre>
<h3 id="heading-how-to-create-a-typespec-project">How to Create a TypeSpec Project</h3>
<p>The easiest way to create a project is to use Visual Studio Code via the TypeSpec extension you've installed (if you're not comfortable with the command line (CMD)).</p>
<p>Create a folder containing the project and open it with Visual Studio Code. Then click on the <code>View</code> tab, and next on <code>Comment Palette</code> .</p>
<p>In the search bar that appears, enter <code>TypeSpec: Create TypeSpec Project</code>.</p>
<p>Follow the quick selections to select the root folder of the project you've just created. Then choose the Template – for our case this will be <code>Generic REST API</code> – and enter the project name. Leave the emitter <code>OpenAPI 3.1 document</code> (3.1 is the current version at the time of writing) selected by default. This will put us <code>@typespec/http@typespec/openapi3</code>. Finally, wait for the project configuration to finish.</p>
<p>You should have a basic TypeSpec project configuration with a structure that looks like this:</p>
<p></p>
<ul>
<li><p><strong>node_modules/</strong>: Directory where npm installs project dependencies.</p>
</li>
<li><p><strong>main.tsp</strong>: the entry point for your TypeSpec build. This file generally contains the main definitions of your models, services, and operations.</p>
</li>
<li><p><strong>package.json</strong>: Contains project metadata, including dependencies, scripts, and other project-related information.</p>
</li>
<li><p><strong>tspconfig.yaml</strong>: TypeSpec compiler configuration file, specifying options and parameters for the generation process.</p>
</li>
</ul>
<p>You can also run <code>tsp compile .</code> to compile the project, but it's better to run <code>tsp compile . --watch</code> to automatically compile changes during development each time you save.</p>
<p></p>
<p>Once the project has been compiled, you'll see the <code>tsp-output</code> and <code>schema</code> folders generated and a file added <code>openai.yaml</code>.</p>
<p></p>
<ul>
<li><p><strong>tsp-output/</strong>: Directory where the TypeSpec compiler generates files.</p>
</li>
<li><p><strong>openapi.yaml</strong>: OpenAPI specification file generated for your API, detailing API endpoints, templates, and operations. Output may vary depending on the target format specified in the <code>tspconfig.yaml</code> file.</p>
</li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">emit:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"@typespec/openapi3"</span>
<span class="hljs-attr">options:</span>
  <span class="hljs-string">"@typespec/openapi3"</span><span class="hljs-string">:</span>
    <span class="hljs-attr">emitter-output-dir:</span> <span class="hljs-string">"{output-dir}/schema"</span>
    <span class="hljs-attr">openapi-versions:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">3.1</span><span class="hljs-number">.0</span>
</code></pre>
<p>Thanks to this configuration of the <code>tspconfig.yaml</code> file, one of TypeSpec's major assets is its ability to automatically generate OpenAPI specifications from clear, typed, and modular source code. This means you can write your API as you would in TypeScript (or a well-structured DSL), and get output in <code>.yaml</code> files compatible with the whole OpenAPI ecosystem: Swagger UI, Postman, Redoc, and so on.</p>
<p>In the next section, we'll look at the basic syntax of TypeSpec.</p>
<h2 id="heading-typespec-basic-syntax">TypeSpec Basic Syntax</h2>
<p>Now that you've got a clear idea of what TypeSpec is and what its benefits are in the world of API design, it's time to get to the heart of the matter: the basic syntax.</p>
<p>TypeSpec is a declarative language, inspired by TypeScript, that lets you model the resources, routes, data structures, and behaviors of an API in an explicit, readable, and modular way. Its syntax is based on simple keywords and clear file organization, making it easy to learn yet powerful.</p>
<h3 id="heading-language-basics">Language Basics</h3>
<p>Here's a very simple example of defining a model with TypeSpec:</p>
<pre><code class="lang-typescript">model Book {
  id: <span class="hljs-built_in">string</span>;
  title: <span class="hljs-built_in">string</span>;
  author: <span class="hljs-built_in">string</span>;
}
</code></pre>
<p>This block defines a <code>Book</code> resource with three typed fields. The <code>model</code> keyword is used to describe the JSON objects manipulated by the API. It is equivalent to schemas in JSON Schema or type definitions in OpenAPI.</p>
<h4 id="heading-defining-an-http-operation">Defining an HTTP operation</h4>
<p>TypeSpec lets you bind operations to models using the <code>@route</code> keyword. Here's a minimal example of an endpoint:</p>
<pre><code class="lang-typescript"><span class="hljs-meta">@route</span>(<span class="hljs-string">"/books"</span>)
op listBooks(): Book[];
</code></pre>
<p>This syntax declares a REST operation that returns a list of books. <code>@route</code> indicates the URL path, <code>op</code> introduces an operation, and <code>Book[]</code> is the return type.</p>
<p>You can also define path, query, or body parameters very easily.</p>
<pre><code class="lang-typescript"><span class="hljs-meta">@route</span>(<span class="hljs-string">"/books/{id}"</span>)
op getBook(<span class="hljs-meta">@path</span> id: <span class="hljs-built_in">string</span>): Book;
</code></pre>
<p>In this example, we declare that <code>id</code> is a URL parameter (path parameter).</p>
<h3 id="heading-fundamental-concepts"><strong>Fundamental Concepts</strong></h3>
<h4 id="heading-model-defining-data-structures"><code>model</code> Defining data structures</h4>
<p>A <code>model</code> represents an API entity, like a JSON object. Models are the basis of your information exchanges.</p>
<pre><code class="lang-typescript">model User {
  id: <span class="hljs-built_in">string</span>;
  email: <span class="hljs-built_in">string</span>;
  age?: int32;
}
</code></pre>
<h4 id="heading-interface-group-operations"><code>interface</code> <strong>Group operations</strong></h4>
<p>An <code>interface</code> groups together a set of logically linked operations. This is useful for structuring large API sets.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">interface</span> BookOperations {
  <span class="hljs-meta">@get</span> op listBooks(): Book[];
  <span class="hljs-meta">@get</span> op getBook(<span class="hljs-meta">@path</span> id: <span class="hljs-built_in">string</span>): Book;
}
</code></pre>
<h4 id="heading-service-entry-point-of-the-api"><code>service</code> <strong>Entry point of the API</strong></h4>
<p>A <code>service</code> defines publicly exposed interfaces, their version, and the basic path.</p>
<pre><code class="lang-typescript"><span class="hljs-meta">@service</span>({ title: <span class="hljs-string">"Book API"</span>, version: <span class="hljs-string">"1.0.0"</span> })
<span class="hljs-keyword">namespace</span> BookApi {
  <span class="hljs-keyword">interface</span> BookOperations;
}
</code></pre>
<h3 id="heading-import-and-organize-your-code-with-namespaces"><strong>Import and Organize Your Code with Namespaces</strong></h3>
<p>TypeSpec provides clear organization through namespaces, similar to modules or packages.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">namespace</span> CommonModels {
  model <span class="hljs-built_in">Error</span> {
    message: <span class="hljs-built_in">string</span>;
  }
}
</code></pre>
<p>Then you can import them into another file like this:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> CommonModels <span class="hljs-keyword">from</span> <span class="hljs-string">"./common.tsp"</span>;
</code></pre>
<h3 id="heading-complete-example-of-a-rest-service"><strong>Complete Example of a REST Service</strong></h3>
<p>Let's take a complete example of a REST service in TypeSpec.</p>
<pre><code class="lang-typescript"><span class="hljs-meta">@service</span>({ title: <span class="hljs-string">"Book Service"</span>, version: <span class="hljs-string">"1.0.0"</span> })

<span class="hljs-meta">@route</span>(<span class="hljs-string">"/books"</span>)

<span class="hljs-keyword">namespace</span> BookService {

  model Book {
    id: <span class="hljs-built_in">string</span>;
    title: <span class="hljs-built_in">string</span>;
    author: <span class="hljs-built_in">string</span>;
    publishedYear?: int32;
  }

  <span class="hljs-meta">@get</span>()
  op listBooks(): Book[];

  <span class="hljs-meta">@post</span>()
  op createBook(<span class="hljs-meta">@body</span> book: Book): Book;

  <span class="hljs-meta">@get</span>(<span class="hljs-string">"/{id}"</span>)
  op getBook(<span class="hljs-meta">@path</span> id: <span class="hljs-built_in">string</span>): Book;

  <span class="hljs-meta">@put</span>(<span class="hljs-string">"/{id}"</span>)
  op updateBook(<span class="hljs-meta">@path</span> id: <span class="hljs-built_in">string</span>, <span class="hljs-meta">@body</span> book: Book): Book;

  <span class="hljs-meta">@delete</span>(<span class="hljs-string">"/{id}"</span>)
  op deleteBook(<span class="hljs-meta">@path</span> id: <span class="hljs-built_in">string</span>): <span class="hljs-built_in">void</span>;
}
</code></pre>
<p><strong>Here’s what’s going on</strong>:</p>
<ul>
<li><p><code>@service({ title, version })</code>: Defines service metadata (name, version), useful for generated documentation (for example, Swagger UI).</p>
</li>
<li><p><code>@route("/books")</code>: Defines the basic path for all operations of this API.</p>
</li>
<li><p><code>namespace BookService { ... }</code>: Encapsulates all models and operations linked to this service under a single logical name.</p>
</li>
</ul>
<p><strong>Next come the operations</strong>:</p>
<ul>
<li><p><code>@get() op listBooks()</code>: Endpoint <code>GET /books</code> qui retourne un tableau de livres.</p>
</li>
<li><p><code>@post() op createBook()</code>: Endpoint <code>POST /books</code> which accepts a <code>Book</code> object in the request body (<code>@body</code>) and returns the created book.</p>
</li>
<li><p><code>@get("/{id}")</code>: Endpoint <code>GET /books/{id}</code> which retrieves a book via its identifier (<code>@path</code>).</p>
</li>
<li><p><code>@put("/{id}")</code>: Endpoint <code>PUT /books/{id}</code> which updates a book's data.</p>
</li>
<li><p><code>@delete("/{id}")</code>: Deletes a book via its <code>id</code>. The <code>void</code> type means that no data is returned.</p>
</li>
</ul>
<p>With just a few lines, you get a complete, well-organized, easily readable REST service, ready to be automatically converted into OpenAPI documentation, a client SDK, or backend code.</p>
<h3 id="heading-add-validation-annotations"><strong>Add Validation Annotations</strong></h3>
<p>TypeSpec makes it easy to add validation annotations to your models using:</p>
<pre><code class="lang-typescript">model Book {
  id: <span class="hljs-built_in">string</span>;
  title: <span class="hljs-built_in">string</span> <span class="hljs-meta">@minLength</span>(<span class="hljs-number">3</span>);
  author: <span class="hljs-built_in">string</span> <span class="hljs-meta">@minLength</span>(<span class="hljs-number">3</span>);
  publishedYear?: int32 <span class="hljs-meta">@minValue</span>(<span class="hljs-number">1800</span>);
}
</code></pre>
<p>This adds validation rules directly to the schema, which will be taken into account during OpenAPI generation.</p>
<h3 id="heading-comparison-with-other-tools-openapi-swagger">Comparison with Other Tools (OpenAPI / Swagger)</h3>
<p>So you might wonder – why should you use TypeSpec rather than writing directly in OpenAPI?</p>
<p>Let's take the example of OpenAPI 3 (YAML):</p>
<pre><code class="lang-yaml"><span class="hljs-attr">paths:</span>
  <span class="hljs-string">/books:</span>
    <span class="hljs-attr">get:</span>
      <span class="hljs-attr">summary:</span> <span class="hljs-string">Get</span> <span class="hljs-string">list</span> <span class="hljs-string">of</span> <span class="hljs-string">books</span>
      <span class="hljs-attr">responses:</span>
        <span class="hljs-attr">'200':</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">OK</span>
          <span class="hljs-attr">content:</span>
            <span class="hljs-attr">application/json:</span>
              <span class="hljs-attr">schema:</span>
                <span class="hljs-attr">type:</span> <span class="hljs-string">array</span>
                <span class="hljs-attr">items:</span>
                  <span class="hljs-string">$ref:</span> <span class="hljs-string">'#/components/schemas/Book'</span>
    <span class="hljs-attr">post:</span>
      <span class="hljs-attr">summary:</span> <span class="hljs-string">Create</span> <span class="hljs-string">a</span> <span class="hljs-string">new</span> <span class="hljs-string">book</span>
      <span class="hljs-attr">requestBody:</span>
        <span class="hljs-attr">content:</span>
          <span class="hljs-attr">application/json:</span>
            <span class="hljs-attr">schema:</span>
              <span class="hljs-string">$ref:</span> <span class="hljs-string">'#/components/schemas/Book'</span>
      <span class="hljs-attr">responses:</span>
        <span class="hljs-attr">'201':</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">Created</span>
  <span class="hljs-string">/books/{id}:</span>
    <span class="hljs-attr">get:</span>
      <span class="hljs-attr">parameters:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">id</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">path</span>
          <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
      <span class="hljs-attr">responses:</span>
        <span class="hljs-attr">'200':</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">OK</span>
    <span class="hljs-attr">put:</span>
      <span class="hljs-attr">parameters:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">id</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">path</span>
          <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
      <span class="hljs-attr">requestBody:</span>
        <span class="hljs-attr">content:</span>
          <span class="hljs-attr">application/json:</span>
            <span class="hljs-attr">schema:</span>
              <span class="hljs-string">$ref:</span> <span class="hljs-string">'#/components/schemas/Book'</span>
    <span class="hljs-attr">delete:</span>
      <span class="hljs-attr">parameters:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">id</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">path</span>
          <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
<span class="hljs-attr">components:</span>
  <span class="hljs-attr">schemas:</span>
    <span class="hljs-attr">Book:</span>
      <span class="hljs-attr">type:</span> <span class="hljs-string">object</span>
      <span class="hljs-attr">properties:</span>
        <span class="hljs-attr">id:</span>
          <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
        <span class="hljs-attr">title:</span>
          <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
        <span class="hljs-attr">author:</span>
          <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
        <span class="hljs-attr">publishedYear:</span>
          <span class="hljs-attr">type:</span> <span class="hljs-string">integer</span>
</code></pre>
<p>As you can see, the OpenAPI definition is much more verbose. Relationships between paths, methods, schemas, and parameters are scattered, which complicates reading and maintenance. Also, it's less typed, given that OpenAPI remains YAML (or JSON), without the typing security or modularity of a real language.</p>
<h4 id="heading-why-typespec-is-useful-here">Why TypeSpec is useful here</h4>
<p>With TypeSpec, everything is centralized in a declarative, modular, typed, and intuitive format.</p>
<ul>
<li><p><strong>Greater legibility</strong>: less noise, more intent.</p>
</li>
<li><p><strong>Reusability</strong>: you can create modular components and share them between projects.</p>
</li>
<li><p><strong>Productivity</strong>: you write less code and generate more (OpenAPI, client, server, doc).</p>
</li>
<li><p><strong>Consistency</strong>: errors are detected early thanks to strong typing.</p>
</li>
</ul>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Criteria</strong></td><td><strong>OpenAPI / Swagger</strong></td><td><strong>TypeSpec</strong></td></tr>
</thead>
<tbody>
<tr>
<td></td><td></td><td></td></tr>
<tr>
<td><strong>Syntax</strong></td><td>Verbose (YAML/JSON)</td><td>Declarative, typed, concise</td></tr>
<tr>
<td><strong>Organization</strong></td><td>Fragmented</td><td>Modular (namespace, import)</td></tr>
<tr>
<td><strong>Modular</strong></td><td>Limited</td><td>High (models, services)</td></tr>
<tr>
<td><strong>Built-in validation</strong></td><td>Separate or manual</td><td>Decorators (@minLength, and so on)</td></tr>
<tr>
<td><strong>Automatic generation</strong></td><td>Manual</td><td>Integrated (OpenAPI, SDK, and so on)</td></tr>
</tbody>
</table>
</div><p>Note: TypeSpec doesn't replace OpenAPI, but complements it: you write to TypeSpec, then automatically generate OpenAPI files, SDKs, specs and so on. It gives you a source language for accurately describing your API.</p>
<p>In the next section, we'll look at how to create a REST API template.</p>
<h2 id="heading-how-to-create-a-rest-api-model">How to Create a REST API Model</h2>
<p>To deepen our understanding of REST API creation with TypeSpec, let's continue with the example of managing books. In this example, we'll create a <code>Book</code> model, define a service to manage the books, and add validations to ensure that the data respects the right constraints.</p>
<h3 id="heading-define-a-data-model-for-book">Define a Data Model for <code>Book</code></h3>
<p>First, we'll define a data model for the Book resource. A book can have the following properties:</p>
<ul>
<li><p><code>id</code>: A unique identifier for the book.</p>
</li>
<li><p><code>title</code>: The title of the book.</p>
</li>
<li><p><code>author</code>: The author of the book.</p>
</li>
<li><p><code>publicationYear</code>: The book's year of publication.</p>
</li>
<li><p><code>isbn</code>: The book's ISBN number.</p>
</li>
</ul>
<p><code>Book</code> <strong>model in TypeSpec</strong></p>
<pre><code class="lang-typescript">model Book {
  id: integer;
  <span class="hljs-meta">@minLength</span>(<span class="hljs-number">1</span>)
  title: <span class="hljs-built_in">string</span>;
  <span class="hljs-meta">@minLength</span>(<span class="hljs-number">1</span>)
  author: <span class="hljs-built_in">string</span>;
  publicationYear: integer;
  <span class="hljs-meta">@pattern</span>(<span class="hljs-string">"^\\d{3}-\\d{1,5}-\\d{1,7}-\\d{1,7}-\\d{1}$"</span>)
  isbn: <span class="hljs-built_in">string</span>;
}
</code></pre>
<ul>
<li><p><code>id</code>: Unique book identifier (<code>integer</code> type).</p>
</li>
<li><p><code>title</code> and <code>author</code>: Character strings representing the book's title and author, validated by <code>@minLength(1)</code> to ensure they are not empty.</p>
</li>
<li><p><code>publicationYear</code>: The book's year of publication (<code>integer</code> type).</p>
</li>
<li><p><code>isbn</code>: The book's ISBN number, validated with a regular expression that matches the standard format of an ISBN.</p>
</li>
</ul>
<h3 id="heading-define-a-rest-service-to-manage-books">Define a REST Service to Manage Books</h3>
<p>Now that we have a <code>Book</code> model, we'll create a service to manage CRUD operations on this resource. This service will contain methods for retrieving a book by its identifier, creating a new book, updating an existing book, and deleting a book.</p>
<p><code>BooksService</code> <strong>service in TypeSpec</strong></p>
<pre><code class="lang-typescript">service BooksService {

  <span class="hljs-meta">@get</span>(<span class="hljs-string">"/books/{id}"</span>)
  getBook(id: integer): Book;

  <span class="hljs-meta">@post</span>(<span class="hljs-string">"/books"</span>)
  createBook(book: Book): Book;

  <span class="hljs-meta">@put</span>(<span class="hljs-string">"/books/{id}"</span>)
  updateBook(id: integer, book: Book): Book;

  <span class="hljs-meta">@delete</span>(<span class="hljs-string">"/books/{id}"</span>)
  deleteBook(id: integer): <span class="hljs-built_in">void</span>;
}
</code></pre>
<p>The <code>BooksService</code> contains four methods for performing actions on books:</p>
<ul>
<li><p><code>@get("/books/{id}")</code>: Method for retrieving a book by its <code>id</code>.</p>
</li>
<li><p><code>@post("/books")</code>: Method for creating a new book.</p>
</li>
<li><p><code>@put("/books/{id}")</code>: Method for updating an existing book by its <code>id</code>.</p>
</li>
<li><p><code>@delete("/books/{id}")</code>: Method for deleting a book based on its <code>id</code>.</p>
</li>
</ul>
<p>These methods use HTTP annotations to indicate the type of operation they perform (GET, POST, PUT, DELETE).</p>
<h3 id="heading-add-additional-validations-for-the-book-model"><strong>Add Additional Validations for the</strong> <code>Book</code> <strong>Model</strong></h3>
<p>As in the previous example for users, we can add additional validations on <strong>Book</strong> template properties.</p>
<p><strong>Example of validation on</strong> <code>publicationYear</code> <strong>and</strong> <code>isbn</code></p>
<pre><code class="lang-typescript">model Book {
  id: integer;
  <span class="hljs-meta">@minLength</span>(<span class="hljs-number">1</span>)
  title: <span class="hljs-built_in">string</span>;
  <span class="hljs-meta">@minLength</span>(<span class="hljs-number">1</span>)
  author: <span class="hljs-built_in">string</span>;
  <span class="hljs-meta">@minValue</span>(<span class="hljs-number">1000</span>)
  publicationYear: integer;
  <span class="hljs-meta">@pattern</span>(<span class="hljs-string">"^\\d{3}-\\d{1,5}-\\d{1,7}-\\d{1,7}-\\d{1}$"</span>)
  isbn: <span class="hljs-built_in">string</span>;
}
</code></pre>
<ul>
<li><p><code>@minValue(1000)</code> guarantees that the year of publication is greater than or equal to 1000.</p>
</li>
<li><p>Validation of the <code>isbn</code> remains the same, using a regular expression to validate a standard ISBN format.</p>
</li>
</ul>
<h3 id="heading-a-complete-service-for-managing-books"><strong>A Complete Service for Managing Books</strong></h3>
<p>Now that we have the <code>Book</code> model and the necessary validations, here's a complete service for managing books, with all the essential operations.</p>
<p><strong>Complete</strong> <code>BooksService</code> <strong>in TypeSpec</strong></p>
<pre><code class="lang-typescript">model Book {
  id: integer;
  <span class="hljs-meta">@minLength</span>(<span class="hljs-number">1</span>)
  title: <span class="hljs-built_in">string</span>;
  <span class="hljs-meta">@minLength</span>(<span class="hljs-number">1</span>)
  author: <span class="hljs-built_in">string</span>;
  <span class="hljs-meta">@minValue</span>(<span class="hljs-number">1000</span>)
  publicationYear: integer;
  <span class="hljs-meta">@pattern</span>(<span class="hljs-string">"^\\d{3}-\\d{1,5}-\\d{1,7}-\\d{1,7}-\\d{1}$"</span>)
  isbn: <span class="hljs-built_in">string</span>;
}

service BooksService {
  <span class="hljs-meta">@get</span>(<span class="hljs-string">"/books/{id}"</span>)
  getBook(id: integer): Book;

  <span class="hljs-meta">@post</span>(<span class="hljs-string">"/books"</span>)
  createBook(book: Book): Book;

  <span class="hljs-meta">@put</span>(<span class="hljs-string">"/books/{id}"</span>)
  updateBook(id: integer, book: Book): Book;

  <span class="hljs-meta">@delete</span>(<span class="hljs-string">"/books/{id}"</span>)
  deleteBook(id: integer): <span class="hljs-built_in">void</span>;
}
</code></pre>
<ul>
<li><p>The <code>Book</code> model defines properties and validations for a book.</p>
</li>
<li><p>The <code>BooksService</code> provides endpoints for retrieving, creating, updating, and deleting a book.</p>
</li>
<li><p>Each service method is correctly annotated with the corresponding HTTP verbs (<code>GET</code>, <code>POST</code>, <code>PUT</code>, <code>DELETE</code>).</p>
</li>
</ul>
<p>And here’s a summary of everything we’ve done:</p>
<ul>
<li><p>We created a <code>Book</code> model with properties such as title, author, year of publication, and ISBN number.</p>
</li>
<li><p>We defined a <code>BooksService</code> to provide CRUD operations on books.</p>
</li>
<li><p>We added validations to ensure that the data respected specified constraints (for example, ISBN and year of publication).</p>
</li>
<li><p>We designed a complete REST API to manage books with TypeSpec, using a minimum amount of code and staying true to standards.</p>
</li>
</ul>
<p>This example shows just how quickly and efficiently TypeSpec can be used to model a REST API, while ensuring a clear structure and robust validations.</p>
<h2 id="heading-how-to-build-the-api-in-express-and-aspnet-core">How to Build the API in Express and ASP.NET Core</h2>
<p>Now that we've defined a book management REST service with TypeSpec, let's see how we'd implement this same API using two popular frameworks:</p>
<ul>
<li><p><strong>ExpressJS (Node.js / TypeScript)</strong></p>
</li>
<li><p><strong>ASP.NET Core (C#)</strong></p>
</li>
</ul>
<p>This will allow us to better compare TypeSpec's conciseness and readability with traditional implementations.</p>
<p><strong>Manual implementation with ExpressJS (Node.js / TypeScript):</strong></p>
<pre><code class="lang-typescript"><span class="hljs-comment">//server.ts</span>
<span class="hljs-keyword">import</span> express <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;

<span class="hljs-keyword">const</span> app = express();
app.use(express.json());

<span class="hljs-keyword">interface</span> Book {
  id: <span class="hljs-built_in">number</span>;
  title: <span class="hljs-built_in">string</span>;
  author: <span class="hljs-built_in">string</span>;
  publicationYear: <span class="hljs-built_in">number</span>;
  isbn: <span class="hljs-built_in">string</span>;
}

<span class="hljs-keyword">const</span> books: Book[] = [];

<span class="hljs-comment">// GET /books/:id</span>
app.get(<span class="hljs-string">'/books/:id'</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =></span> {
  <span class="hljs-keyword">const</span> id = <span class="hljs-built_in">parseInt</span>(req.params.id);
  <span class="hljs-keyword">const</span> book = books.find(<span class="hljs-function"><span class="hljs-params">b</span> =></span> b.id === id);
  <span class="hljs-keyword">if</span> (!book) <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">404</span>).send({ message: <span class="hljs-string">'Book not found'</span> });
  res.send(book);
});

<span class="hljs-comment">// POST /books</span>
app.post(<span class="hljs-string">'/books'</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =></span> {
  <span class="hljs-keyword">const</span> newBook: Book = req.body;
  books.push(newBook);
  res.status(<span class="hljs-number">201</span>).send(newBook);
});

<span class="hljs-comment">// PUT /books/:id</span>
app.put(<span class="hljs-string">'/books/:id'</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =></span> {
  <span class="hljs-keyword">const</span> id = <span class="hljs-built_in">parseInt</span>(req.params.id);
  <span class="hljs-keyword">const</span> index = books.findIndex(<span class="hljs-function"><span class="hljs-params">b</span> =></span> b.id === id);
  <span class="hljs-keyword">if</span> (index === <span class="hljs-number">-1</span>) <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">404</span>).send({ message: <span class="hljs-string">'Book not found'</span> });

  books[index] = req.body;
  res.send(books[index]);
});

<span class="hljs-comment">// DELETE /books/:id</span>
app.delete(<span class="hljs-string">'/books/:id'</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =></span> {
  <span class="hljs-keyword">const</span> id = <span class="hljs-built_in">parseInt</span>(req.params.id);
  <span class="hljs-keyword">const</span> index = books.findIndex(<span class="hljs-function"><span class="hljs-params">b</span> =></span> b.id === id);
  <span class="hljs-keyword">if</span> (index === <span class="hljs-number">-1</span>) <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">404</span>).send({ message: <span class="hljs-string">'Book not found'</span> });

  books.splice(index, <span class="hljs-number">1</span>);
  res.status(<span class="hljs-number">204</span>).send();
});

app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =></span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Server is running on port 3000'</span>);
});
</code></pre>
<p><strong>Observations:</strong></p>
<ul>
<li><p>A lot of repetitive logic.</p>
</li>
<li><p>No automatic validation.</p>
</li>
<li><p>Routes must be maintained manually.</p>
</li>
<li><p>No automatically generated API documentation.</p>
</li>
</ul>
<p><strong>Manual implementation with</strong> <a target="_blank" href="http://ASP.NET"><strong>ASP.NET</strong></a> <strong>Core (C#):</strong></p>
<pre><code class="lang-csharp"><span class="hljs-comment">// Book.cs</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">class</span> <span class="hljs-title">Book</span>
{
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">int</span> Id { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">set</span>; }

    [<span class="hljs-meta">Required</span>]
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> Title { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">set</span>; } = <span class="hljs-keyword">string</span>.Empty;

    [<span class="hljs-meta">Required</span>]
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> Author { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">set</span>; } = <span class="hljs-keyword">string</span>.Empty;

    [<span class="hljs-meta">Range(1000, int.MaxValue)</span>]
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">int</span> PublicationYear { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">set</span>; }

    [<span class="hljs-meta">RegularExpression(@<span class="hljs-meta-string">"^\d{3}-\d{1,5}-\d{1,7}-\d{1,7}-\d{1}$"</span>)</span>]
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> Isbn { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">set</span>; } = <span class="hljs-keyword">string</span>.Empty;
}
</code></pre>
<pre><code class="lang-csharp"><span class="hljs-comment">// BooksController.cs</span>
[<span class="hljs-meta">ApiController</span>]
[<span class="hljs-meta">Route(<span class="hljs-meta-string">"books"</span>)</span>]
<span class="hljs-keyword">public</span> <span class="hljs-keyword">class</span> <span class="hljs-title">BooksController</span> : <span class="hljs-title">ControllerBase</span>
{
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">readonly</span> List<Book> books = <span class="hljs-keyword">new</span>();

    [<span class="hljs-meta">HttpGet(<span class="hljs-meta-string">"{id}"</span>)</span>]
    <span class="hljs-function"><span class="hljs-keyword">public</span> IActionResult <span class="hljs-title">GetBook</span>(<span class="hljs-params"><span class="hljs-keyword">int</span> id</span>)</span>
    {
        <span class="hljs-keyword">var</span> book = books.FirstOrDefault(b => b.Id == id);
        <span class="hljs-keyword">if</span> (book == <span class="hljs-literal">null</span>) <span class="hljs-keyword">return</span> NotFound(<span class="hljs-string">"Book not found"</span>);
        <span class="hljs-keyword">return</span> Ok(book);
    }

    [<span class="hljs-meta">HttpPost</span>]
    <span class="hljs-function"><span class="hljs-keyword">public</span> IActionResult <span class="hljs-title">CreateBook</span>(<span class="hljs-params">[FromBody] Book book</span>)</span>
    {
        books.Add(book);
        <span class="hljs-keyword">return</span> CreatedAtAction(<span class="hljs-keyword">nameof</span>(GetBook), <span class="hljs-keyword">new</span> { id = book.Id }, book);
    }

    [<span class="hljs-meta">HttpPut(<span class="hljs-meta-string">"{id}"</span>)</span>]
    <span class="hljs-function"><span class="hljs-keyword">public</span> IActionResult <span class="hljs-title">UpdateBook</span>(<span class="hljs-params"><span class="hljs-keyword">int</span> id, [FromBody] Book updatedBook</span>)</span>
    {
        <span class="hljs-keyword">var</span> index = books.FindIndex(b => b.Id == id);
        <span class="hljs-keyword">if</span> (index == <span class="hljs-number">-1</span>) <span class="hljs-keyword">return</span> NotFound(<span class="hljs-string">"Book not found"</span>);

        books[index] = updatedBook;
        <span class="hljs-keyword">return</span> Ok(updatedBook);
    }

    [<span class="hljs-meta">HttpDelete(<span class="hljs-meta-string">"{id}"</span>)</span>]
    <span class="hljs-function"><span class="hljs-keyword">public</span> IActionResult <span class="hljs-title">DeleteBook</span>(<span class="hljs-params"><span class="hljs-keyword">int</span> id</span>)</span>
    {
        <span class="hljs-keyword">var</span> book = books.FirstOrDefault(b => b.Id == id);
        <span class="hljs-keyword">if</span> (book == <span class="hljs-literal">null</span>) <span class="hljs-keyword">return</span> NotFound(<span class="hljs-string">"Book not found"</span>);

        books.Remove(book);
        <span class="hljs-keyword">return</span> NoContent();
    }
}
</code></pre>
<p><strong>Observations:</strong></p>
<ul>
<li><p>More formal and structured than Express, thanks to C# annotations (<code>[HttpPost]</code>, <code>[Required]</code>, and so on).</p>
</li>
<li><p>Validation is handled automatically via Data Annotations.</p>
</li>
<li><p>Once again, no automatic OpenAPI generation or SDK client without additional configuration.</p>
</li>
</ul>
<p><strong>Comparison with TypeSpec:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Aspect</strong></td><td><strong>TypeSpec</strong></td><td><strong>ExpressJS</strong></td><td><a target="_blank" href="http://ASP.NET"><strong>ASP.NET</strong></a> <strong>Core</strong></td></tr>
</thead>
<tbody>
<tr>
<td></td><td></td><td></td><td></td></tr>
<tr>
<td><strong>Syntax</strong></td><td>Declarative</td><td>Imperative</td><td>Structured</td></tr>
<tr>
<td><strong>Validation</strong></td><td>Automatic</td><td>Manual</td><td>Data Annotations</td></tr>
<tr>
<td><strong>Documentation</strong></td><td>Automatic</td><td>Manual</td><td>Generated(Swashbuckle)</td></tr>
<tr>
<td><strong>Reusability</strong></td><td>High</td><td>Low</td><td>Medium</td></tr>
<tr>
<td><strong>Generation</strong></td><td>OpenAPI/SDK</td><td>Non-native</td><td>Possible</td></tr>
</tbody>
</table>
</div><h2 id="heading-best-practices-for-structuring-typespec-projects-and-components">Best Practices for Structuring TypeSpec Projects and Components</h2>
<p>When you start writing API definitions in TypeSpec, it's easy to put everything in a single file. But as with any software project, as the application grows, a good structure becomes essential to guarantee the readability, reusability and maintainability of the code.</p>
<p>Here's a set of best practices I strongly recommend:</p>
<h3 id="heading-organize-by-functional-area"><strong>Organize by Functional Area</strong></h3>
<p>Use namespaces to group models, interfaces, and operations by business domain: <strong>book</strong>, <strong>user</strong>, <strong>auth</strong>, <strong>payment</strong>, and so on.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">namespace</span> MyApi.Books;
</code></pre>
<p>Create a <code>/books</code> folder with the following files:</p>
<pre><code class="lang-yaml"><span class="hljs-string">src/</span>
<span class="hljs-string">├──</span> <span class="hljs-string">books/</span>
<span class="hljs-string">│</span>   <span class="hljs-string">├──</span> <span class="hljs-string">models.tsp</span>
<span class="hljs-string">│</span>   <span class="hljs-string">├──</span> <span class="hljs-string">routes.tsp</span>
<span class="hljs-string">│</span>   <span class="hljs-string">└──</span> <span class="hljs-string">service.tsp</span>
</code></pre>
<p>This ensures a clear separation of responsibilities, just like in a well-structured Node.js project.</p>
<h3 id="heading-a-single-maintsp-entry-point"><strong>A Single</strong> <code>main.tsp</code> <strong>Entry Point</strong></h3>
<p>This is the main file that orchestrates:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// main.tsp</span>
<span class="hljs-keyword">import</span> <span class="hljs-string">"./books/service.tsp"</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">"./users/service.tsp"</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">"./auth/service.tsp"</span>;
</code></pre>
<p>This allows you to compile the entire project from a single point.</p>
<h3 id="heading-create-reusable-components">Create Reusable Components</h3>
<p>Define common models and types in a shared file. Example:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// common/models.tsp</span>
model ErrorResponse {
  code: <span class="hljs-built_in">string</span>;
  message: <span class="hljs-built_in">string</span>;
}

<span class="hljs-meta">@defaultResponse</span>
op <span class="hljs-built_in">Error</span>(): ErrorResponse;
</code></pre>
<p>Then import them into your other files:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> <span class="hljs-string">"../common/models.tsp"</span>;
</code></pre>
<p>This is handy for centralizing errors, standard answers, pagination types, and so on.</p>
<h3 id="heading-use-decorators-to-enrich-your-components">Use Decorators to Enrich Your Components</h3>
<p>Decorators such as <code>@doc</code>, <code>@minLength</code>, <code>@server</code>, <code>@route</code> or <code>@tag</code> can be used to generate valid, documented APIs without any extra effort:</p>
<pre><code class="lang-typescript"><span class="hljs-meta">@route</span>(<span class="hljs-string">"/books"</span>)
<span class="hljs-meta">@doc</span>(<span class="hljs-string">"Get all books"</span>)
op listBooks(): Book[];
</code></pre>
<p>A well-annotated API is one that is ready for automatic generation of documentation or clients.</p>
<h3 id="heading-define-servers-in-the-right-place">Define Servers in the Right Place</h3>
<p>Add your @server directive to a <code>service.tsp</code> or global <code>api.tsp</code> file:</p>
<pre><code class="lang-typescript"><span class="hljs-meta">@server</span>(<span class="hljs-string">"Production"</span>, <span class="hljs-string">"https://api.mysite.com"</span>)
<span class="hljs-meta">@server</span>(<span class="hljs-string">"Staging"</span>, <span class="hljs-string">"https://staging.mysite.com"</span>)
</code></pre>
<p>This allows you to target different environments without duplicating definitions.</p>
<h3 id="heading-validate-regularly">Validate Regularly</h3>
<p>Integrate <code>tsp compile</code> into your CI/CD to ensure that your definitions are always valid. Example with an npm script:</p>
<pre><code class="lang-bash">npm run tsp compile src/main.tsp --emit=./dist
</code></pre>
<p>This avoids last-minute errors and guarantees the consistency of your API over time.</p>
<p><strong>Example of a recommended complete structure:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-string">project-root/</span>
<span class="hljs-string">├──</span> <span class="hljs-string">src/</span>
<span class="hljs-string">│</span>   <span class="hljs-string">├──</span> <span class="hljs-string">books/</span>
<span class="hljs-string">│</span>   <span class="hljs-string">│</span>   <span class="hljs-string">├──</span> <span class="hljs-string">models.tsp</span>
<span class="hljs-string">│</span>   <span class="hljs-string">│</span>   <span class="hljs-string">├──</span> <span class="hljs-string">routes.tsp</span>
<span class="hljs-string">│</span>   <span class="hljs-string">│</span>   <span class="hljs-string">└──</span> <span class="hljs-string">service.tsp</span>
<span class="hljs-string">│</span>   <span class="hljs-string">├──</span> <span class="hljs-string">users/</span>
<span class="hljs-string">│</span>   <span class="hljs-string">│</span>   <span class="hljs-string">├──</span> <span class="hljs-string">models.tsp</span>
<span class="hljs-string">│</span>   <span class="hljs-string">│</span>   <span class="hljs-string">└──</span> <span class="hljs-string">service.tsp</span>
<span class="hljs-string">│</span>   <span class="hljs-string">├──</span> <span class="hljs-string">common/</span>
<span class="hljs-string">│</span>   <span class="hljs-string">│</span>   <span class="hljs-string">└──</span> <span class="hljs-string">models.tsp</span>
<span class="hljs-string">│</span>   <span class="hljs-string">└──</span> <span class="hljs-string">main.tsp</span>
<span class="hljs-string">├──</span> <span class="hljs-string">tspconfig.yaml</span>
<span class="hljs-string">├──</span> <span class="hljs-string">package.json</span>
<span class="hljs-string">└──</span> <span class="hljs-string">README.md</span>
</code></pre>
<p>In summary:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Good practice</strong></td><td><strong>Why it's important</strong></td></tr>
</thead>
<tbody>
<tr>
<td></td><td></td></tr>
<tr>
<td>Use <code>namespaces</code></td><td>Clear organization, readability</td></tr>
<tr>
<td>Dividing files by domain</td><td>Reusability, modularity</td></tr>
<tr>
<td>Centralize shared components</td><td>DRY (Don't Repeat Yourself)</td></tr>
<tr>
<td>Use decorators</td><td>Enrich documentation and validation</td></tr>
<tr>
<td>Integrate with CI/CD</td><td>Continuous quality, no surprises</td></tr>
<tr>
<td>Have a clear input file (<code>main.tsp</code>)</td><td>Simple, centralized compilation</td></tr>
</tbody>
</table>
</div><h2 id="heading-conclusion">Conclusion</h2>
<p>TypeSpec represents a real evolution in the way we design, document and maintain APIs. By adopting a declarative, modular, and typed approach, it simplifies the definition of APIs while enhancing their quality, readability, and consistency on a large scale.</p>
<p>Whether you're a front-end developer consuming APIs, a software architect looking to standardize your team's practices, or a technical documentation enthusiast, TypeSpec offers you a robust, modern, and extensible solution.</p>
<p>The TypeSpec ecosystem is still young but very promising, supported by Microsoft and used internally on a large scale. So now's the time to start exploring and adopting it for your projects.</p>
<h4 id="heading-ressources">Ressources</h4>
<ol>
<li><p><strong>TypeSpec official website</strong><br> <a target="_blank" href="https://typespec.io/">https://typespec.io</a><br> Full documentation, guides, syntax references and APIs.</p>
</li>
<li><p><strong>TypeSpec GitHub repository (Microsoft)</strong><br> <a target="_blank" href="https://github.com/microsoft/typespec/">https://github.com/microsoft/typespec</a><br> Source code, examples and community discussions.</p>
</li>
<li><p><strong>Playground TypeSpec (essayer dans le navigateur)</strong><br> <a target="_blank" href="https://typespec.io/playground/">https://typespec.io/playground</a><br> Quickly test your models without installing anything.</p>
</li>
<li><p><strong>TypeSpec documentation — Microsoft Learn</strong><br> <a target="_blank" href="https://learn.microsoft.com/en-us/azure/developer/typespec/overview/">https://learn.microsoft.com/en-us/azure/developer/typespec/overview</a><br> Learn how to use TypeSpec to create consistent, high-quality APIs efficiently and integrate them seamlessly with existing toolchains.</p>
</li>
<li><p><strong>OpenAPI Specification</strong><br> <a target="_blank" href="https://swagger.io/specification/">https://swagger.io/specification</a><br> To compare with current API description standards.</p>
</li>
<li><p><strong>TypeSpec 101 by Mario Guerra Product Manager for TypeSpec at Microsoft</strong><br> <a target="_blank" href="https://www.youtube.com/playlist?list=PLYWCCsom5Txglkl_I1XvwzrzM5G3SuVsR/">https://www.youtube.com/playlist?list=PLYWCCsom5Txglkl_I1XvwzrzM5G3SuVsR</a><br> A tutorial series, hosted by Mario Guerra, TypeSpec product manager at Microsoft, will guide you through the process of building a REST API using TypeSpec, and generating an OpenAPI specification from our code.</p>
</li>
<li><p><strong>APIs at Scale with TypeSpec</strong><br> <a target="_blank" href="https://youtu.be/yfCYrKaojDo/">https://youtu.be/yfCYrKaojDo</a><br> A talk given by Mandy Whaley from Microsoft at the 2024 Austin API Summit in Austin, Texas.</p>
</li>
</ol>
<p>Thanks for reading. You can find me on <a target="_blank" href="https://www.linkedin.com/in/AdalbertPungu/">LinkedIn</a>, and follow me on all socials @AdalbertPungu.</p>
 
</article>
<article>
<h1> How to Make LLMs Better at Math Using AI Agents, MathJS, and BaseAI Tool Calls </h1>
<p>Maham Codes — Thu, 19 Dec 2024 14:53:46 +0000</p>
 <p>Large Language Models (LLMs) like GPT often struggle to answer mathematical questions. In fact, if you ask a human a tough math question, like what is 185 cm in ft, they’ll struggle as well. They’d likely need a calculator to perform this conversion – and so do LLMs.</p>
<p>LLMs are built to handle natural language. While generally being good at generating words and stringing together language, when it comes to math, they often need help.</p>
<p>Unlike a calculator or math library, LLMs cannot sometimes reason or process symbolic logic. So, while they can manage basic arithmetic, especially if it's something familiar from their training data, they typically struggle with more complex problems, particularly word problems.</p>
<p>The main question is how to fix this LLM limitation?</p>
<p>No doubt, LLMs have evolved with the launch of reasoning models like GPT-o1 or Llama 3.3. But they still hallucinate, lack real-time data access, struggle with complex math, and produce non-deterministic outputs. Fortunately, we can solve this problem using AI agents.</p>
<h2 id="heading-what-is-an-ai-agent">What is an AI Agent?</h2>
<p>AI agents are autonomous software that use LLMs to perform tasks beyond simple text generation.</p>
<p>They make decisions and execute actions. AI agents rely on LLMs for language understanding but add capabilities like memory, real-time interaction, and decision-making.</p>
<h2 id="heading-how-ai-agents-solve-llm-limitations">How AI Agents Solve LLM Limitations</h2>
<p>Agents augment the capabilities of LLMs in the following ways:</p>
<ul>
<li><p><strong>Memory:</strong> AI agents help LLMs retain context from past interactions, improving long-term conversation coherence.</p>
</li>
<li><p><strong>Asynchronous processing:</strong> Agents handle multiple tasks at once, enhancing efficiency.</p>
</li>
<li><p><strong>Fact-checking:</strong> They connect to real-time data sources to verify information.</p>
</li>
<li><p><strong>Enhanced math:</strong> They integrate tools to handle complex calculations.</p>
</li>
<li><p><strong>Consistent output:</strong> Agents standardize LLM outputs for uniform formatting.</p>
</li>
</ul>
<p>To help address some of the math limitations LLMs experience, let’s create an AI agent that builds a calculator using MathJS and BaseAI tool calls.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>In this tutorial, I’ll be using the following tech stack:</p>
<ul>
<li><p><a target="_blank" href="https://mathjs.org/">MathJS</a> — an extensive math library for JavaScript and Node.js.</p>
</li>
<li><p><a target="_blank" href="https://baseai.dev">BaseAI</a> — the web framework for building AI agents locally.</p>
</li>
<li><p><a target="_blank" href="https://langbase.com">Langbase</a> — the platform to build and deploy your serverless AI agents.</p>
</li>
<li><p><a target="_blank" href="https://openai.com">OpenAI</a> — to get the LLM key for the preferred model.</p>
</li>
</ul>
<p>You’ll also need to:</p>
<ul>
<li><p>Sign up on Langbase to get access to the API key.</p>
</li>
<li><p>Sign up on OpenAI to generate the LLM key for the model you want to use (for this demo, I’ll be using GPT-4o mini).</p>
</li>
</ul>
<p>Let’s get started!</p>
<h2 id="heading-step-1-create-a-directory-and-initialize-npm">Step 1: Create a Directory and Initialize npm</h2>
<p>To start creating an AI agent, you need to create a directory in your local machine and install all the relevant dev dependancies in it. You can do this by navigating to it and running the following command in the terminal:</p>
<pre><code class="lang-bash">mkdir my-project

npm init -y

npm install dotenv mathjs
</code></pre>
<p>This command will create a <code>package.json</code> file in your project directory with default values. It will also install the <code>dotenv</code> package to read environment variables from the <code>.env</code> file, and <code>mathjs</code> to handle math operations.</p>
<h2 id="heading-step-2-create-an-ai-agent-pipe">Step 2: Create an AI Agent Pipe</h2>
<p>Next, we’ll be creating an AI agent pipe. Pipes are different from other agents, as they <strong>are serverless AI agents with agentic tools</strong> that can work with any language or framework. They are easily deployable, and with just one API they let you connect 100+ LLMs to any data to build any developer API workflow.</p>
<p>To create your AI agent pipe, navigate to your project directory. Run the following command:</p>
<pre><code class="lang-bash">npx baseai@latest pipe
</code></pre>
<p>Upon running that command, you’ll see the following prompts:</p>
<pre><code class="lang-bash">BaseAI is not installed but required to run. Would you like to install it? Yes/No

Name of the pipe? pipe-with-tool

Description of the pipe? An AI agent pipe that can call tools

Status of the pipe? Public/Private

System prompt? You are a helpful AI assistant
</code></pre>
<p>Once you are done with the name, description, and status of the AI agent pipe, everything will be set up automatically for you. Your pipe will be created successfully at <code>/baseai/pipes/pipe-with-tool.ts</code>.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Pipe is a serverless AI agent. It has agentic memory and tools.</div>
</div>

<h2 id="heading-step-3-add-a-env-file">Step 3: Add a .env File</h2>
<p>Create a <code>.env</code> file in the root directory of your project and add the <a target="_blank" href="https://platform.openai.com/api-keys">OpenAI</a> and Langbase API key in it. You can access your Langbase API key from <a target="_blank" href="https://langbase.com/docs/api-reference/api-keys">here</a>.</p>
<h2 id="heading-step-4-configure-the-ai-agent-pipe">Step 4: Configure the AI Agent Pipe</h2>
<p>In this step, we’ll configure the AI agent pipe created according to our needs.</p>
<p>Navigate to your project directory and open the AI agent pipe you created. You can add a system prompt to the pipe if you want. I’m sticking to <code>You are a helpful AI assistant that will work as a calculator.</code> This is what it will look like:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { PipeI } <span class="hljs-keyword">from</span> <span class="hljs-string">'@baseai/core'</span>;

<span class="hljs-keyword">const</span> pipePipeWithTool = (): <span class="hljs-function"><span class="hljs-params">PipeI</span> =></span> ({
   apiKey: process.env.LANGBASE_API_KEY!,
   name: <span class="hljs-string">'pipe-with-tool'</span>,
   description: <span class="hljs-string">'An AI agent pipe that can call tools'</span>,
   status: <span class="hljs-string">'private'</span>,
   model: <span class="hljs-string">'openai:gpt-4o-mini'</span>,
   stream: <span class="hljs-literal">true</span>,
   json: <span class="hljs-literal">false</span>,
   store: <span class="hljs-literal">true</span>,
   moderate: <span class="hljs-literal">true</span>,
   top_p: <span class="hljs-number">1</span>,
   max_tokens: <span class="hljs-number">1000</span>,
   temperature: <span class="hljs-number">0.7</span>,
   presence_penalty: <span class="hljs-number">1</span>,
   frequency_penalty: <span class="hljs-number">1</span>,
   stop: [],
   tool_choice: <span class="hljs-string">'auto'</span>,
   parallel_tool_calls: <span class="hljs-literal">true</span>,
   messages: [{ role: <span class="hljs-string">'system'</span>, content: <span class="hljs-string">`You are a helpful AI assistant that will work as a calculator.`</span> }],
   variables: [],
   memory: [],
   tools: []
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> pipePipeWithTool;
</code></pre>
<h2 id="heading-step-5-create-a-calculator-tool">Step 5: Create a Calculator Tool</h2>
<p>Tool calling lets an LLM use external tools, such as functions, APIs, or other resources, to get information or perform tasks beyond its built-in knowledge.</p>
<p>In this step, we'll create a <strong>Calculator Tool</strong> using BaseAI tools. This tool will handle all mathematical computations in your project, ensuring they are error-free and trustworthy. The tool is versatile and suitable for both simple calculations (e.g., <code>5+7</code>) and more advanced ones (e.g., <code>sin(pi/4) + log(10)</code>).</p>
<p>It will also be particularly helpful in reducing hallucinations, which it can do by offloading computations to an external tool This avoids incorrect or fabricated answers that LLMs might otherwise generate. It also reduces the likelihood of getting incorrect responses from the LLM by rechecking or gathering additional data to ensure accuracy.</p>
<p>By using BaseAI's smart tool-calling and memory features, we can reduce AI hallucinations by <strong>21%</strong> while improving the model's ability to self-correct its outputs.</p>
<p>These enhancements are useful when dealing with complex mathematical expressions or formula evaluations and should really improve the quality and accuracy of the LLM’s answers.</p>
<p>To create a calculator tool in your project that will be responsible for doing all the calculations without errors, run this command in your terminal:</p>
<pre><code class="lang-bash">npx baseai@latest tool
</code></pre>
<p>You’ll be asked to provide a name and description of the tool in your terminal. This is what I’m providing:</p>
<pre><code class="lang-bash">Name of the tool? Calculator

Description of the tool? Evaluate mathematical expressions
</code></pre>
<p>Your tool will be created at <code>/baseai/tools/calculator.ts</code>.</p>
<h2 id="heading-step-6-configure-the-calculator-tool">Step 6: Configure the Calculator Tool</h2>
<p>To configure the tool, navigate to your project directory and open the tool you created. You can find it at <code>/baseai/tools/calculator.ts</code>.</p>
<p>This is what the code will look like:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { ToolI } <span class="hljs-keyword">from</span> <span class="hljs-string">'@baseai/core'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">calculator</span>(<span class="hljs-params"></span>) </span>{
   <span class="hljs-comment">// Add your tool logic here</span>
   <span class="hljs-comment">// This function will be called when the tool is executed</span>
}

<span class="hljs-keyword">const</span> toolCalculator = (): <span class="hljs-function"><span class="hljs-params">ToolI</span> =></span> ({
   run: calculator,
   <span class="hljs-keyword">type</span>: <span class="hljs-string">'function'</span> <span class="hljs-keyword">as</span> <span class="hljs-keyword">const</span>,
   <span class="hljs-function"><span class="hljs-keyword">function</span>: </span>{
       name: <span class="hljs-string">'toolCalculator'</span>,
       description: <span class="hljs-string">'Evaluate mathematical expressions'</span>,
       parameters: {}
   }
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> toolCalculator;
</code></pre>
<p>The <code>run</code> key in the <code>toolCalculator</code> object is the function that will be executed when the tool is called. You can write your logic to get the mathematical calculations for a given function.</p>
<p>Update the calculator tool’s description and code by adding parameters to the calculator function. The LLM will give values to these parameters when it calls the tool. And it’ll even import math from <code>mathjs</code>. This is the final code:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> * <span class="hljs-keyword">as</span> math <span class="hljs-keyword">from</span> <span class="hljs-string">'mathjs'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">calculator</span>(<span class="hljs-params">{expression}: {expression: <span class="hljs-built_in">string</span>}</span>) </span>{
   <span class="hljs-keyword">return</span> math.evaluate(expression);
}

<span class="hljs-keyword">const</span> toolCalculator = <span class="hljs-function">() =></span> ({
   run: calculator,
   <span class="hljs-keyword">type</span>: <span class="hljs-string">'function'</span> <span class="hljs-keyword">as</span> <span class="hljs-keyword">const</span>,
   <span class="hljs-function"><span class="hljs-keyword">function</span>: </span>{
       name: <span class="hljs-string">'calculator'</span>,
       description:
           <span class="hljs-string">`A tool that can evaluate mathematical expressions. `</span> +
           <span class="hljs-string">`Example expressions: `</span> +
           <span class="hljs-string">`'5.6 * (5 + 10.5)', '7.86 cm to inch', 'cos(80 deg) ^ 4'.`</span>,
       parameters: {
           <span class="hljs-keyword">type</span>: <span class="hljs-string">'object'</span>,
           required: [<span class="hljs-string">'expression'</span>],
           properties: {
               expression: {
                   <span class="hljs-keyword">type</span>: <span class="hljs-string">'string'</span>,
                   description: <span class="hljs-string">'The mathematical expression to evaluate.'</span>,
               },
           },
       },
   },
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> toolCalculator;
</code></pre>
<h2 id="heading-step-7-integrate-the-tool-in-the-ai-agent-pipe">Step 7: Integrate the Tool in the AI Agent Pipe</h2>
<p>In this step, we’ll integrate the tool in the AI agent pipe we created. For that, open the pipe file present at <code>/baseai/pipes/pipe-with-tool.ts</code> and import the calculator tool at the top of the file. We will also call the calculator tool in the tools array of the pipe.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> {PipeI} <span class="hljs-keyword">from</span> <span class="hljs-string">'@baseai/core'</span>;
<span class="hljs-keyword">import</span> toolCalculator <span class="hljs-keyword">from</span> <span class="hljs-string">'../tools/calculator'</span>;

<span class="hljs-keyword">const</span> pipeWithTools = (): <span class="hljs-function"><span class="hljs-params">PipeI</span> =></span> ({
   apiKey: process.env.LANGBASE_API_KEY!,
   name: <span class="hljs-string">'pipe-with-tool'</span>,
   description: <span class="hljs-string">'An AI agent pipe that can call tools'</span>,
   status: <span class="hljs-string">'public'</span>,
   model: <span class="hljs-string">'openai:gpt-4o-mini'</span>,
   stream: <span class="hljs-literal">false</span>,
   json: <span class="hljs-literal">false</span>,
   store: <span class="hljs-literal">true</span>,
   moderate: <span class="hljs-literal">true</span>,
   top_p: <span class="hljs-number">1</span>,
   max_tokens: <span class="hljs-number">1000</span>,
   temperature: <span class="hljs-number">0.7</span>,
   presence_penalty: <span class="hljs-number">1</span>,
   frequency_penalty: <span class="hljs-number">1</span>,
   stop: [],
   tool_choice: <span class="hljs-string">'auto'</span>,
   parallel_tool_calls: <span class="hljs-literal">true</span>,
   messages: [{role: <span class="hljs-string">'system'</span>, content: <span class="hljs-string">`You are a helpful AI assistant that will work as a calculator.`</span>}],
   variables: [],
   memory: [],
   tools: [ toolCalculator()],
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> pipeWithTools;
</code></pre>
<h2 id="heading-step-8-integrate-ai-agent-pipe-in-nodejs">Step 8: Integrate AI Agent Pipe in Node.js</h2>
<p>Now we’ll integrate the AI agent pipe you created into the Node.js project to build an interactive command-line interface (CLI) for the calculator tool. This Node.js project will serve as the base for testing and interacting with the AI agent pipe (in the beginning of the tutorial, we set up a Node.js project by initializing npm).</p>
<p>Now, create an <code>index.ts</code> file:</p>
<pre><code class="lang-bash">touch index.ts
</code></pre>
<p>In this TypeScript file, import the AI agent pipe you created. We will use the pipe primitive from <code>@baseai/core</code> to run the pipe.</p>
<p>Add the following code to the <code>index.ts</code> file:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> <span class="hljs-string">'dotenv/config'</span>;
<span class="hljs-keyword">import</span> { Pipe } <span class="hljs-keyword">from</span> <span class="hljs-string">'@baseai/core'</span>;
<span class="hljs-keyword">import</span> inquirer <span class="hljs-keyword">from</span> <span class="hljs-string">'inquirer'</span>;
<span class="hljs-keyword">import</span> ora <span class="hljs-keyword">from</span> <span class="hljs-string">'ora'</span>;
<span class="hljs-keyword">import</span> chalk <span class="hljs-keyword">from</span> <span class="hljs-string">'chalk'</span>;
<span class="hljs-keyword">import</span> pipePipeWithTool <span class="hljs-keyword">from</span> <span class="hljs-string">'./baseai/pipes/pipe-with-tool'</span>;

<span class="hljs-keyword">const</span> pipe = <span class="hljs-keyword">new</span> Pipe(pipePipeWithTool());

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">main</span>(<span class="hljs-params"></span>) </span>{

   <span class="hljs-keyword">const</span> initialSpinner = ora(<span class="hljs-string">'Conversation with Math agent...'</span>).start();
   <span class="hljs-keyword">try</span> {
       <span class="hljs-keyword">const</span> { completion: calculatorTool} = <span class="hljs-keyword">await</span> pipe.run({
           messages: [{ role: <span class="hljs-string">'user'</span>, content: <span class="hljs-string">'Hello'</span> }],
       });
       initialSpinner.stop();
       <span class="hljs-built_in">console</span>.log(chalk.cyan(<span class="hljs-string">'Report Generator Agent response...'</span>));
       <span class="hljs-built_in">console</span>.log(calculatorTool);
   } <span class="hljs-keyword">catch</span> (error) {
       initialSpinner.stop();
       <span class="hljs-built_in">console</span>.error(chalk.red(<span class="hljs-string">'Error processing initial request:'</span>), error);
   }

   <span class="hljs-keyword">while</span> (<span class="hljs-literal">true</span>) {
       <span class="hljs-keyword">const</span> { userMsg } = <span class="hljs-keyword">await</span> inquirer.prompt([
           {
               <span class="hljs-keyword">type</span>: <span class="hljs-string">'input'</span>,
               name: <span class="hljs-string">'userMsg'</span>,
               message: chalk.blue(<span class="hljs-string">'Enter your query (or type "exit" to quit):'</span>),
           },
       ]);

       <span class="hljs-keyword">if</span> (userMsg.toLowerCase() === <span class="hljs-string">'exit'</span>) {
           <span class="hljs-built_in">console</span>.log(chalk.green(<span class="hljs-string">'Goodbye!'</span>));
           <span class="hljs-keyword">break</span>;
       }

       <span class="hljs-keyword">const</span> spinner = ora(<span class="hljs-string">'Processing your request...'</span>).start();

       <span class="hljs-keyword">try</span> {
           <span class="hljs-keyword">const</span> { completion: reportAgentResponse } = <span class="hljs-keyword">await</span> pipe.run({
               messages: [{ role: <span class="hljs-string">'user'</span>, content: userMsg }],
           });

           spinner.stop();
           <span class="hljs-built_in">console</span>.log(chalk.cyan(<span class="hljs-string">'Agent:'</span>));
           <span class="hljs-built_in">console</span>.log(reportAgentResponse);
       } <span class="hljs-keyword">catch</span> (error) {
           spinner.stop();
           <span class="hljs-built_in">console</span>.error(chalk.red(<span class="hljs-string">'Error processing your request:'</span>), error);
       }
   }
}

main();
</code></pre>
<p>This code creates an interactive CLI for chatting with an AI agent, using a pipe from the <code>@baseai/core</code> library to process user input. Here's what happens:</p>
<ul>
<li><p>It imports necessary libraries such as <code>dotenv</code> for environment configuration, <code>inquirer</code> for user input, <code>ora</code> for loading spinners, and <code>chalk</code> for colored output. Make sure you install these libraries first using this command in your terminal <code>npm install ora inquirer</code>.</p>
</li>
<li><p>A pipe object is created from the BaseAI library using a predefined tool called <code>pipe-with-tool</code>.</p>
</li>
</ul>
<p>In the <code>main()</code> function:</p>
<ul>
<li><p>A spinner starts while an initial conversation with the AI agent is initiated with the message 'Hello'.</p>
</li>
<li><p>The response from the AI is displayed.</p>
</li>
<li><p>A loop runs to continually ask the user for input and send queries to the AI agent.</p>
</li>
<li><p>The AI's responses are shown, and the process continues until the user types "exit”.</p>
</li>
</ul>
<h2 id="heading-step-9-start-the-baseai-server">Step 9: Start the BaseAI Server</h2>
<p>To run the AI agent pipe locally, you need to start the BaseAI server. Run the following command in your terminal:</p>
<pre><code class="lang-bash">npx baseai@latest dev
</code></pre>
<h2 id="heading-step-10-run-the-ai-agent-pipe">Step 10: Run the AI Agent Pipe</h2>
<p>Run the <code>index.ts</code> file using the following command:</p>
<pre><code class="lang-bash">npx tsx index.ts
</code></pre>
<h2 id="heading-result">Result</h2>
<p>In your terminal, you’ll be prompted to <strong>"Enter your query."</strong> For example, let’s ask: <strong>"What is 120 cm in feet?"</strong> LLMs usually hallucinate when converting to feet. But because of the self-healing tool calling of the BaseAI framework, the tool detects and corrects its own errors.</p>
<p>With this setup, we’ve successfully built an AI agent that uses <strong>MathJS</strong> and <strong>BaseAI tool calls</strong> to eliminate the mathematical limitations of LLMs.</p>
<p>Here’s a demo of the end result:</p>
<div class="embed-wrapper">
        </div>
<p> </p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>As Large Language Models (LLMs) often struggle with mathematical reasoning due to their focus on language, leading to frequent errors in calculations, especially with complex math problems.</p>
<p>AI agents extend LLM capabilities by integrating tool calls. They handle real-time data, ensure more consistent outputs, and reduce hallucination.</p>
<p>By incorporating MathJS and tool calls via the BaseAI framework, developers can create custom serverless AI agents called pipes that serve as reliable calculators and address LLMs' inherent limitations.</p>
 
</article>
<article>
<h1> How to Build an AI Chatbot with Spring AI, React, and Docker </h1>
<p>Vikas Rajput — Mon, 23 Sep 2024 14:27:25 +0000</p>
 <p>Hey Java developers, I’ve got good news: Spring now has official support for building AI applications using the <a target="_blank" href="https://spring.io/projects/spring-ai">Spring AI</a> module.</p>
<p>In this tutorial, we’ll build a chatbot application using <a target="_blank" href="https://spring.io/projects/spring-boot"><strong>Spring Boot</strong></a>, <a target="_blank" href="https://react.dev/"><strong>React</strong></a>, <a target="_blank" href="https://www.docker.com/"><strong>Docker</strong></a>, and <a target="_blank" href="https://openai.com/"><strong>OpenAI</strong></a>. This app will let users interact with an AI-powered chatbot, ask questions, and receive responses in real time.</p>
<p>The entire source code mentioned in this article is already available on the <a target="_blank" href="https://github.com/vikasrajputin/springboot-react-docker-chatbot">GitHub repository</a>. Feel free to give it a star and fork it to play around.</p>
<p>To give you an idea of what we’ll be building here, this is how the final application will look:</p>
<p></p>
<p>Are you excited? Let’s build it from scratch!</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-get-your-openai-key">Get Your OpenAI key</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-build-the-rest-api-in-spring-boot">Build the REST API in Spring Boot</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-build-the-chatui-using-reactjs">Build the ChatUI using Reactjs</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-dockerize-the-application">How to Dockerize the Application</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-run-the-application">Run the Application</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-congratulations">Congratulations 🎉</a></p>
</li>
</ul>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>Before we dive into building the chatbot, here are a few things you’ll need to be familiar with:</p>
<ol>
<li><p>Basic understanding of <strong>Java</strong> and <strong>Spring Boot</strong>.</p>
</li>
<li><p>Basic understanding of <strong>React</strong> and <strong>CSS</strong>.</p>
</li>
<li><p>Install <a target="_blank" href="https://jdk.java.net/java-se-ri/17-MR1">JDK</a>, <a target="_blank" href="https://docs.npmjs.com/downloading-and-installing-node-js-and-npm">Node Package Manager</a> and <a target="_blank" href="https://docs.docker.com/get-started/get-docker/">Docke</a><a target="_blank" href="https://docs.docker.com/desktop/">r</a> onto your machine.</p>
</li>
</ol>
<h2 id="heading-get-your-openai-key"><strong>Get Your OpenAI key</strong></h2>
<p>First, you’ll need to sign up for an <a target="_blank" href="https://platform.openai.com/">OpenAI</a> account if you don’t have one. Once signed in, you’ll be taken to the homepage.</p>
<p>In the top right corner, click the “Dashboard” menu. On the sidebar, click "API Keys," then click the "Create new secret key" button to generate your secret key:</p>
<p></p>
<p>Copy the secret key and save it somewhere safe, as you’ll need it later to connect your app to the OpenAI API.</p>
<p>You can go through the OpenAI <a target="_blank" href="https://platform.openai.com/docs/api-reference/authentication">API reference guide</a> to learn more about how to call the APIs, what requests it accepts, and the responses it gives.</p>
<h2 id="heading-build-the-rest-api-in-spring-boot"><strong>Build the REST API in Spring Boot</strong></h2>
<p>Let’s head over to the <a target="_blank" href="https://start.spring.io/">spring initializer</a> to generate the boilerplate code:</p>
<p></p>
<p>You can give the group, artifact, name, description, and package you choose. We’ve used Maven as the built tool, Spring boot version 3.3.3, Jar as a packaging option, and Java version 17.</p>
<p>Hit the generate button and the zip will be downloaded. Unzip the files and import them as a Maven project into your favourite IDE (mine is Intellij).</p>
<h3 id="heading-configure-your-openai-key-in-spring">Configure your OpenAI key in Spring</h3>
<p>You can either use the existing <code>application.properties</code> file or create a <code>application.yaml</code> file. I love working with Yaml, so created a <code>application.yaml</code> file where I can place all my Spring Boot configurations.</p>
<p>Add the OpenAIKey, Model, and Temperature to your <code>application.yaml</code> file:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">spring:</span>
  <span class="hljs-attr">ai:</span>
    <span class="hljs-attr">openai:</span>
      <span class="hljs-attr">chat:</span>
        <span class="hljs-attr">options:</span>
          <span class="hljs-attr">model:</span> <span class="hljs-string">"gpt-3.5-turbo"</span>
          <span class="hljs-attr">temperature:</span> <span class="hljs-string">"0.7"</span>
      <span class="hljs-attr">key:</span> <span class="hljs-string">"PUT YOUR OPEN_API_KEY HERE"</span>
</code></pre>
<p>A similar configuration in <code>application.properties</code> may look like as follows:</p>
<pre><code class="lang-basic">spring.ai.openai.chat.options.model=gpt-<span class="hljs-number">3.5</span>-turbo
spring.ai.openai.chat.options.temperature=<span class="hljs-number">0.7</span>
spring.ai.openai.<span class="hljs-keyword">key</span>=<span class="hljs-string">"PUT YOUR OPEN_API_KEY HERE"</span>
</code></pre>
<h3 id="heading-build-the-chatcontroller">Build the ChatController</h3>
<p>Let’s create a <code>GET</code> API with the URL <code>/ai/chat/string</code> and a method to handle the logic:</p>
<pre><code class="lang-java"><span class="hljs-meta">@RestController</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ChatController</span> </span>{

    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> OpenAiChatModel chatModel;

    <span class="hljs-meta">@GetMapping("/ai/chat/string")</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Flux<String> <span class="hljs-title">generateString</span><span class="hljs-params">(<span class="hljs-meta">@RequestParam(value = "message", defaultValue = "Tell me a joke")</span> String message)</span> </span>{
        <span class="hljs-keyword">return</span> chatModel.stream(message);
    }
}
</code></pre>
<ul>
<li><p>First, we’re adding <code>@RestController</code> to mark the <code>ChatController</code> class as our spring controller</p>
</li>
<li><p>Then, we’re injecting the dependency for the <code>OpenAiChatModel</code> class. It comes out of the box as part of the Spring AI dependency we’ve used.</p>
</li>
<li><p>The <code>OpenAiChatModel</code> comes with a method <code>stream(message)</code> which accepts the prompt as <code>String</code> and returns a <code>String</code> response (technically it’s a <code>Flux</code> of <code>String</code> as we’ve used a Reactive version of the same method).</p>
</li>
<li><p>Internally, <code>OpenAiChatModel.stream(message)</code> will call the OpenAI API and fetch the response from there. The OpenAI call will use the configuration steps mentioned in your <code>application.yaml</code> file, so make sure to use a valid OpenAI key.</p>
</li>
<li><p>We’ve created a method to handle the GET API call, which accepts the message and returns <code>Flux<String></code> as the response.</p>
</li>
</ul>
<h3 id="heading-build-run-and-test-the-rest-api">Build, Run, and Test the REST API</h3>
<p>Use the maven commands to build and run the Spring Boot application:</p>
<pre><code class="lang-bash">./mvnw clean install spring-boot:run
</code></pre>
<p>Ideally, it will run on a <code>8080</code> port unless you’ve customized the port. Make sure to keep that port free to successfully run the application.</p>
<p>You can either use <a target="_blank" href="https://www.postman.com/">Postman</a> or the <a target="_blank" href="https://curl.se/">Curl</a> command to test your REST API:</p>
<pre><code class="lang-bash">curl --location <span class="hljs-string">'http://localhost:8080/ai/chat/string?message=How%20are%20you%3F'</span>
</code></pre>
<h2 id="heading-build-the-chatui-using-reactjs">Build the ChatUI using React.js</h2>
<p>We will be making it super simple and easy for the sake of this tutorial, so pardon me if I don’t follow any React best practices.</p>
<h3 id="heading-create-appjs-to-manage-the-chatui-form">Create <code>App.js</code> to Manage the ChatUI Form</h3>
<p>We’ll be using <code>useState</code> to manage the state:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> [messages, setMessages] = useState([]);
<span class="hljs-keyword">const</span> [input, setInput] = useState(<span class="hljs-string">''</span>);
<span class="hljs-keyword">const</span> [loading, setLoading] = useState(<span class="hljs-literal">false</span>);
</code></pre>
<ul>
<li><p><code>messages</code>: It will store all the messages in the chat. Each message has a <code>text</code> and a <code>sender</code> (either 'user' or 'ai').</p>
</li>
<li><p><code>input</code>: To hold what the user is typing in the text box.</p>
</li>
<li><p><code>loading</code>: This state is set to <code>true</code> while the chatbot is waiting for a response from the AI, and <code>false</code> when the response is received.</p>
</li>
</ul>
<p>Let’s create a function <code>handleSend</code> and call it when the user sends a message by clicking a button or pressing Enter:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> handleSend = <span class="hljs-keyword">async</span> () => {
    <span class="hljs-keyword">if</span> (input.trim() === <span class="hljs-string">''</span>) <span class="hljs-keyword">return</span>;

    <span class="hljs-keyword">const</span> newMessage = { <span class="hljs-attr">text</span>: input, <span class="hljs-attr">sender</span>: <span class="hljs-string">'user'</span> };
    setMessages([...messages, newMessage]);
    setInput(<span class="hljs-string">''</span>);
    setLoading(<span class="hljs-literal">true</span>);

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> axios.get(<span class="hljs-string">'http://localhost:8080/ai/chat/string?message='</span> + input);
        <span class="hljs-keyword">const</span> aiMessage = { <span class="hljs-attr">text</span>: response.data, <span class="hljs-attr">sender</span>: <span class="hljs-string">'ai'</span> };
        setMessages([...messages, newMessage, aiMessage]);
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error fetching AI response"</span>, error);
    } <span class="hljs-keyword">finally</span> {
        setLoading(<span class="hljs-literal">false</span>);
    }
};
</code></pre>
<p>Here’s what happens step by step:</p>
<ul>
<li><p><strong>Check empty input</strong>: If the input field is empty, the function returns early (nothing is sent).</p>
</li>
<li><p><strong>New message from the user</strong>: A new message is added to the <code>messages</code> array. This message has the <code>text</code> (whatever the user typed) and is marked as being sent by the 'user'.</p>
</li>
<li><p><strong>Reset input</strong>: The input field is cleared after the message is sent.</p>
</li>
<li><p><strong>Start loading</strong>: While waiting for the AI to respond, <code>loading</code> is set to <code>true</code> to show a loading indicator.</p>
</li>
<li><p><strong>Make API request</strong>: The code is used <code>axios</code> to request the AI chatbot API, passing the user's message. When the response comes back, a new message from the AI is added to the chat.</p>
</li>
<li><p><strong>Error handling</strong>: If there is a problem getting the AI’s response, an error is logged to the console.</p>
</li>
<li><p><strong>Stop loading</strong>: Finally, the loading state is turned off.</p>
</li>
</ul>
<p>Let’s write a function to update the <code>input</code> state whenever the user types something in the input field:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> handleInputChange = <span class="hljs-function">(<span class="hljs-params">e</span>) =></span> {
    setInput(e.target.value);
};
</code></pre>
<p>Next, let’s create a function to check if the user presses the Enter key. If they do, it calls <code>handleSend()</code> to send the message:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> handleKeyPress = <span class="hljs-function">(<span class="hljs-params">e</span>) =></span> {
    <span class="hljs-keyword">if</span> (e.key === <span class="hljs-string">'Enter'</span>) {
        handleSend();
    }
};
</code></pre>
<p>Now let’s create UI elements to render the chat messages:</p>
<pre><code class="lang-js">{messages.map(<span class="hljs-function">(<span class="hljs-params">message, index</span>) =></span> (
    <span class="xml"><span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">key</span>=<span class="hljs-string">{index}</span> <span class="hljs-attr">className</span>=<span class="hljs-string">{</span>`<span class="hljs-attr">message-container</span> ${<span class="hljs-attr">message.sender</span>}`}></span>
        <span class="hljs-tag"><<span class="hljs-name">img</span>
            <span class="hljs-attr">src</span>=<span class="hljs-string">{message.sender</span> === <span class="hljs-string">'user'</span> ? '<span class="hljs-attr">user-icon.png</span>' <span class="hljs-attr">:</span> '<span class="hljs-attr">ai-assistant.png</span>'}
            <span class="hljs-attr">alt</span>=<span class="hljs-string">{</span>`${<span class="hljs-attr">message.sender</span>} <span class="hljs-attr">avatar</span>`}
            <span class="hljs-attr">className</span>=<span class="hljs-string">"avatar"</span>
        /></span>
        <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">{</span>`<span class="hljs-attr">message</span> ${<span class="hljs-attr">message.sender</span>}`}></span>
            {message.text}
        <span class="hljs-tag"></<span class="hljs-name">div</span>></span>
    <span class="hljs-tag"></<span class="hljs-name">div</span>></span></span>
))}
</code></pre>
<p>This block renders all the messages in the chat:</p>
<ul>
<li><p><strong>Mapping through messages</strong>: Each message is displayed as a <code>div</code> using <code>.map()</code>.</p>
</li>
<li><p><strong>Message styling</strong>: The class name of the message changes based on who the sender is (<code>user</code> or <code>ai</code>), making it clear who sent the message.</p>
</li>
<li><p><strong>Avatar images</strong>: Each message shows a small avatar, with a different image for the user and the AI.</p>
</li>
</ul>
<p>Let’s create some logic to show the loader based on a flag:</p>
<pre><code class="lang-js">{loading && (
    <span class="xml"><span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"message-container ai"</span>></span>
        <span class="hljs-tag"><<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"ai-assistant.png"</span> <span class="hljs-attr">alt</span>=<span class="hljs-string">"AI avatar"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"avatar"</span> /></span>
        <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"message ai"</span>></span>...<span class="hljs-tag"></<span class="hljs-name">div</span>></span>
    <span class="hljs-tag"></<span class="hljs-name">div</span>></span></span>
)}
</code></pre>
<p>While the AI is thinking (when <code>loading</code> is <code>true</code>), we show a loading message (<code>...</code>) so the user knows a response is coming soon.</p>
<p>At last, create a button to click the message send button:</p>
<pre><code class="lang-jsx"><button onClick={handleSend}>
    <span class="xml"><span class="hljs-tag"><<span class="hljs-name">FaPaperPlane</span> /></span></span>
</button>
</code></pre>
<p>This button triggers the <code>handleSend()</code> function when clicked. The icon used here is a <a target="_blank" href="https://react-icons.github.io/react-icons/icons/fa/">paper plane</a>, which is common for "send" buttons.</p>
<p>The full <code>Chatbot.js</code> looks as below:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> React, { useState } <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>;
<span class="hljs-keyword">import</span> axios <span class="hljs-keyword">from</span> <span class="hljs-string">'axios'</span>;
<span class="hljs-keyword">import</span> { FaPaperPlane } <span class="hljs-keyword">from</span> <span class="hljs-string">'react-icons/fa'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'./Chatbot.css'</span>;

<span class="hljs-keyword">const</span> Chatbot = <span class="hljs-function">() =></span> {
    <span class="hljs-keyword">const</span> [messages, setMessages] = useState([]);
    <span class="hljs-keyword">const</span> [input, setInput] = useState(<span class="hljs-string">''</span>);
    <span class="hljs-keyword">const</span> [loading, setLoading] = useState(<span class="hljs-literal">false</span>);

    <span class="hljs-keyword">const</span> handleSend = <span class="hljs-keyword">async</span> () => {
        <span class="hljs-keyword">if</span> (input.trim() === <span class="hljs-string">''</span>) <span class="hljs-keyword">return</span>;

        <span class="hljs-keyword">const</span> newMessage = { <span class="hljs-attr">text</span>: input, <span class="hljs-attr">sender</span>: <span class="hljs-string">'user'</span> };
        setMessages([...messages, newMessage]);
        setInput(<span class="hljs-string">''</span>);
        setLoading(<span class="hljs-literal">true</span>);

        <span class="hljs-keyword">try</span> {
            <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> axios.get(<span class="hljs-string">'http://localhost:8080/ai/chat/string?message='</span> + input);
            <span class="hljs-keyword">const</span> aiMessage = { <span class="hljs-attr">text</span>: response.data, <span class="hljs-attr">sender</span>: <span class="hljs-string">'ai'</span> };
            setMessages([...messages, newMessage, aiMessage]);
        } <span class="hljs-keyword">catch</span> (error) {
            <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error fetching AI response"</span>, error);
        } <span class="hljs-keyword">finally</span> {
            setLoading(<span class="hljs-literal">false</span>);
        }
    };

    <span class="hljs-keyword">const</span> handleInputChange = <span class="hljs-function">(<span class="hljs-params">e</span>) =></span> {
        setInput(e.target.value);
    };

    <span class="hljs-keyword">const</span> handleKeyPress = <span class="hljs-function">(<span class="hljs-params">e</span>) =></span> {
        <span class="hljs-keyword">if</span> (e.key === <span class="hljs-string">'Enter'</span>) {
            handleSend();
        }
    };

    <span class="hljs-keyword">return</span> (
        <span class="xml"><span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"chatbot-container"</span>></span>
            <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"chat-header"</span>></span>
                <span class="hljs-tag"><<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"ChatBot.png"</span> <span class="hljs-attr">alt</span>=<span class="hljs-string">"Chatbot Logo"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"chat-logo"</span> /></span>
                <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"breadcrumb"</span>></span>Home <span class="hljs-symbol">></span> Chat<span class="hljs-tag"></<span class="hljs-name">div</span>></span>
            <span class="hljs-tag"></<span class="hljs-name">div</span>></span>
            <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"chatbox"</span>></span>
                {messages.map((message, index) => (
                    <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">key</span>=<span class="hljs-string">{index}</span> <span class="hljs-attr">className</span>=<span class="hljs-string">{</span>`<span class="hljs-attr">message-container</span> ${<span class="hljs-attr">message.sender</span>}`}></span>
                        <span class="hljs-tag"><<span class="hljs-name">img</span>
                            <span class="hljs-attr">src</span>=<span class="hljs-string">{message.sender</span> === <span class="hljs-string">'user'</span> ? '<span class="hljs-attr">user-icon.png</span>' <span class="hljs-attr">:</span> '<span class="hljs-attr">ai-assistant.png</span>'}
                            <span class="hljs-attr">alt</span>=<span class="hljs-string">{</span>`${<span class="hljs-attr">message.sender</span>} <span class="hljs-attr">avatar</span>`}
                            <span class="hljs-attr">className</span>=<span class="hljs-string">"avatar"</span>
                        /></span>
                        <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">{</span>`<span class="hljs-attr">message</span> ${<span class="hljs-attr">message.sender</span>}`}></span>
                            {message.text}
                        <span class="hljs-tag"></<span class="hljs-name">div</span>></span>
                    <span class="hljs-tag"></<span class="hljs-name">div</span>></span>
                ))}
                {loading && (
                    <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"message-container ai"</span>></span>
                        <span class="hljs-tag"><<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"ai-assistant.png"</span> <span class="hljs-attr">alt</span>=<span class="hljs-string">"AI avatar"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"avatar"</span> /></span>
                        <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"message ai"</span>></span>...<span class="hljs-tag"></<span class="hljs-name">div</span>></span>
                    <span class="hljs-tag"></<span class="hljs-name">div</span>></span>
                )}
            <span class="hljs-tag"></<span class="hljs-name">div</span>></span>
            <span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"input-container"</span>></span>
                <span class="hljs-tag"><<span class="hljs-name">input</span>
                    <span class="hljs-attr">type</span>=<span class="hljs-string">"text"</span>
                    <span class="hljs-attr">value</span>=<span class="hljs-string">{input}</span>
                    <span class="hljs-attr">onChange</span>=<span class="hljs-string">{handleInputChange}</span>
                    <span class="hljs-attr">onKeyPress</span>=<span class="hljs-string">{handleKeyPress}</span>
                    <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Type your message..."</span>
                /></span>
                <span class="hljs-tag"><<span class="hljs-name">button</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{handleSend}</span>></span>
                    <span class="hljs-tag"><<span class="hljs-name">FaPaperPlane</span> /></span>
                <span class="hljs-tag"></<span class="hljs-name">button</span>></span>
            <span class="hljs-tag"></<span class="hljs-name">div</span>></span>
        <span class="hljs-tag"></<span class="hljs-name">div</span>></span></span>
    );
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> Chatbot;
</code></pre>
<p>Use <code><Chatbot/></code> inside the <code>App.js</code> to load the Chatbot UI:</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag"><<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"App"</span>></span>
            <span class="hljs-tag"><<span class="hljs-name">Chatbot</span> /></span>
        <span class="hljs-tag"></<span class="hljs-name">div</span>></span></span>
  );
}
</code></pre>
<p>Along with this, we’re also using CSS to make our chatbot a little more beautiful. You can refer to <a target="_blank" href="https://github.com/vikasrajputin/springboot-react-docker-chatbot/blob/main/chatbot-ui/src/App.css">App.css</a> and <a target="_blank" href="https://github.com/vikasrajputin/springboot-react-docker-chatbot/blob/main/chatbot-ui/src/Chatbot.css">Chatbot.css</a> for that.</p>
<h3 id="heading-run-the-frontend">Run the Frontend</h3>
<p>Use the <code>npm</code> command to run the application:</p>
<pre><code class="lang-bash">npm start
</code></pre>
<p>This should run the frontend on the URL <code>http://localhost:3000</code>. The application is good to be tested now.</p>
<p>But running the backend and frontend separately is a bit of a hassle. So let’s use Docker to make the entire build process easier.</p>
<h2 id="heading-how-to-dockerize-the-application"><strong>How to Dockerize th</strong>e Applic<strong>ation</strong></h2>
<p>Let’s dockerize the entire application to help bundle and ship it anywhere hassle-free. You can install and configure Docker from the <a target="_blank" href="https://docs.docker.com/get-started/get-docker/">official Docker website</a>.</p>
<h3 id="heading-dockerize-the-backend">Dockerize the Backend</h3>
<p>The backend of our chatbot is built with Spring Boot, so we will create a <code>Dockerfile</code> that builds the Spring Boot app into an executable JAR file and runs it in a container.</p>
<p>Let’s write the <code>Dockerfile</code> for it:</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Start with an official image that has Java installed</span>
<span class="hljs-keyword">FROM</span> openjdk:<span class="hljs-number">17</span>-jdk-alpine

<span class="hljs-comment"># Set the working directory inside the container</span>
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-comment"># Copy the Maven/Gradle build file and source code into the container</span>
<span class="hljs-keyword">COPY</span><span class="bash"> target/chatbot-backend.jar /app/chatbot-backend.jar</span>

<span class="hljs-comment"># Expose the application’s port</span>
<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">8080</span>

<span class="hljs-comment"># Command to run the Spring Boot app</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"java"</span>, <span class="hljs-string">"-jar"</span>, <span class="hljs-string">"chatbot-backend.jar"</span>]</span>
</code></pre>
<ul>
<li><p><code>FROM openjdk:17-jdk-alpine</code>: This specifies that the container should be based on a lightweight Alpine Linux image that includes JDK 17, which is needed to run Spring Boot.</p>
</li>
<li><p><code>WORKDIR /app</code>: Sets the working directory inside the container to <code>/app</code>, where our application files will live.</p>
</li>
<li><p><code>COPY target/chatbot-backend.jar /app/chatbot-backend.jar</code>: Copies the built JAR file from your local machine (usually in the <code>target</code> folder after building the project with Maven or Gradle) into the container.</p>
</li>
<li><p><code>EXPOSE 8080</code>: This tells Docker that the application will listen for requests on port 8080.</p>
</li>
<li><p><code>CMD ["java", "-jar", "chatbot-backend.jar"]</code>: This specifies the command that will run when the container starts. It runs the JAR file that launches the Spring Boot app.</p>
</li>
</ul>
<h3 id="heading-dockerize-the-frontend">Dockerize the Frontend</h3>
<p>The front end of our chatbot is built using React, and we can Dockerize it by creating a Dockerfile that installs the necessary dependencies, builds the app, and serves it using a lightweight web server like NGINX.</p>
<p>Let’s write the <code>Dockerfile</code> for the React frontend:</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Use a Node image to build the React app</span>
<span class="hljs-keyword">FROM</span> node:<span class="hljs-number">16</span>-alpine AS build

<span class="hljs-comment"># Set the working directory inside the container</span>
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-comment"># Copy the package.json and install the dependencies</span>
<span class="hljs-keyword">COPY</span><span class="bash"> package.json package-lock.json ./</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm install</span>

<span class="hljs-comment"># Copy the rest of the application code and build it</span>
<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm run build</span>

<span class="hljs-comment"># Use a lightweight NGINX server to serve the built app</span>
<span class="hljs-keyword">FROM</span> nginx:alpine
<span class="hljs-keyword">COPY</span><span class="bash"> --from=build /app/build /usr/share/nginx/html</span>

<span class="hljs-comment"># Expose port 80 for the web traffic</span>
<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">80</span>

<span class="hljs-comment"># Start NGINX</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"nginx"</span>, <span class="hljs-string">"-g"</span>, <span class="hljs-string">"daemon off;"</span>]</span>
</code></pre>
<ul>
<li><p><code>FROM node:16-alpine AS build</code>: This uses a lightweight Node.js image to build the React app. We install all dependencies and build the app inside this container.</p>
</li>
<li><p><code>WORKDIR /app</code>: Sets the working directory inside the container to <code>/app</code>.</p>
</li>
<li><p><code>COPY package.json package-lock.json ./</code>: Copies <code>package.json</code> and <code>package-lock.json</code> to install dependencies.</p>
</li>
<li><p><code>RUN npm install</code>: Installs the dependencies listed in the package.json.</p>
</li>
<li><p><code>COPY . .</code>: Copies all the frontend source code into the container.</p>
</li>
<li><p><code>RUN npm run build</code>: Builds the React application. The built files will be in a <code>build</code> folder.</p>
</li>
<li><p><code>FROM nginx:alpine</code>: After building the app, this line starts a new container based on the <code>nginx</code> web server.</p>
</li>
<li><p><code>COPY --from=build /app/build /usr/share/nginx/html</code>: Copies the built React app from the first container into the nginx container, placing it in the default folder where NGINX serves files.</p>
</li>
<li><p><code>EXPOSE 80</code>: This exposes port 80, which NGINX uses to serve web traffic.</p>
</li>
<li><p><code>CMD ["nginx", "-g", "daemon off;"]</code>: This starts the NGINX server in the foreground to serve your React app.</p>
</li>
</ul>
<h3 id="heading-docker-compose-to-run-both">Docker Compose to Run Both</h3>
<p>Now that we have separate Dockerfiles for the frontend and backend, we’ll use <code>docker-compose</code> to orchestrate running both containers at once.</p>
<p>Let’s write the <code>docker-compose.yml</code> file inside the root directory of the project:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">'3'</span>
<span class="hljs-attr">services:</span>
  <span class="hljs-attr">backend:</span>
    <span class="hljs-attr">build:</span> <span class="hljs-string">./backend</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"8080:8080"</span>
    <span class="hljs-attr">networks:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">chatbot-network</span>

  <span class="hljs-attr">frontend:</span>
    <span class="hljs-attr">build:</span> <span class="hljs-string">./frontend</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"3000:80"</span>
    <span class="hljs-attr">depends_on:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">backend</span>
    <span class="hljs-attr">networks:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">chatbot-network</span>

<span class="hljs-attr">networks:</span>
  <span class="hljs-attr">chatbot-network:</span>
    <span class="hljs-attr">driver:</span> <span class="hljs-string">bridge</span>
</code></pre>
<ul>
<li><p><code>version: '3'</code>: This defines the version of Docker Compose being used.</p>
</li>
<li><p><code>services:</code>: This defines the services we want to run.</p>
<ul>
<li><p><code>backend</code>: This service builds the backend using the Dockerfile located in the <code>./backend</code> directory and exposes port 8080.</p>
</li>
<li><p><code>frontend</code>: This service builds the front end using the Dockerfile located in the <code>./frontend</code> directory. It maps port 3000 on the host to port 80 inside the container.</p>
</li>
</ul>
</li>
<li><p><code>depends_on:</code>: This makes sure the front end waits for the backend to be ready before it starts.</p>
</li>
<li><p><code>networks:</code>: This section defines a shared network so that both the backend and frontend can communicate with each other.</p>
</li>
</ul>
<h2 id="heading-run-the-application">Run the Application</h2>
<p>To run the entire application (both frontend and backend), you can use the following command:</p>
<pre><code class="lang-bash">docker-compose up --build
</code></pre>
<p>This command will:</p>
<ul>
<li><p>Build both the frontend and backend images.</p>
</li>
<li><p>Start both containers (backend on port 8080, frontend on port 3000).</p>
</li>
<li><p>Set up networking so that both services can communicate.</p>
</li>
</ul>
<p>Now, you can head over to <code>http://localhost:3000</code> load the Chatbot UI and start asking your questions to the AI.</p>
<h2 id="heading-congratulations">Congratulations 🎉</h2>
<p>You’ve successfully built a full-stack chatbot application using Spring Boot, React, Docker, and OpenAI.</p>
<p>The source code shown in the project is available on <a target="_blank" href="https://github.com/vikasrajputin/springboot-react-docker-chatbot">Github</a>, if you found it helpful give it a star, and feel free to fork it and play around with it.</p>
 
</article>
<article>
<h1> Learn AI Engineering with OpenAI and JavaScript </h1>
<p>Beau Carnes — Mon, 29 Apr 2024 15:34:42 +0000</p>
 <p>Adding AI to your applications can open up a bunch of great features and uses. We just posted a course on the freeCodeCamp.org YouTube channel that will introduce you to AI Engineering using the OpenAI API and JavaScript.</p>
<p>Created by expert Scrimba instructors Tom Chant, Per Borgen, and Guil Hernandez, this course offers an immersive introduction to building AI-powered web apps.</p>
<p>This course is designed for those who have been watching the rise of OpenAI-powered apps and want to join in. Starting with a practical project that involves analyzing stock market data and providing investment recommendations, you will get hands-on experience with the GPT-4, OpenAI's advanced generative AI model.</p>
<p>The course then moves on to AI-driven image generation with DALL·E 3, OpenAI’s latest image generation model. You will learn to create vivid images from simple text inputs while gaining insights into AI safety and industry best practices.</p>
<p>By the end of this course, you will:</p>
<ul>
<li><p>Understand how to integrate the OpenAI API into applications.</p>
</li>
<li><p>Master the settings and techniques to maximize the output from the OpenAI API.</p>
</li>
<li><p>Have a portfolio-ready AI-powered app.</p>
</li>
<li><p>Gain critical knowledge about AI safety and best practices.</p>
</li>
</ul>
<p>The course covers everything from AI engineering basics to deploying AI apps using Cloudflare. Key modules include:</p>
<ul>
<li><p>Basics of AI Engineering and understanding API mechanics.</p>
</li>
<li><p>Practical challenges such as Prompt Engineering.</p>
</li>
<li><p>Advanced topics like the "Few Shot" Approach and Fine-tuning.</p>
</li>
<li><p>Deploying secure and robust AI applications using Cloudflare.</p>
</li>
</ul>
<p>So if you are ready to learn how to build and deploy AI-powered applications that are not only innovative but also industry-ready, check out the course on the <a target="_blank" href="https://www.youtube.com/watch?v=Yjy837dDvOY">freeCodeCamp.org YouTube channel</a> (2.5-hour watch).</p>
<div class="embed-wrapper">
        </div>
 
</article>
<article>
<h1> How to Turn Audio to Text using OpenAI Whisper </h1>
<p>Manish Shivanandhan — Tue, 05 Mar 2024 13:02:22 +0000</p>
 <p>Do you know what OpenAI Whisper is? It’s the latest AI model from OpenAI that helps you to automatically convert speech to text.</p>
<p>Transforming audio into text is now simpler and more accurate, thanks to OpenAI’s Whisper.</p>
<p>This article will guide you through using Whisper to convert spoken words into written form, providing a straightforward approach for anyone looking to leverage AI for efficient transcription.</p>
<h1 id="heading-introduction-to-openai-whisper">Introduction to OpenAI Whisper</h1>
<p><a target="_blank" href="https://platform.openai.com/docs/guides/speech-to-text">OpenAI Whisper</a> is an AI model designed to understand and transcribe spoken language. It is an automatic speech recognition (ASR) system designed to convert spoken language into written text.</p>
<p>Its capabilities have opened up a wide array of use cases across various industries. Whether you’re a developer, a content creator, or just someone fascinated by AI, Whisper has something for you.</p>
<p>Let's go over some its key features:</p>
<p><strong>1. Transcription</strong> s<strong>ervices:</strong> Whisper can transcribe audio and video content in real-time or from recordings, making it useful for generating accurate meeting notes, interviews, lectures, and any spoken content that needs to be documented in text form.</p>
<p><strong>2. Subtitling and</strong> c<strong>losed</strong> c<strong>aptioning:</strong> It can automatically generate subtitles and closed captions for videos, improving accessibility for the deaf and hard-of-hearing community, as well as for viewers who prefer to watch videos with text.</p>
<p><strong>3. Language</strong> l<strong>earning and</strong> t<strong>ranslation</strong>: Whisper's ability to transcribe in multiple languages supports language learning applications, where it can help in pronunciation practice and listening comprehension. Combined with translation models, it can also facilitate real-time cross-lingual communication.</p>
<p><strong>4. Accessibility</strong> t<strong>ools:</strong> Beyond subtitling, Whisper can be integrated into assistive technologies to help individuals with speech impairments or those who rely on text-based communication. It can convert spoken commands or queries into text for further processing, enhancing the usability of devices and software for everyone.</p>
<p><strong>5. Content</strong> s<strong>earchability:</strong> By transcribing audio and video content into text, Whisper makes it possible to search through vast amounts of multimedia data. This capability is crucial for media companies, educational institutions, and legal professionals who need to find specific information efficiently.</p>
<p><strong>6. Voice-</strong>c<strong>ontrolled</strong> a<strong>pplications:</strong> Whisper can serve as the backbone for developing voice-controlled applications and devices. It enables users to interact with technology through natural speech. This includes everything from smart home devices to complex industrial machinery.</p>
<p><strong>7. Customer</strong> s<strong>upport</strong> a<strong>utomation:</strong> In customer service, Whisper can transcribe calls in real time. It allows for immediate analysis and response from automated systems. This can improve response times, accuracy in handling queries, and overall customer satisfaction.</p>
<p><strong>8. Podcasting and</strong> j<strong>ournalism:</strong> For podcasters and journalists, Whisper offers a fast way to transcribe interviews and audio content for articles, blogs, and social media posts, streamlining content creation and making it accessible to a wider audience.</p>
<p>OpenAI's Whisper represents a significant advancement in speech recognition technology.</p>
<p>With its use cases spanning across enhancing accessibility, streamlining workflows, and fostering innovative applications in technology, it's a powerful tool for building modern applications.</p>
<h2 id="heading-how-to-work-with-whisper">How to Work with Whisper</h2>
<p>Now let’s look at a simple code example to convert an audio file into text using OpenAI’s Whisper. I would recommend using a <a target="_blank" href="https://colab.research.google.com/">Google Collab notebook</a>.</p>
<p>Before we dive into the code, you need two things:</p>
<ol>
<li><a target="_blank" href="https://platform.openai.com/api-keys">OpenAI API Key</a></li>
<li><a target="_blank" href="https://audio-samples.github.io/">Sample audio file</a></li>
</ol>
<p>First, install the OpenAI library (Use <code>!</code> only if you are installing it on the notebook):</p>
<pre><code>!pip install openai
</code></pre><p>Now let’s write the code to transcribe a sample speech file to text:</p>
<pre><code>#Import the openai Library
<span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI

# Create an api client
client = OpenAI(api_key=<span class="hljs-string">"YOUR_KEY_HERE"</span>)

# Load audio file
audio_file= open(<span class="hljs-string">"AUDIO_FILE_PATH"</span>, <span class="hljs-string">"rb"</span>)

# Transcribe
transcription = client.audio.transcriptions.create(
  model=<span class="hljs-string">"whisper-1"</span>, 
  file=audio_file
)
# Print the transcribed text
print(transcription.text)
</code></pre><p>This script showcases a straightforward way to use OpenAI Whisper for transcribing audio files. By running this script with Python, you’ll see the transcription of your specified audio file printed to the console.</p>
<p>Feel free to experiment with different audio files and explore additional options provided by the <a target="_blank" href="https://platform.openai.com/docs/guides/speech-to-text">Whisper Library</a> to customize the transcription process to your needs.</p>
<h2 id="heading-tips-for-better-transcriptions">Tips for Better Transcriptions</h2>
<p>Whisper is powerful, but there are ways to get even better results from it. Here are some tips:</p>
<ol>
<li><strong>Clear</strong> a<strong>udio:</strong> The clearer your audio file, the better the transcription. Try to use files with minimal background noise.</li>
<li><strong>Language</strong> s<strong>election:</strong> Whisper supports multiple languages. If your audio isn’t in English, make sure to specify the language for better accuracy.</li>
<li><strong>Customiz</strong>e o<strong>utput:</strong> Whisper offers options to customize the output. You can ask it to include timestamps, confidence scores, and more. Explore the documentation to see what’s possible.</li>
</ol>
<h2 id="heading-advanced-features">Advanced Features</h2>
<p>Whisper isn’t just for simple transcriptions. It has features that cater to more advanced needs:</p>
<ol>
<li><strong>Real-</strong>t<strong>ime</strong> t<strong>ranscription</strong>: You can set up Whisper to transcribe the audio in real time. This is great for live events or streaming.</li>
<li><strong>Multi-</strong>l<strong>anguage</strong> s<strong>upport:</strong> Whisper can handle multiple languages in the same audio file. It’s perfect for multilingual meetings or interviews.</li>
<li><strong>Fine-</strong>t<strong>uning:</strong> If you have specific needs, you can fine-tune Whisper’s models to suit your audio better. This requires more technical skill but can significantly improve results.</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Working with OpenAI Whisper opens up a world of possibilities. It’s not just about transcribing audio – it’s about making information more accessible and processes more efficient.</p>
<p>Whether you’re transcribing interviews for a research project, making your podcast more accessible with transcripts, or exploring new ways to interact with technology, Whisper has you covered.</p>
<p>Hope you enjoyed this article. <a target="_blank" href="https://www.turingtalks.ai/">Visit turingtalks.ai</a> for daily byte-sized AI tutorials.</p>
 
</article>
<article>
<h1> Create AI Assistants with OpenAI's Assistants API </h1>
<p>Beau Carnes — Mon, 22 Jan 2024 17:41:32 +0000</p>
 <p>OpenAI's Assistants API is a powerful tool that allows developers to build AI assistants within their apps efficiently. It simplifies managing conversation history and provides access to advanced features like Code Interpreter and Retrieval. The API also improves function-calling for third-party tools.</p>
<p>We just published a course on the freeCodeCamp.org YouTube channel that will teach you how to create AI assistants using the Assistants API. This tutorial is designed specifically for beginners and aims to equip learners with the skills to create dynamic, intelligent web applications using Streamlit and the Assistants API by OpenAI.</p>
<p>Paulo Dichone created this course. He is a senior software engineer and experienced teacher.</p>
<p>The course is structured to provide a comprehensive learning experience, covering everything from the basics of function calling with the API to leveraging its knowledge retrieval and code interpretation capabilities. Participants will also gain a fundamental understanding of Large Language Models (LLMs) which are crucial to the Assistants API.</p>
<h3 id="heading-course-breakdown">Course Breakdown:</h3>
<ol>
<li><strong>Introduction and Course Overview:</strong> Setting the stage for what the course will cover.</li>
<li><strong>Pre-requisites:</strong> Understanding what knowledge and tools you need to get started.</li>
<li><strong>Setup Guides:</strong> Step-by-step instructions for setting up Python, development tools, and VS Code.</li>
<li><strong>OpenAI Account and API Key Generation:</strong> A practical guide to getting your OpenAI API key.</li>
<li><strong>Deep Dive into the Assistants API:</strong> Exploring its benefits and how it differs from the Chat Completion API.</li>
<li><strong>Practical Application Development:</strong> Hands-on sessions for building applications using the Assistants API in Streamlit.</li>
<li><strong>Knowledge Retrieval and Embeddings:</strong> Techniques to make applications smarter and responsive.</li>
<li><strong>Building and Testing Applications:</strong> Real-world projects like creating a News Summarizer and a Study Buddy application in Streamlit.</li>
<li><strong>Final Wrap-up:</strong> Consolidating learning and final thoughts.</li>
</ol>
<p>The course is not just about theoretical knowledge. It includes a series of hands-on projects and real-world examples, enabling participants to apply what they've learned. By the end of the course, learners will not only understand the Assistants API but also be confident in building their own intelligent web applications.</p>
<p>This course is a valuable resource for anyone looking to delve into the world of AI-driven application development. Watch the full course on the freeCodeCamp.org YouTube channel (4-hour watch).</p>
<div class="embed-wrapper">
        </div>
 
</article>
<article>
<h1> Learn to Control GPT in the OpenAI Playground </h1>
<p>David Clinton — Fri, 08 Dec 2023 17:15:34 +0000</p>
 <p>ChatGPT is the interface most people use to work with OpenAI's large language model. But for someone who needs the versatility and power of programmatic access, there's no replacement for OpenAI's API. </p>
<p>The API is the interface you can use to connect programming code running on your own PC with OpenAI's GPT servers. </p>
<p>Of course, when using the API you can include plain language prompts where you ask GPT for answers and generated content. But you can also apply all the built-in power of that programming code to create sophisticated and automated operations that involve GPT. </p>
<p>You could, for instance, ask ChatGPT to write an article on a specific topic. But using the API, you can ask it to write 100 articles on the topics listed in a text document and then sit back while your code does all the work for you. </p>
<p>Anything that takes advantage of code rather than performing manual operations is a thousand times more effective when you add GPT to the equation.</p>
<p>The problem is that, like all APIs, figuring out the syntax and other fine details can take time. To help you over that hill, OpenAI created the visual tools within their <a target="_blank" href="https://platform.openai.com/playground">Playground</a>. Let's see how that'll help. </p>
<p>This article is excerpted from <a target="_blank" href="https://www.manning.com/books/the-complete-obsolete-guide-to-generative-ai?a_aid=bootstrap-it&a_bid=8c39744&a_bid=8c397448&chan=fcc_ai">my Manning book, The Complete Obsolete Guide to Generative AI</a>. </p>
<h2 id="heading-what-is-the-playground">What is the Playground?</h2>
<p>Playground, shown in the figure below, existed even before ChatGPT, and it was where I had my first interactions with GPT. Although do keep in mind that, along with everything else in the AI world, the interface will probably have changed at least twice by the time you get to it. </p>
<p>We're going to use the playground throughout this tutorial to learn how to interact with GPT.</p>
<p>
<em>OpenAI's Playground interface</em></p>
<p>You get to <a target="_blank" href="https://platform.openai.com/playground">Playground</a> from your OpenAI login account. Rather than enjoying a sustained conversation where subsequent exchanges are informed by earlier prompts and completions, when the Chat option is selected from the pull-down at the top-left of the screen, the text field in Playground offers only one exchange at a time. The models it's based on might also be a bit older and less refined than the ChatGPT version.</p>
<p>But there are two things that set Playground apart from ChatGPT. One is the configuration controls displayed down the right side of the screen in the image above. The second is the <em>View code</em> feature at the top-right. It's those features that make Playground primarily an educational tool rather than just another GPT interface.</p>
<h2 id="heading-how-to-access-python-code-samples">How to Access Python Code Samples</h2>
<p>The image below shows a typical Playground session where I've typed in a prompt and then hit the "View code" button with the "Python" option selected. I'm shown working code that, assuming you'll add a valid OpenAI API key on line 4, can be copied and run from any internet-connected computer.</p>
<p>
<em>Playground's View code tool with Python code</em></p>
<p>Don't worry about the details right now, but take a moment to look through the arguments that are included in the <code>openai.Completion.create()</code> method. </p>
<p>The model that's currently selected in the Model field on the right side of the Playground is there (<code>text-davinci-003</code>), as is my actual prompt (<code>Explain the purpose of...</code>). In fact, each configuration option I've selected is there. </p>
<p>In other words, I can experiment with any combination of configurations here in the Playground, and then copy the code and run it – or variations of it – anywhere.</p>
<p>This, in fact, is where you learn how to use the API. In other words, here is where you're shown code samples that can form the basis of a lot of what you'll eventually want to run in your own environment.</p>
<h2 id="heading-how-to-access-curl-code-samples">How to Access CURL Code Samples</h2>
<p>The next image shows us how that exact same prompt would work if I decided to use the command line tool, curl, instead of Python. </p>
<p>
<em>Playground's View code tool with curl code</em></p>
<p><code>curl</code> is a venerable open source command line tool that's often available by default. You'll generally use <code>curl</code> when you want to access a remote server directly from your command line. Those will usually be for relatively simpler requests. </p>
<p>Python, on the other hand, will be the tool of choice for more complicated applications that involve programming logic.</p>
<p>To confirm it's available on your system, simply type <code>curl</code> at any command line prompt. You should see some kind of help message with suggestions for proper usage.</p>
<p>Besides Python and curl, you can also display code in Node.js (for when you're building server-based applications) and JSON (to enable programmatic integrations). </p>
<p>With that, you're all set to dive deeper than simple chat sessions: you're now able to finely control and programmatically automate your interactions with GPT from the comfort of your own command line (or IDE).</p>
<p>This article is excerpted from <a target="_blank" href="https://www.manning.com/books/the-complete-obsolete-guide-to-generative-ai?a_aid=bootstrap-it&a_bid=8c39744&a_bid=8c397448&chan=fcc_ai">my Manning book, The Complete Obsolete Guide to Generative AI</a>. There's plenty more technology goodness available through <a target="_blank" href="https://bootstrap-it.com">my website</a>.</p>
 
</article>
</main></body></html>

Model	Input cost	Output cost	Weekly total	Annualized (52 wk)
GPT-5.5 (\(5 / \)30)	3.6M × \(5/1M = \)18.00	0.36M × \(30/1M = \)10.80	$28.80	$1,498
GPT-5.5 Pro (\(30 / \)180)	$108.00	$64.80	$172.80	$8,986
GPT-5.4 (\(2.50 / \)15)	$9.00	$5.40	$14.40	$749
GPT-5-Codex (\(1.25 / \)10)	$4.50	$3.60	$8.10	$421
GPT-5.1-Codex-mini (\(0.25 / \)2)	$0.90	$0.72	$1.62	$84