#ai-tools - freeCodeCamp.org

Top AI Productivity Tools for Developers and Professionals

Manish Shivanandhan — Fri, 03 Jul 2026 15:38:52 +0000

Artificial intelligence has moved far beyond simple chatbots and basic automation. Today, developers and non-developers alike use AI tools to research information, write content, create presentations, manage projects, and analyse data faster than ever before.

The biggest advantage of AI is not that it replaces people. Instead, it helps people spend less time on repetitive work and more time on tasks that require creativity, strategy, and decision-making.

Whether you're a marketer, consultant, business owner, educator, or knowledge worker, the right AI tools can help you improve productivity and reduce the time spent on routine activities.

In this article, we'll look at some of the most useful AI productivity tools and explore how they can streamline everyday work.

What We'll Cover:

Notion AI for Knowledge Management
Glean – Best Enterprise AI Platform
QuillBot for Faster Slide Creation
Transkriptor for Meeting Documentation
Cardscanner for Text Digitisation and Extraction
Conclusion

Notion AI for Knowledge Management

Information overload is a common challenge in modern workplaces. Teams create documents, meeting notes, project plans, and reports every day. Finding the right information later can become difficult.

Notion AI helps you organise and manage knowledge more effectively. It can summarise notes, generate content, answer questions about stored information, and assist with document creation.

Imagine attending several meetings in a week. Instead of manually reviewing pages of notes, you can use Notion AI to create summaries and extract key action items.

The tool also helps teams maintain centralised documentation. Employees can quickly search for information and receive concise answers rather than reading lengthy documents.

This capability improves collaboration and reduces the time spent looking for information across different systems.

For teams that prefer self-hosted solutions, AppFlowy AI offers an open-source workspace with AI capabilities similar to Notion. Outline is another open-source knowledge base that can be paired with local LLMs, for example models served through Ollama, to provide AI-powered search, summaries, and document assistance.

Glean – Best Enterprise AI Platform

Glean is not just another AI tool. It's the unified AI layer for enterprise knowledge, a platform that sits across your entire business and makes company knowledge instantly accessible, contextual, and secure. No other tool on this list operates at this level of breadth and depth for large organisations.

Rather than being a standalone chatbot or a note-taking app with AI, Glean functions as a work AI layer across your business. It connects to over 100 enterprise applications, including Google Workspace, Microsoft 365, Salesforce, Jira, Confluence, Slack, ServiceNow, and more, building a deep, permission-aware understanding of your company's content, structure, and people.

The key differentiator is how Glean handles permissions. It doesn't flatten your organisation's access controls. It reinforces them. A salesperson only sees what they're authorised to see. An engineer gets code-repo context. A support rep surfaces relevant customer history automatically. The AI inherits and respects the same permissions already configured in your source systems.
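To make the idea of permission-aware retrieval concrete, here is a minimal Python sketch. It is not Glean's API or implementation: the `Document` class, the group names, and the `search` function are all hypothetical. It only illustrates the principle of filtering by source-system access before anything is ranked or shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical indexed document carrying the ACL copied from its source system."""
    title: str
    text: str
    allowed_groups: frozenset[str]


def search(query: str, docs: list[Document], user_groups: set[str]) -> list[str]:
    """Return titles matching the query that the user is allowed to see."""
    terms = query.lower().split()
    # Filter on permissions first, so restricted content never reaches ranking.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    # Toy relevance score: count how many query terms occur in the text.
    scored = [(sum(t in d.text.lower() for t in terms), d.title) for d in visible]
    return [title for score, title in sorted(scored, key=lambda pair: -pair[0]) if score > 0]


docs = [
    Document("Q3 pipeline review", "sales pipeline forecast", frozenset({"sales"})),
    Document("Deploy runbook", "deploy pipeline rollback steps", frozenset({"engineering"})),
]

print(search("pipeline", docs, {"sales"}))        # ['Q3 pipeline review']
print(search("pipeline", docs, {"engineering"}))  # ['Deploy runbook']
```

The same query returns different results for different users because the access check runs before relevance scoring, not after.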

Organisations looking for an open-source enterprise search platform can explore Open WebUI connected to local LLMs and document stores, or AnythingLLM, which allows teams to build private AI knowledge assistants over internal documents while retaining full control of their data.

QuillBot for Faster Slide Creation

Creating presentations can be a very time-consuming task. Whether you're preparing a client proposal, a sales pitch, a business review, or a training session, building slides from scratch can take hours.

This is where QuillBot's Presentation Maker can help.

The process is designed to be simple. You begin by entering a presentation idea or describing your requirements in a sentence or two. The AI then generates presentation content and creates slides automatically. After the initial draft is ready, you can edit and customise slide text, titles, and slide order to match your specific needs.

Instead of spending valuable time creating every slide manually, you can start with a complete presentation draft and focus on refining the content.

This can be useful when preparing client-ready decks, business proposals, meeting summaries, or executive presentations. Marketing teams can create client pitches, campaign strategies, and proposal documents more efficiently. Educators can quickly build lesson slides and visual teaching materials without spending hours on presentation design.

As workplace communication increasingly relies on presentations, these presentation tools can help you create polished materials in a fraction of the time.

There are few mature open-source AI presentation generators, but LibreOffice Impress can be combined with local AI assistants like Open WebUI or AnythingLLM to generate slide content before exporting it into presentations. This provides a privacy-focused alternative without relying on proprietary AI services.

Transkriptor for Meeting Documentation

Meetings play a critical role in collaboration, but capturing every detail can take attention away from meaningful discussions. Manual note-taking often makes it difficult for participants to stay fully engaged during conversations.

Transkriptor uses AI-powered transcription to automatically convert meeting conversations into accurate, searchable text. Teams can stay focused on the discussion while the platform records and organises meeting content in real time.

Once the meeting is complete, you can review transcripts, extract key insights, identify action points, and share summaries with your colleagues for better alignment.

This functionality is particularly valuable for remote and hybrid teams that rely on frequent virtual meetings and need reliable documentation of decisions and discussions. Instead of spending time creating manual notes, your team can maintain structured records with minimal effort.

Users who want full control over meeting recordings can use Whisper or WhisperX to generate accurate transcripts on their own hardware. These projects support multiple languages and can be integrated into custom workflows without sending audio to third-party services.
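As a rough illustration of the self-hosted route, the sketch below uses the open-source `openai-whisper` Python package, which must be installed separately along with ffmpeg. The helper names, the `base` model size, and the sample segments are assumptions made for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_transcript(segments: list[dict]) -> str:
    """Turn Whisper-style segments into one timestamped line per segment."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}" for seg in segments
    )


def transcribe_meeting(audio_path: str, model_size: str = "base") -> str:
    """Transcribe a local audio file; the audio never leaves the machine."""
    import whisper  # pip install openai-whisper (also needs ffmpeg on PATH)

    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return format_transcript(result["segments"])


# Hand-written segments in the shape Whisper returns, so this runs without audio:
sample = [
    {"start": 0.0, "end": 4.2, "text": " Let's review the launch plan."},
    {"start": 64.5, "end": 70.0, "text": " Action item: update the budget."},
]
print(format_transcript(sample))
```

With the package installed, calling `transcribe_meeting` on a local recording would produce the same timestamped format from real audio.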

By automating meeting documentation, you can dedicate more time to collaboration and less time to administrative tasks.

Cardscanner for Text Digitisation and Extraction

Not every task is about formulas, graphs, or code. Researchers and engineers also spend a lot of time working with handwritten notes, scanned pages, diagrams, embedded text, and printed reference material.

Cardscanner helps make that material easier to use by turning static image text into editable digital content. With its image to text converter, you can quickly extract text from screenshots, lecture notes, research pages, or printed handouts without retyping everything manually.

This is especially useful when organising research material, converting notes into digital files, or collecting information from charts and documents for reports. Instead of spending extra time copying content by hand, you can focus more on understanding and applying the information.

Tesseract OCR remains one of the most widely used open-source OCR engines for extracting text from images and scanned documents. For higher accuracy on complex layouts, PaddleOCR provides modern deep-learning-based text recognition with support for multiple languages.
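For a sense of what this looks like in practice, here is a minimal sketch using `pytesseract`, a Python wrapper around Tesseract, together with Pillow. It assumes the Tesseract binary is installed. The helper names and the clean-up rules are made up for illustration.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Clean common OCR residue: rejoin hyphenated line breaks, trim blank runs."""
    text = re.sub(r"-\n(?=\w)", "", raw)      # "digi-\ntisation" -> "digitisation"
    text = re.sub(r"[ \t]+\n", "\n", text)    # strip trailing spaces before newlines
    text = re.sub(r"\n{3,}", "\n\n", text)    # collapse runs of blank lines
    return text.strip()


def image_to_text(image_path: str, lang: str = "eng") -> str:
    """Extract text from an image file with Tesseract and tidy the result."""
    from PIL import Image   # pip install pillow
    import pytesseract      # pip install pytesseract (needs the tesseract binary)

    raw = pytesseract.image_to_string(Image.open(image_path), lang=lang)
    return tidy_ocr_text(raw)


print(tidy_ocr_text("Text digi-\ntisation  \n\n\n\nsaves time.\n"))
```

The clean-up step is kept separate from the OCR call so it can be tested without any images.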

For researchers, this kind of tool saves time and reduces unnecessary complexity in day-to-day academic work, especially when they have to manage large amounts of reading and reference material.

Conclusion

AI productivity tools are transforming the way we work by reducing repetitive tasks, improving access to information, and accelerating content creation.

Whether you need an AI-powered knowledge base like Notion AI, enterprise search through Glean, automated presentation creation with QuillBot, meeting transcription using Transkriptor, or text extraction with Cardscanner, each tool addresses a different part of the modern workflow.

At the same time, open-source alternatives continue to mature. Solutions such as AppFlowy AI, AnythingLLM, Open WebUI, Whisper, Tesseract OCR, and PaddleOCR give you and your team the flexibility to self-host your AI stack, maintain greater control over your data, and reduce reliance on proprietary platforms.

The best productivity strategy is rarely built around a single application. Instead, you should combine the tools that best fit your workflow, balancing ease of use, privacy, cost, and integration requirements.

As AI capabilities continue to evolve, those who embrace the right mix of commercial and open-source solutions will be better positioned to work more efficiently, collaborate more effectively, and focus on the high-value work that matters most.

Hope you enjoyed this article. You can connect with me on LinkedIn.

The Codex Handbook: A Practical Guide to OpenAI's Coding Platform

Tatev Aslanyan — Fri, 08 May 2026 23:02:00 +0000

This handbook is written for developers, team leads, and admins who want to understand what Codex is, how to set it up, how to use it well, how it differs from general-purpose models, and how pricing works today.

It's based on current OpenAI Codex documentation and Help Center articles. Pricing and plan availability change frequently, so treat the pricing section as a snapshot of the current docs and verify against the official links before making procurement decisions.

What's new (April 2026): OpenAI released GPT-5.5 and GPT-5.5 Pro on April 23–24, 2026. GPT-5.5 is now the flagship general model and is rolling into Codex surfaces. See the new "GPT-5.5: The Newest Release" subsection in Section 2, the full benchmark deep dive in Section 11, and the updated pricing snapshot in Section 7.

Authors: Tatev Aslanyan, Vahe Aslanyan, Jim Amuto | Version: 1.3 — Last updated April 30, 2026

Executive Summary

Codex is OpenAI's coding agent — not a single model, but a product and workflow layer that wraps OpenAI's frontier models with file access, shell execution, sandboxes, approval flows, and code review.

It runs in four surfaces: the CLI, IDE extensions (VS Code, Cursor, Windsurf), the macOS/Windows app, and Codex Cloud for background tasks against GitHub repositories.

The product is included with most paid ChatGPT plans (Plus, Pro, Business, Enterprise/Edu) and, for now, Free and Go with stricter rate limits.

The model layer beneath Codex shifted in April 2026. GPT-5.5 is the new general flagship, with substantial gains on agentic and long-context benchmarks: MRCR v2 at 1M tokens jumped from 36.6% on GPT-5.4 to 74.0% on GPT-5.5, Terminal-Bench 2.0 reaches 82.7%, and the hallucination rate dropped roughly 60% versus prior generations. It's also roughly 2× the per-token cost of GPT-5.4, so picking the right model per task now matters more for budget than it did a quarter ago.

For teams adopting Codex, the highest-leverage choices are:

Start in the CLI or IDE on small, bounded tasks before enabling cloud.
Use Codex as a pre-merge reviewer in addition to a code generator.
Keep admin and user access separated through workspace RBAC.
Treat token consumption, not prompt count, as the cost driver.

The 30-60-90 day adoption plan in the appendix gives a phased rollout that surfaces friction early.

This handbook covers what Codex is, how to set it up, how to use it well, and how it compares to Claude Code, GitHub Copilot, and self-hosted alternatives. We'll also discuss what it costs, how to govern it in an enterprise, and where it does and does not fit. You'll find a glossary and security checklist in the appendices, and a worked cost example alongside Section 7.

Here's What We'll Cover:

Executive Summary
Prerequisites
Section 1: What Codex Is
Section 2: Where Codex Fits in the OpenAI Ecosystem
Section 3: The Core Surfaces
Section 4: Getting Started: Install, Set Up, and Your First Task
Section 5: How to Use Codex Effectively
Section 6: Difference Between Codex and Other Coding Tools
Comparison Matrix
Section 7: Pricing and Plan Access
Worked Cost Example
Section 8: Security, Permissions, and Enterprise Setup
Section 9: Best Practices for Teams
Section 10: Common Workflows and Examples
Section 11: Model Specs and Benchmarks (GPT-5.5 Deep Dive)
Section 12: Troubleshooting
Section 13: FAQ
Section 14: When NOT to Use Codex
Section 15: Final Recommendations
Section 16: Source References
Appendix A: 30-60-90 Day Adoption Plan
Appendix B: Glossary
Appendix C: Admin Security Checklist
Appendix D: Changelog
Appendix E: Working with Codex in VS Code

Prerequisites

This handbook is hands-on. To get the most out of it — especially Section 4, Section 5, and Section 10 where you'll install Codex and run real tasks — you should have the following in place.

Background Knowledge You Should Already Have

You don't need to be a senior engineer, but the walkthroughs assume:

Comfort using the command line. You can cd into a directory, list files, run git commands, and read shell error messages. If you have never opened a terminal, work through a one-hour shell tutorial first.
Basic Git literacy. You understand commits, branches, pull requests, and the difference between staged and unstaged changes. The Codex workflow centers on producing reviewable diffs, so this is non-negotiable.
Experience reading code in at least one mainstream language. Codex can work in any language, but the demo repo in Section 4 is a small Python service. If you can read Python, JavaScript, Go, or similar, you'll be fine.
A mental model of "what an API call costs." Section 7's worked cost example assumes you understand that LLM usage is metered by tokens. If "tokens" is a brand-new concept, skim the OpenAI tokenizer page once before reading Section 7.
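If token metering is new to you, the following sketch shows the basic arithmetic. The per-million-token prices are placeholders, not OpenAI's actual rates, and `estimate_cost` is a hypothetical helper. Substitute the numbers from the official pricing page.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars for one request, given prices per one million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# PLACEHOLDER prices (dollars per 1M tokens), chosen only to make the math easy.
INPUT_PRICE = 2.00
OUTPUT_PRICE = 8.00

# One request that reads a large repo context costs far more than one that
# reads a single file: tokens, not the number of prompts, drive the bill.
small_task = estimate_cost(5_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
large_task = estimate_cost(200_000, 20_000, INPUT_PRICE, OUTPUT_PRICE)

print(f"small task: ${small_task:.3f}")   # small task: $0.018
print(f"large task: ${large_task:.3f}")   # large task: $0.560
```

Both tasks are a single prompt, yet under these placeholder prices the second costs about 30 times more.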

If you're an engineering manager, procurement lead, or admin and you only need Section 7, Section 8, and Section 14, you can skip the technical prerequisites and jump straight to those sections.

Tools and Accounts You Need to Install

Before starting Section 4, have the following ready. Approximate setup time: 15–25 minutes if you're starting from scratch.

Tool / Account	Why you need it	Where to get it
A ChatGPT account on Plus, Pro, Business, or Enterprise/Edu	Codex is included with these plans. Free and Go work for now but with stricter rate limits	chatgpt.com
Node.js 18+ and npm	The Codex CLI is installed via npm (`npm i -g @openai/codex`)	nodejs.org
Git 2.30+	Required to clone the demo repo and produce diffs Codex can review	git-scm.com
A code editor	VS Code is the recommended baseline. Cursor and Windsurf also work	code.visualstudio.com
A GitHub account	Required only for Codex Cloud tasks (Section 8 and Appendix E)	github.com
WSL2 (Windows users only)	The Codex CLI is experimental on native Windows; WSL is the supported path	Microsoft WSL docs

Verify Your Environment

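Run these three commands before you start Section 4. If any of them fails, fix it first. The demo repo in Section 4 additionally needs Python 3.10+ and pytest, which you can check with python --version (or python3 --version).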

node --version   # should print v18.x or higher
npm --version    # should print 9.x or higher
git --version    # should print 2.30 or higher
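The same check can be scripted. This is a small sketch using only the Python standard library. The minimum versions mirror the commands above, and `parse_version` and `check_tool` are hypothetical helper names.

```python
import re
import shutil
import subprocess


def parse_version(output: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'git version 2.43.0' -> (2, 43, 0)."""
    match = re.search(r"(\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_tool(name: str, minimum: tuple[int, ...]) -> bool:
    """Return True if the tool is on PATH and reports at least the minimum version."""
    path = shutil.which(name)
    if path is None:
        print(f"{name}: not found on PATH")
        return False
    try:
        output = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
        found = parse_version(output)
    except (OSError, ValueError):
        print(f"{name}: could not determine version")
        return False
    ok = found >= minimum
    print(f"{name}: {'.'.join(map(str, found))} ({'ok' if ok else 'too old'})")
    return ok


if __name__ == "__main__":
    for tool, minimum in [("node", (18,)), ("npm", (9,)), ("git", (2, 30))]:
        check_tool(tool, minimum)
```

Versions are compared as integer tuples, so 2.9 correctly sorts below 2.30, which a plain string comparison would get wrong.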

What This Handbook Will Not Teach You

To set expectations honestly, this handbook does not cover:

How to write production-grade Python, JavaScript, or any specific language. We use small examples to demonstrate Codex behavior, not teach syntax.
How to design a system architecture from scratch. Section 14 explains why Codex is a poor fit for novel architecture decisions.
How to administer GitHub at the organization level. Section 8 covers the Codex-specific GitHub Connector setup, but assumes your GitHub org already exists.
LLM internals (attention, RLHF, and so on). We treat the model as a black box with measurable behavior.

Section 1: What Codex Is

Codex is OpenAI's coding agent. The most important thing to understand is that Codex is not just a single model name. It's a product and workflow layer designed to help people write, review, debug, and ship code faster. In OpenAI's own wording, it's an AI coding agent that can work with you locally or complete tasks in the cloud.

That distinction matters. Most people think of AI in one of two ways:

A chat model that answers questions.
A coding assistant that suggests snippets.

Codex is broader than both. It can inspect a repository, edit files, run commands, and execute tests. It can also handle larger chunks of work by taking a prompt or spec and turning it into a task plan, code changes, and reviewable output.

For teams, the cloud-based workflow is especially important because it lets Codex run in the background while engineers stay in flow.

OpenAI's current docs also place Codex alongside a wider set of developer tools: the API, the Responses API, the Agents SDK, MCP tools, and the Codex app. If you are onboarding a team, the easiest mental model is this:

The models are the engine.
Codex is the coding product that uses those engines.
The CLI, IDE extension, web app, and cloud tasks are the ways you interact with it.

Section 2: Where Codex Fits in the OpenAI Ecosystem

OpenAI now offers a layered stack:

General-purpose frontier models such as GPT-5.5, GPT-5.5 Pro, GPT-5.4, GPT-5.4-mini, and GPT-5.4-nano.
Codex-specific models such as GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex, and codex-mini-latest.
Product surfaces that package those models into workflows, such as Codex CLI, the Codex app, IDE extensions, cloud tasks, and code review.

The practical difference is simple:

If you need one-off reasoning, synthesis, or general chat, you may use a general model.
If you need an agent that should navigate a repository, change files, run tests, and push toward a concrete code outcome, Codex is the purpose-built surface.

OpenAI's current model docs describe GPT-5.4 as the flagship model for complex reasoning and coding. At the same time, Codex-specific model pages describe GPT-5.3-Codex and GPT-5.2-Codex as optimized for agentic coding tasks in Codex or similar environments. That tells you how OpenAI is positioning the stack:

GPT-5.4 was the general flagship when those docs were written; GPT-5.5 now holds that role (see the next subsection).
Codex-specific models are tuned for coding workflows.
Codex the product can switch models depending on the surface and configuration.

If you remember nothing else from this section, remember this: Codex is the workflow. Models are the engine.

GPT-5.5: The Newest Release

OpenAI launched GPT-5.5 on April 23, 2026, with API availability following on April 24, 2026. A higher-tier GPT-5.5 Pro variant shipped alongside it. OpenAI describes GPT-5.5 as their "smartest and most intuitive to use model yet, and the next step toward a new way of getting work done on a computer."

For a Codex user, the practical upshot is short:

GPT-5.5 is the new general flagship. Anywhere older docs say "GPT-5.4 is the flagship," read GPT-5.5 going forward. GPT-5.4 remains available as a cheaper default.
Codex surfaces will switch over. Expect GPT-5.5 to become selectable (and often the default) inside the CLI, IDE, app, and cloud tasks shortly after launch. Verify the active model in your settings.
Pricing has shifted. GPT-5.5 sits well above GPT-5.4 on a per-token basis. See Section 7 before approving budgets.

The full benchmark breakdown, performance highlights, and per-workload guidance for picking GPT-5.5 vs GPT-5.4 vs Codex-specific models are in Section 11: Model Specs and Benchmarks. Read that section once you have the foundational chapters under your belt.

Section 3: The Core Surfaces

Codex currently shows up in a few places, and each one is optimized for a slightly different working style.

Codex CLI

The CLI is the fastest way to put Codex directly into a terminal session. The docs describe it as OpenAI's coding agent that runs locally from your terminal, can read, change, and run code on your machine, and is open source and written in Rust.

Use the CLI when you want:

A terminal-first workflow.
Fast iteration inside an existing repo.
Fine-grained control over approvals and execution.
A lightweight path for local coding tasks.

IDE Extension

The CLI docs and Help Center articles point to the IDE extension for VS Code, Cursor, Windsurf, and other VS Code forks. This is the natural fit when your team lives in an editor and wants Codex embedded in the normal coding flow.

Use the IDE extension when you want:

Codex close to the files you are already editing.
Prompting and editing without switching contexts.
A bridge between human-driven and agent-driven editing.

Codex App

OpenAI's Help Center says the Codex app is available on macOS and Windows. It is designed for parallel work across projects, with built-in worktree support, skills, automations, and git functionality.

Use the app when you want:

Multiple Codex agents running in parallel.
Cloud tasks without bouncing between terminal and editor.
A project-centric place to assign and monitor tasks.

Codex Cloud

Codex cloud is the background execution mode. It runs each task in an isolated sandbox with the repository and environment, and it is intended for reviewable code output rather than direct interactive sessions.

Use Codex cloud when you want:

Tasks to run while you do something else.
Sandboxed execution with reviewable diffs.
Automated code review or repository-level workflows.

Code Review

Codex can also review code inside GitHub. OpenAI describes this as a way to automatically review your personal pull requests or configure reviews at the team level.

Use code review when you want:

A second set of eyes on pull requests.
Automated regression or issue spotting before human review.
Lightweight review coverage across a team.

Section 4: Getting Started: Install, Set Up, and Your First Task

This section walks you end-to-end from "nothing installed" to "Codex just fixed a real bug for me."

We will use a tiny demo repository you build yourself in two minutes — a small Python price-calculator with one obvious bug and one missing test. That gives you a real, reproducible target you can throw away when you're done.

The same walkthrough works for the CLI, the IDE extension, and the app, with notes for each.

If you have existing code you would rather use, skip ahead to Step 4 and point Codex at your own repo. The demo is for readers who want a known-good starting point.

Step 0: Confirm Access

Codex is included with ChatGPT Plus, Pro, Business, and Enterprise/Edu plans. For a limited time, it is also included with Free and Go, with stricter rate limits.

If you are in a team or enterprise workspace, access may also depend on workspace settings and role-based controls. Do not assume that a ChatGPT subscription alone guarantees access in a managed environment — confirm with your admin or look in Codex Cloud settings at chatgpt.com/codex.

Step 1: Install Codex

You have three install paths. Pick one to start; you can add the others later.

Option A: The CLI (recommended for first task)

The CLI is the most direct way to see how Codex behaves. The official docs note that macOS and Linux are first-class, while Windows is experimental and you should use WSL2.

npm i -g @openai/codex
codex --version

If codex --version prints a version number, you are done.

Option B: The VS Code Extension

In VS Code (or Cursor / Windsurf), open the Extensions panel, search for "Codex" by openai, and install it. Or from a terminal:

code --install-extension openai.chatgpt

The Codex panel will appear in the right sidebar after install.

Option C: The Codex App

Download the Codex app for macOS or Windows from chatgpt.com/codex. The app shines when you want parallel tasks, built-in git worktrees, and a project-centric UI. For your very first task it is overkill — start with the CLI or extension.

VS Code users: For a step-by-step guide covering all three VS Code entry points (extension, CLI in the integrated terminal, and browser Codex), see Appendix E: Working with Codex in VS Code.

Step 2: Authenticate

Run codex in a terminal (or open the extension panel). You will be prompted to:

Sign in with ChatGPT — recommended. Usage is charged against your plan's included Codex credits.
Sign in with an API key — used when you want metered API billing or your workspace policy requires it.

If you are unsure, pick ChatGPT sign-in.

Step 3: Build the Demo Repo

This is the part most quick-starts skip. Instead of pointing Codex at "any repo," let's create a small, self-contained demo repo with a known bug so you can verify Codex actually fixes it.

In a terminal, run:

mkdir codex-demo && cd codex-demo
git init

Now create three files. First, pricing.py, a small pricing calculator with one arithmetic bug (a wrong divisor) and one missing edge case:

# pricing.py
def apply_discount(price: float, discount_percent: float) -> float:
    """Apply a percentage discount to a price.

    BUG: The discount is applied as a multiplier of (discount_percent / 10)
    instead of (discount_percent / 100). A 20% discount currently turns a
    price of 100 into -100 instead of 80.
    """
    if discount_percent < 0:
        raise ValueError("discount_percent must be >= 0")
    return price * (1 - discount_percent / 10)


def cart_total(items: list[dict], discount_percent: float = 0) -> float:
    """Compute the total for a list of cart items after a discount."""
    subtotal = sum(item["price"] * item["quantity"] for item in items)
    return apply_discount(subtotal, discount_percent)

Then test_pricing.py, with one passing test plus two that will fail because of the bug:

# test_pricing.py
from pricing import apply_discount, cart_total


def test_no_discount_returns_original_price():
    assert apply_discount(100.0, 0) == 100.0


def test_twenty_percent_discount_on_100_is_80():
    # This will FAIL until the bug in apply_discount is fixed.
    assert apply_discount(100.0, 20) == 80.0


def test_cart_total_with_discount():
    items = [
        {"price": 10.0, "quantity": 2},
        {"price": 5.0, "quantity": 1},
    ]
    # Subtotal is 25.0. With 10% off, expected total is 22.5.
    assert cart_total(items, discount_percent=10) == 22.5

And a tiny README.md:

# codex-demo

A tiny pricing module used to learn the Codex workflow.

Run tests with: `python -m pytest`

Commit the starting state so Codex's diffs are easy to review:

git add .
git commit -m "Initial demo: pricing module with a known bug"

Confirm the bug is real before you ask Codex to fix it:

python -m pytest

You should see two failing tests (test_twenty_percent_discount_on_100_is_80 and test_cart_total_with_discount).
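To see why both tests fail, it helps to work through the arithmetic by hand. The snippet below evaluates the buggy expression from pricing.py and the intended one for the two inputs the tests use. The function names `buggy` and `intended` are made up for this comparison.

```python
def buggy(price: float, discount_percent: float) -> float:
    # Same expression as the current apply_discount: divides by 10.
    return price * (1 - discount_percent / 10)


def intended(price: float, discount_percent: float) -> float:
    # What the docstring describes: divides by 100.
    return price * (1 - discount_percent / 100)


# test_twenty_percent_discount_on_100_is_80
print(buggy(100.0, 20), intended(100.0, 20))   # prints -100.0 80.0

# test_cart_total_with_discount (subtotal 25.0, 10% off)
print(buggy(25.0, 10), intended(25.0, 10))     # prints 0.0 22.5
```

With the wrong divisor, a 10% discount wipes out the whole price and anything above 10% drives it negative, which is why the zero-discount test is the only one that passes.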

If pytest is not installed: pip install pytest. The full demo needs only Python 3.10+ and pytest.

Step 4: Launch Codex and Run Your First Task

Now point Codex at the demo repo.

From the CLI:

cd codex-demo
codex

When Codex starts, give it a clear, bounded task. Type this prompt exactly:

The test suite has two failing tests. Read pricing.py and test_pricing.py,
identify the root cause, fix the smallest possible thing, then run the tests
to confirm they pass. Explain what you changed and why.

Codex will:

Inspect pricing.py and test_pricing.py.
Recognize the wrong divisor (/ 10 should be / 100).
Propose a one-line diff.
Ask for approval before modifying the file (in the default approval mode).
After you approve, run python -m pytest and report that all three tests now pass.

From the VS Code extension: Open the codex-demo folder in VS Code, open the Codex panel in the right sidebar, and paste the same prompt. The diff will appear inline in the editor for you to review and accept.

Step 5: Review the Diff

This is the most important habit to build early. Even though the fix is one character (10 → 100), look at the diff before accepting:

git diff

Read the change. Confirm it matches what Codex described. Run the tests yourself:

python -m pytest

All three should pass. Commit the fix:

git commit -am "Fix discount divisor in apply_discount"

You have just completed the full Codex loop: context → task → change → review → verify. Every bigger task is a longer version of this loop.

Step 6: Try Two More Bounded Tasks

Now that the loop works, try these against the same demo repo:

Add an edge case test. Prompt: "Add a test that verifies apply_discount raises a ValueError when discount_percent is negative. Run the tests after."
Add a missing safety check. Prompt: "apply_discount does not currently reject discount_percent values greater than 100, which would produce a negative price. Add validation, update the existing tests if needed, and add a new test for the new behavior."
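For reference, one plausible end state after both tasks might look like the sketch below. It illustrates the shape to expect, not what Codex will literally produce. The test names and the `raises_value_error` helper are assumptions, and the function is shown with the divisor bug from Step 4 already fixed.

```python
def apply_discount(price: float, discount_percent: float) -> float:
    """Apply a percentage discount to a price (0 to 100 inclusive)."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return price * (1 - discount_percent / 100)


def raises_value_error(discount_percent: float) -> bool:
    """Stand-in for pytest.raises so this sketch runs without pytest."""
    try:
        apply_discount(100.0, discount_percent)
    except ValueError:
        return True
    return False


def test_negative_discount_is_rejected():
    assert raises_value_error(-5)


def test_discount_over_100_is_rejected():
    assert raises_value_error(150)


def test_full_discount_is_allowed():
    # The boundary value 100 stays valid and yields a zero price.
    assert apply_discount(100.0, 100) == 0.0
```

In the demo repo itself, the more idiomatic form of the first two tests would use `with pytest.raises(ValueError):`. When you review the real diff, check that the boundary values 0 and 100 are still accepted.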

Each task is small, has a clear acceptance criterion (the tests pass), and produces a reviewable diff. That is the shape of every good Codex task.

Step 7 (Optional): Set Up Codex Cloud

Cloud tasks let Codex run in the background while you do other work. They require a GitHub-hosted repository.

To enable Codex Cloud against the demo repo:

Push codex-demo to a private GitHub repo: gh repo create codex-demo --private --source=. --push (requires the gh CLI).
Visit chatgpt.com/codex and connect the ChatGPT GitHub Connector.
Allow the codex-demo repository in the connector. Do not grant org-wide access by default — see Appendix C.
From the web interface, pick the repo and prompt: "Add type hints to every function in pricing.py and add a CI-style summary of what changed."
Wait for the sandbox to finish, review the diff in the browser, and either accept it or open a PR.

By default, Codex Cloud sandboxes have no internet access. That is deliberate — admins can allowlist dependency registries and trusted sites if a real workflow needs them.

When to Use Which Surface

After completing the demo, the surface trade-offs become concrete:

CLI — fastest for terminal-heavy local work, scriptable, best for multi-step agentic tasks with explicit approvals.
VS Code extension — lowest friction for in-flow editing while you are already in the editor.
Codex app — best when you want to run multiple parallel tasks across projects with worktree isolation.
Codex Cloud — best for background work, long-running tasks, and PR-style review you can leave running.

Most experienced users have all of them installed and pick per task. A single workflow rarely fits every kind of work.

What If Something Doesn't Work?

If you get stuck during this walkthrough:

codex command not found → npm's global bin is not on your PATH. Restart your terminal, or use a Node version manager like nvm.
Sign-in keeps failing → confirm the email matches your ChatGPT plan; in enterprise workspaces, your admin must enable Codex.
Codex won't modify the file → you may be in a strict approval mode. Approve when prompted, or relax the mode after your first successful task.
Windows misbehavior → switch to a WSL2 terminal. Native Windows for the CLI is experimental.
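For the first item, a quick way to confirm a PATH problem is to compare npm's global prefix with the directories on PATH. The sketch below uses only the standard library plus the `npm prefix -g` command. The `bin` subdirectory layout is assumed for macOS, Linux, and WSL, and the helper names are hypothetical.

```python
import os
import shutil
import subprocess


def dir_on_path(directory: str, path_value: str) -> bool:
    """Return True if directory is one of the entries in a PATH-style string."""
    target = os.path.normpath(directory)
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


def diagnose_codex() -> None:
    """Print where codex resolves, or whether npm's global bin is missing from PATH."""
    found = shutil.which("codex")
    if found:
        print(f"codex resolves to {found}")
        return
    npm = shutil.which("npm")
    if npm is None:
        print("npm itself is not on PATH; install Node.js first")
        return
    result = subprocess.run([npm, "prefix", "-g"], capture_output=True, text=True)
    global_bin = os.path.join(result.stdout.strip(), "bin")  # assumed Unix-style layout
    if dir_on_path(global_bin, os.environ.get("PATH", "")):
        print(f"{global_bin} is on PATH; try reinstalling @openai/codex")
    else:
        print(f"{global_bin} is NOT on PATH; add it to your shell profile")


if __name__ == "__main__":
    diagnose_codex()
```

If the script reports that the global bin directory is missing from PATH, adding it to your shell profile and opening a new terminal is usually enough.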

The full troubleshooting guide is in Section 12.

Section 5: How to Use Codex Effectively

Codex works best when you treat it like a developer you're onboarding rather than a magic prompt responder. The more concrete your task, the better the result.

Each tip below has a bad example (what people actually type) and a good example (what produces a useful result). Most use the codex-demo repo from Section 4 so you can run them yourself.

Give It a Real Objective

A "real objective" means a concrete goal with a verifiable outcome — not a feeling.

Bad:

Improve this codebase.

Codex will pick something to do, but you have no way to know if the result is what you wanted, and the diff will probably touch more than you can review.

Good:

Refactor cart_total in pricing.py so the iteration logic and the discount
application are in two separate helper functions. Keep the public signature
of cart_total unchanged. Add tests for each helper. Run pytest at the end.

This works because there is exactly one acceptance criterion (tests pass with the new structure) and exactly one boundary (public signature unchanged). You can review the diff in 30 seconds.

Other shapes that work:

"Fix the failing test in test_pricing.py::test_twenty_percent_discount_on_100_is_80."
"Add a currency: str = 'USD' parameter to cart_total and update the tests."
"Review the changes in my last commit for missing edge cases."

Provide the Right Context

Codex can inspect the repo, but you still need to steer it to the right files and constraints. Without that, it wanders.

Bad:

Add validation to the pricing module.

What kind of validation? On which inputs? What error class? Codex has to guess all of that.

Good:

Context:
- File: pricing.py
- Function: apply_discount
- Current behavior: raises ValueError for negative discount_percent.
- Desired behavior: also raise ValueError when discount_percent > 100,
  with the message "discount_percent must be between 0 and 100".

Task:
- Add the validation.
- Add a matching test in test_pricing.py.
- Do not change apply_discount's public signature.
- Run pytest after.

Notice the structure: what file, current behavior, desired behavior, task, constraints, how to verify. That is the difference between a hopeful prompt and a usable spec.

For larger tasks, also include:

A link to the issue or spec (Codex can fetch it if web access is enabled).
The names of related files, even if Codex could find them itself; naming them usually shortens the time to first edit.
The name of any test command, build command, or lint that should pass.

Ask for Intermediate Thinking When Needed

"Intermediate thinking" means asking Codex to plan in writing before it edits files. The default is for Codex to dive straight to code. For anything larger than a single function, that is the wrong default.

Without intermediate thinking (the alternative):

Refactor pricing.py to support multiple currencies.

Codex starts editing immediately. You discover after the fact that it changed the database schema, the API contract, and three test files — and you have no idea whether the design choice it made was the right one.

With intermediate thinking:

I want to add multi-currency support to pricing.py.

Before editing anything:
1. List the files you expect to touch and why.
2. Outline the approach in 5-10 bullets.
3. Call out any assumptions you are making and any open questions.
4. Identify the riskiest part of the change.

Wait for my approval before making any edits.

Now you get a plan you can review, push back on, or scrap entirely — at zero cost to the codebase. After you approve, Codex executes against the plan it just wrote, which makes the resulting diff predictable.

Use intermediate thinking whenever the task is:

Multi-file or cross-cutting.
Architecturally novel for this codebase.
Hard to test (so the diff is your only signal).
High blast-radius if wrong (auth, payments, data migrations).

Prefer Bounded Changes

A bounded change is one with all four of these properties:

Small surface area — touches one file, one module, or one logical concept.
Clear acceptance criterion — there's a specific test, output, or behavior that proves it worked.
Reviewable in a few minutes — a human can read the diff and form an opinion without setting aside an hour.
Easily revertible — if it goes wrong, git revert undoes it cleanly without breaking anything else.

The opposite is an unbounded change: "make the codebase faster," "modernize the API," "add types everywhere." These have no clear endpoint, no easy verification, and no clean revert path.

Bounded examples (good):

"Add a serialize() method to CartItem that returns a dict suitable for JSON encoding. Add a test."
"In apply_discount, replace the magic number 100 with a module-level constant MAX_DISCOUNT_PERCENT."
"The cart_total function takes a discount_percent keyword argument that defaults to 0. Make the default None and treat None as 'no discount.' Update the tests."

Unbounded examples (avoid):

"Make pricing.py production-ready."
"Add proper error handling everywhere."
"Improve the architecture."

When you catch yourself writing an unbounded prompt, break it into a list of bounded ones before sending. The decomposition itself is most of the work; once you have it, Codex is good at executing each piece.

Use Reviews as a Loop

Codex is not just for writing code — it is also a useful pre-merge reviewer. The loop is:

You (or Codex) write the change.
Ask Codex to review it.
Fix the issues it finds.
Re-run tests.

What this looks like in practice:

After completing a task in codex-demo, ask Codex to review your own commit:

Review the change in my last commit (git show HEAD) for:
- correctness issues (off-by-one, type mismatches, wrong defaults)
- missing tests, especially edge cases
- security concerns (input validation, injection, unsafe defaults)
- maintainability risks (unclear naming, hidden coupling)

Prioritize findings by severity (critical / important / nit). For each
finding, point to the exact line and propose a concrete fix. Do not
modify any files in this turn — just produce the review.

You will typically get back a structured response like:

CRITICAL: line 14 — apply_discount accepts NaN silently because the type
  check is `discount_percent < 0`, which is False for NaN. Fix: add an
  explicit math.isnan() check before the comparison.

IMPORTANT: test_pricing.py has no test for the boundary discount_percent=100.
  Fix: add a test asserting apply_discount(100, 100) == 0.

NIT: line 8 — the docstring mentions a "BUG" comment that should be removed
  now that the bug is fixed.

Then you triage: fix the critical and important findings (often by feeding them back to Codex with "apply the fixes you proposed"), defer or reject the nits, and re-run tests.
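The CRITICAL finding in that sample response rests on a real property of floating-point numbers: every ordering comparison against NaN is False, so a `< 0` guard never fires for it. A minimal sketch of the proposed fix follows, assuming a simplified `apply_discount` body and an illustrative error message that may not match the demo repo.

```python
import math


def apply_discount(price: float, discount_percent: float) -> float:
    # Explicit NaN check first: NaN < 0 is False, so the range check below
    # would let NaN through on its own.
    if math.isnan(discount_percent):
        raise ValueError("discount_percent must be a number")  # illustrative message
    if discount_percent < 0:
        raise ValueError("discount_percent must be >= 0")
    return price * (1 - discount_percent / 100)


nan = float("nan")
print(nan < 0)  # False: this is why the original guard misses NaN
```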

This converts Codex from a code generator into a quality gate, which is usually the higher-leverage use. A team that uses Codex only as a generator gets faster code; a team that also uses it as a reviewer gets better code.

Section 6: Difference Between Codex and Other Coding Tools

This is the section that usually matters most to new users, because the category boundaries are easy to blur.

Codex Is A Product Layer, Not Just A Model

Codex is the product experience and workflow layer. Models are the underlying engines. Put differently:

A general model answers questions or writes text.
A coding model is tuned more narrowly for software tasks.
Codex packages the model inside an agentic coding workflow with files, commands, approvals, sandboxes, and reviews.

That matters because users often compare Codex to "another model" when the real comparison is "another coding system."

Codex vs OpenAI General Models

OpenAI's current models page recommends GPT-5.5 as the flagship model for complex reasoning and coding, with GPT-5.4 still available below it. That is the general model-side recommendation.

Codex-specific pages, on the other hand, describe models like GPT-5.3-Codex and GPT-5.2-Codex as optimized for agentic coding tasks in Codex or similar environments.

The practical takeaway:

Use GPT-5.5 (or GPT-5.4 at a lower price) when you want a top-tier general model.
Use Codex-specific models when you want a model optimized for coding workflows inside Codex.
Use the Codex surface when you want file edits, shell commands, reviews, and sandboxes, not just text output.

Codex vs Claude Code

Claude Code is also a terminal-based agentic coding tool. Anthropic's docs describe it as a terminal tool that can make plans, edit files, run commands, create commits, and work with MCP-connected data sources. It is strong if your team already prefers a terminal-first workflow and wants a tightly scriptable developer tool.

Codex differs in a few practical ways:

Codex spans more surfaces, including CLI, IDE extension, app, cloud tasks, and code review.
Codex cloud is built around GitHub-connected task execution and review.
Codex is more explicitly positioned as a family of coding workflows, not just a single terminal agent.

The practical takeaway:

Choose Claude Code if you want a terminal-native workflow with strong composability and you are happy living mostly in the shell.
Choose Codex if you want a broader product layer with local, cloud, and app-based workflows that can be shared across a team.

Codex vs GitHub Copilot Coding Agent

GitHub Copilot coding agent is designed around GitHub's own workflow. GitHub docs describe it as an agent you can assign issues or pull requests to, and it works in the background to create or modify PRs. It lives very naturally inside GitHub-hosted development flows.

Codex is different in emphasis:

Copilot coding agent is highly GitHub-centric.
Codex is broader across terminal, IDE, app, and cloud.
Copilot is a strong fit if your team already uses GitHub as the center of gravity for task assignment and review.
Codex is a stronger fit if you want a more general coding agent surface that can work across local and cloud workflows.

The practical takeaway:

Choose Copilot coding agent if your process is already deeply anchored in GitHub issues and pull requests.
Choose Codex if you want a wider agent workflow that can run locally, in the IDE, or in Codex cloud.

Codex vs Open-Weight and Self-Hosted Models

Open-weight or self-hosted models serve a different need. Teams usually reach for them when they want:

Full infrastructure control.
Custom hosting or air-gapped deployment.
More direct control over retention and data boundaries.
A lower-cost path at high scale if they already own the hardware and ops stack.

The tradeoff is that self-hosted models usually do not give you the same out-of-the-box agentic product experience that Codex does. You have to assemble the orchestration, repo access, sandboxing, approvals, and review loop yourself.

That means the real choice is not "Which model is smartest?" It is "How much engineering do I want to spend on the workflow around the model?"

The practical takeaway:

Choose open-weight or self-hosted models when infrastructure control is the main requirement and you are willing to build the surrounding agent system.
Choose Codex when you want the workflow already packaged, especially for day-to-day engineering teams.

Codex vs General Chat Models

General chat models are best when the task is:

A question and answer exchange.
Conceptual reasoning.
Drafting prose.
Summarizing or rewriting text.

Codex is better when the task is:

Reading and modifying a repository.
Running tests.
Fixing code.
Reviewing pull requests.
Coordinating multi-step implementation work.

Codex vs API Usage of the Same Models

The same model family can behave differently depending on the surface.

In the API, you may call a model directly and design your own orchestration.
In Codex, the same or similar model may be wrapped in repo access, approval flows, and task execution.

That is why some model pages mention that a model is optimized for "Codex or similar environments." The model is tuned for agentic software work, but the workflow surface still matters.

Comparison Matrix

The prose comparisons above collapse into a single matrix for fast reference:

Dimension	Codex	Claude Code	GitHub Copilot Coding Agent	Self-hosted / Open-weight
Primary surface	CLI, IDE, app, cloud	CLI (terminal-first)	GitHub web/PR/issues	Whatever you build
Background execution	Yes (Codex Cloud sandboxes)	Limited; runs locally	Yes (GitHub Actions runners)	DIY
Repository integration	GitHub via connector; local repos directly	Local; MCP-connected sources	Native GitHub	DIY
Model choice	OpenAI models, switchable per surface	Anthropic Claude models	GitHub-managed (mix of vendors)	Any model you can host
Approval and sandbox controls	Yes, per-surface	Yes, per-tool	GitHub permission model	DIY
Parallel agents	Yes (app + cloud)	Limited	Yes (per-PR)	DIY
Best fit	Cross-surface team workflows	Terminal-native power users	Teams already living in GitHub	Air-gapped, custom infra, or cost-sensitive at scale
Main tradeoff	OpenAI ecosystem lock-in; price tier	Less product surface area	Heavily GitHub-coupled	Significant engineering effort

Use the matrix to pick the dominant tool, then layer the others where they fit. Many teams legitimately run two of these in parallel — for example, Codex for cross-surface work and Claude Code for power-user terminal workflows.

Which Tool Should A New User Choose?

As a rule of thumb:

For terminal-first coding and scripting, Claude Code is a strong alternative.
For GitHub-native issue and PR automation, GitHub Copilot coding agent fits naturally.
For local plus cloud plus app-based team workflows, Codex is the most flexible option.
For maximum infrastructure control, self-hosted or open-weight stacks make sense.

OpenAI's docs currently list GPT-5.5 as the general flagship, with GPT-5.4, GPT-5.4-mini, and GPT-5.4-nano remaining available below it, while Codex docs and model pages expose Codex-specific variants and model switching inside the CLI.

Section 7: Pricing and Plan Access

Pricing is the part of Codex most likely to change, so this section should be treated as a snapshot of the current official docs.

Plan Access

OpenAI's current Help Center says Codex is included with:

ChatGPT Plus
ChatGPT Pro
ChatGPT Business
ChatGPT Enterprise/Edu

For a limited time, it is also included with Free and Go, though those plans are temporary exceptions and subject to rate limits.

Flexible Pricing and Credits

The current rate card says Codex pricing changed on April 2, 2026 to align with API token usage instead of purely per-message pricing. The same article explains that:

New and existing Plus and Pro customers use the token-based rate card.
New and existing Business customers use the token-based rate card.
New Enterprise customers use the token-based rate card.
Existing Enterprise/Edu and several other legacy plan categories remain on the legacy rate card until migration.

This is important because two teams in the same company can be on different pricing logic depending on workspace status and plan vintage.

Current Model Pricing Snapshot

The current model pages list pricing per 1M tokens in USD. The exact numbers depend on the model you choose:

GPT-5.5: $5 input, $30 output. New flagship as of April 23, 2026.
GPT-5.5 Pro: $30 input, $180 output. Higher-tier variant for the most demanding agentic and reasoning workloads.
GPT-5.4: $2.50 input, $15 output.
GPT-5.4-mini: $0.75 input, $4.50 output.
GPT-5.4-nano: $0.20 input, $1.25 output.
GPT-5-Codex: $1.25 input, $10 output.
GPT-5.2-Codex: $1.75 input, $14 output.
GPT-5.1-Codex-mini: $0.25 input, $2 output.
codex-mini-latest: $1.50 input, $6 output.

These model pages also note context windows, output limits, and whether the model is intended for Codex-specific or general API use. For budget planning, remember that output tokens in the list above cost 4 to 8 times as much as input tokens, so a long response can cost more than the prompt that produced it, and task framing matters as much as model choice.

Note that GPT-5.5 is 2x the input price and 2x the output price of GPT-5.4, and GPT-5.5 Pro is 6x the price of GPT-5.5 on both input and output. OpenAI's framing is that GPT-5.5 is also more token-efficient than GPT-5.4, which can offset some of the headline price difference, but you should measure this on your own workloads before assuming it nets out. For the Codex-specific models, expect the lineup to shift as Codex variants based on GPT-5.5 ship; until then, the Codex-specific models above remain the right choice for purely coding-shaped tasks.

What This Means in Practice

The real cost depends on:

Input size.
Cached input.
Output length.
Whether the task uses fast mode.
Which model you select.

So if you are planning a team rollout, do not estimate usage from "number of prompts" alone. Estimate based on expected token consumption and task type.

Legacy Pricing

The legacy rate card still matters for users and workspaces that have not been migrated. The big lesson is that pricing is now tied more closely to model usage than to a simple fixed message count. Anyone budgeting Codex should read the current rate card before setting internal chargeback rules or usage policies.

Worked Cost Example

Pricing tables are easy to misread. A worked example makes the model selection question concrete.

Scenario: A 30-engineer team uses Codex Cloud for automated pull request review. Each engineer opens roughly 4 PRs per week. Each PR review pulls in approximately 30,000 input tokens (the diff plus relevant context files) and produces approximately 3,000 output tokens (the review comments and risk summary).

Weekly token volume:

Reviews per week: 30 engineers × 4 PRs = 120 reviews
Input tokens per week: 120 × 30,000 = 3.6M input tokens
Output tokens per week: 120 × 3,000 = 360K output tokens

Cost per week by model:

Model	Input cost	Output cost	Weekly total	Annualized (52 wk)
GPT-5.5 ($5 / $30)	3.6M × $5/1M = $18.00	0.36M × $30/1M = $10.80	$28.80	$1,498
GPT-5.5 Pro ($30 / $180)	$108.00	$64.80	$172.80	$8,986
GPT-5.4 ($2.50 / $15)	$9.00	$5.40	$14.40	$749
GPT-5-Codex ($1.25 / $10)	$4.50	$3.60	$8.10	$421
GPT-5.1-Codex-mini ($0.25 / $2)	$0.90	$0.72	$1.62	$84

Reading the table: The headline GPT-5.5 sticker shock disappears at this volume — under $1,500/year for 30 engineers' worth of automated review is a rounding error against engineering payroll. GPT-5.5 Pro is 6× more expensive and generally not justified for routine review; reserve it for the small share of reviews where you need its extra capability. The Codex-specific models are dramatically cheaper and are the right default if your reviews are mostly mechanical (style, obvious bugs, missing tests).
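To rerun the table with your own team size or token counts, the arithmetic can be sketched in a few lines of Python. The prices and scenario numbers are copied from this section; the function and constant names are illustrative, and the sketch ignores cached input, retries, and agentic overhead.

```python
# Prices are USD per 1M tokens (input, output), copied from the snapshot above.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.5 Pro": (30.00, 180.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5-Codex": (1.25, 10.00),
    "GPT-5.1-Codex-mini": (0.25, 2.00),
}

# Scenario from the worked example.
ENGINEERS = 30
PRS_PER_ENGINEER_PER_WEEK = 4
INPUT_TOKENS_PER_REVIEW = 30_000
OUTPUT_TOKENS_PER_REVIEW = 3_000


def weekly_cost(input_price: float, output_price: float) -> float:
    """Weekly USD cost for the scenario, ignoring caching and retries."""
    reviews = ENGINEERS * PRS_PER_ENGINEER_PER_WEEK
    input_tokens = reviews * INPUT_TOKENS_PER_REVIEW
    output_tokens = reviews * OUTPUT_TOKENS_PER_REVIEW
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for model, (input_price, output_price) in PRICES.items():
    weekly = weekly_cost(input_price, output_price)
    print(f"{model}: ${weekly:.2f}/week, ${weekly * 52:,.0f}/year")
```

Changing the four scenario constants is usually enough to see whether a different workload shifts the model choice.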

What this example does not capture:

Cached input. OpenAI prices repeated input tokens lower; if your review pulls the same context files repeatedly, real costs are lower than shown.
Long-task overhead. Agentic workflows that re-read files or iterate burn many more tokens than a single-shot review. A coding task can easily be 5–10× the tokens of a review.
Failure retries. A failed task that gets re-run costs roughly the same as the original. Agent flakiness is a real budget line item.
Mixed-model strategies. Most mature teams route cheap tasks (test stubs, doc updates) to a Codex-mini model and reserve GPT-5.5 for repository-wide refactors and PRs that need long-context reasoning.

The practical pattern: build the cost model around your actual highest-volume workload (usually PR review or test generation), then size the GPT-5.5 budget separately for the smaller set of tasks that actually benefit from the new capabilities.

Section 8: Security, Permissions, and Enterprise Setup

Teams care about Codex not just as a productivity tool, but as a controlled software-development system. OpenAI's docs reflect that reality.

Local vs Cloud Access

Enterprise admins can separately enable:

Codex Local
Codex Cloud
Both

Codex Local covers the app, CLI, and IDE extension. Codex Cloud covers hosted tasks, code review, and related integrations.

That separation is useful because some organizations want local tooling enabled broadly while keeping cloud tasks restricted to fewer users.

Workspace Controls

The admin docs say workspace owners can use RBAC to manage access. They can:

Set a default role.
Create custom roles.
Assign roles to groups.
Sync groups with SCIM.
Manage permissions centrally.

This is the right place to build a rollout with least privilege rather than giving every developer broad Codex access by default.

GitHub Connector and Repository Access

Codex Cloud requires GitHub-hosted repositories. Admins connect the ChatGPT GitHub Connector, choose an installation target, and allow specific repositories. Codex uses short-lived, least-privilege GitHub App tokens and respects repository permissions and branch protection rules.

For security teams, that matters because it keeps Codex aligned with the repo access model you already use.

Internet Access

By default, Codex cloud agents do not have internet access at runtime. That is deliberate. If your task truly needs access to dependency registries or trusted sites, admins can configure allowlists and HTTP method limits.

Recommended Governance Pattern

The enterprise docs recommend using separate groups for users and admins:

A smaller Codex Admin group for people who manage policy and governance.
A broader Codex Users group for developers who just need to use the tool.

That keeps policy management tight and avoids accidental over-permissioning.

Section 9: Best Practices for Teams

If you are onboarding a team, you will get much better outcomes if you set expectations up front.

Start With Simple, Valuable Tasks

Good first-team use cases:

Pull request review.
Small bug fixes.
Test generation.
Documentation updates.
Codebase navigation and understanding.

These are easy to compare against human work and easy to judge for quality.

Standardize Task Prompts

Give people a shared prompt template. For example:

Task: Fix the failing test in X.
Context: The regression started after Y.
Constraints: Do not change public API behavior.
Output: Explain root cause, apply fix, run tests, summarize risks.

This makes results easier to review and reduces the "prompt quality lottery" that often hurts team adoption.
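If your team keeps prompts in scripts or internal tooling, one way to enforce the template is a small data class that refuses to render without all four fields. This is only a sketch; the `TaskPrompt` class is hypothetical and not part of Codex.

```python
from dataclasses import dataclass


@dataclass
class TaskPrompt:
    """The four fields of the shared template above (hypothetical helper)."""
    task: str
    context: str
    constraints: str
    output: str

    def render(self) -> str:
        # One labelled line per field, in the same order as the template.
        return "\n".join([
            f"Task: {self.task}",
            f"Context: {self.context}",
            f"Constraints: {self.constraints}",
            f"Output: {self.output}",
        ])


prompt = TaskPrompt(
    task="Fix the failing test in X.",
    context="The regression started after Y.",
    constraints="Do not change public API behavior.",
    output="Explain root cause, apply fix, run tests, summarize risks.",
)
print(prompt.render())
```

Because every field is required, a prompt with no constraints or no verification step fails at construction time instead of reaching Codex half-specified.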

Use a Review Culture

Codex should not replace code review discipline. Treat it as:

A first-pass implementer.
A pre-review reviewer.
A way to reduce repetitive work.

The human team should still own architecture, product tradeoffs, and final sign-off.

Measure What Matters

The metrics that matter are the ones that tell you whether Codex is producing reviewable, mergeable, trustworthy work — not the ones that count activity. Below is each metric, how to actually compute it from data you already have, and the rule of thumb for what "healthy" looks like.

1. Time to First Useful Diff

Definition: From the moment a Codex task is started, how long until it produces a diff that a human would actually consider applying (after possible small tweaks).

How to measure:

For CLI/IDE tasks, log the wall-clock time from prompt submission to first diff. The Codex CLI emits structured logs you can parse; a simple wrapper script suffices:
```
# Replace YOUR PROMPT HERE with the actual task prompt.
start=$(date +%s); codex "YOUR PROMPT HERE"; echo "elapsed: $(( $(date +%s) - start ))s"
```
For Codex Cloud tasks, use the task duration shown in the chatgpt.com/codex dashboard, or pull it from the workspace usage export.
Tag each task as "useful" or "discarded" in a shared spreadsheet for the first month. After that, you can sample.

Healthy: under 2 minutes for bounded tasks; under 10 minutes for multi-file refactors. If the median is much higher, your prompts probably lack context (see Section 5).

2. Test Pass Rate on Codex-Generated Changes

Definition: Of the diffs Codex produces, what percentage pass the existing test suite on the first try.

How to measure:

In CI, tag PRs that originated from Codex (a label like codex-authored or a commit-message prefix works). Then run a simple weekly query:

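-- Assumes PostgreSQL, a text[] labels column, and a first_ci_run status column.
SELECT
  COUNT(*) FILTER (WHERE first_ci_run = 'pass') * 100.0
    / NULLIF(COUNT(*), 0) AS first_try_pass_rate  -- NULL, not a division error, on empty weeks
FROM pull_requests
WHERE labels @> '{"codex-authored"}'
  AND created_at > NOW() - INTERVAL '7 days';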

For local CLI usage, instrument with a wrapper that runs your test command immediately after Codex finishes and records the exit code.

Healthy: above 75% for bounded tasks. Below 50% means Codex is making changes without verifying them — usually fixable by adding "run the tests after" to your prompt template (see Section 9 → Standardize Task Prompts).
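The local wrapper mentioned above can be sketched with the standard library alone. The function names and the CSV log layout here are assumptions for illustration; the sketch only runs whatever test command you pass in and records its exit code.

```python
import csv
import subprocess
import time
from pathlib import Path
from typing import List


def run_tests_and_record(test_command: List[str], log_path: Path) -> int:
    """Run the test command once and append its exit code to a CSV log."""
    result = subprocess.run(test_command)
    with log_path.open("a", newline="") as handle:
        csv.writer(handle).writerow(
            [int(time.time()), " ".join(test_command), result.returncode]
        )
    return result.returncode


def first_try_pass_rate(log_path: Path) -> float:
    """Percentage of logged runs whose exit code was 0."""
    with log_path.open(newline="") as handle:
        exit_codes = [int(row[2]) for row in csv.reader(handle) if row]
    if not exit_codes:
        raise ValueError("no runs recorded yet")
    return 100.0 * sum(code == 0 for code in exit_codes) / len(exit_codes)
```

Called right after a Codex session, for example as `run_tests_and_record(["python", "-m", "pytest"], Path("codex_runs.csv"))`, this gives you the same first-try pass rate locally that the SQL query gives you for PRs.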

3. Review Findings Caught by Codex

Definition: When Codex is used as a pre-merge reviewer, how many issues does it surface that a human reviewer or CI would have caught anyway, vs. issues only Codex caught, vs. false positives.

How to measure:

Have human reviewers annotate Codex's review comments with one of three tags: agree-found-it, agree-missed-it, disagree-noise.
Track the ratios over time:
- Useful-finding rate = (agree-found-it + agree-missed-it) / total Codex comments.
- Unique-value rate = agree-missed-it / total Codex comments.
A simple GitHub Actions step that posts the Codex review and asks the human reviewer to react with emoji (✅ / ⚠️ / ❌) makes this nearly free to collect.

Healthy: useful-finding rate above 70%; unique-value rate above 20%. Unique-value rate is the number that justifies keeping the workflow on — if it is near zero, Codex is duplicating CI and you can disable it without losing anything.
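The two ratios are simple enough to compute from an exported list of tags. A minimal sketch, assuming each Codex comment has been annotated with exactly one of the three tags above (the function name is illustrative):

```python
from collections import Counter
from typing import Dict, Iterable

VALID_TAGS = {"agree-found-it", "agree-missed-it", "disagree-noise"}


def review_rates(tags: Iterable[str]) -> Dict[str, float]:
    """Compute useful-finding and unique-value rates from reviewer tags."""
    counts = Counter(tags)
    unknown = set(counts) - VALID_TAGS
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no Codex comments were annotated")
    return {
        "useful_finding_rate": (counts["agree-found-it"] + counts["agree-missed-it"]) / total,
        "unique_value_rate": counts["agree-missed-it"] / total,
    }


tags = ["agree-found-it"] * 5 + ["agree-missed-it"] * 3 + ["disagree-noise"] * 2
print(review_rates(tags))
```

With 5, 3, and 2 comments in the three buckets, the useful-finding rate is 8/10 and the unique-value rate is 3/10, which would clear both thresholds.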

4. Tasks Completed Without Human Rewrite

Definition: Of all merged Codex-authored changes, what fraction shipped substantially as Codex wrote them (vs. being heavily rewritten by a human before merge).

How to measure:

Compare the diff Codex initially produced to the diff that actually merged. The simplest proxy:
```
# in the Codex-authored branch:
git diff codex/initial-commit HEAD --shortstat
```
If the post-Codex diff changes more than ~30% of the lines Codex originally wrote, count the task as "rewritten."
Track this monthly. The trend line matters more than the absolute number.

Healthy: above 60% shipped without major rewrite. Lower than that, and either prompts are under-specified or Codex is being pushed into work it is bad at — re-read Section 14.
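To turn the `--shortstat` proxy into a yes/no "rewritten" flag, you can parse its one-line summary. This sketch assumes you already know how many lines Codex originally wrote, and it counts insertions plus deletions as "changed lines", so a modified line counts twice; treat it as a conservative estimate rather than an exact measure.

```python
import re


def changed_lines(shortstat: str) -> int:
    """Insertions plus deletions from one line of `git diff --shortstat` output."""
    insertions = re.search(r"(\d+) insertion", shortstat)
    deletions = re.search(r"(\d+) deletion", shortstat)
    return sum(int(match.group(1)) for match in (insertions, deletions) if match)


def is_rewritten(codex_lines: int, shortstat: str, threshold: float = 0.30) -> bool:
    """True when human edits exceed the threshold share of Codex's lines."""
    if codex_lines <= 0:
        raise ValueError("codex_lines must be positive")
    return changed_lines(shortstat) / codex_lines > threshold
```

For example, `is_rewritten(100, " 2 files changed, 10 insertions(+), 3 deletions(-)")` compares 13 changed lines against 100 original lines and stays under the 30% threshold.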

5. Developer Satisfaction

Definition: Whether the people actually using the tool think it makes them faster and want to keep using it. Hard numbers do not capture this.

How to measure:

Run a 5-question pulse survey monthly. Keep it short. Suggested questions, the first four on a 1–5 scale and the last as free text:
1. "Codex saved me time this week."
2. "I trust Codex's diffs enough to review them confidently."
3. "Codex's review comments are usually worth reading."
4. "I would be unhappy if Codex were taken away."
5. "What is the single biggest friction point?" (free text)
Track the trend in question 4 specifically. That is the closest equivalent to a product-market-fit signal for an internal tool.

Healthy: average score above 3.5/5 on questions 1–4 by month 3 of rollout. If question 4 trends down, the rollout is failing regardless of what the other metrics say.

What NOT to Measure

These look useful but mislead:

Number of prompts sent. Counts activity, not value. A team sending 10× more prompts may be 10× more productive — or 10× more confused.
Tokens consumed. Useful for budget, useless for impact. Heavy users are not necessarily good users.
Lines of code generated. Same problem as LOC has always had: you reward verbosity.
PRs opened by Codex. A Codex-opened PR that nobody merges is a negative outcome dressed up as a positive one.

Use the cost data (Section 7) to manage budget. Use the metrics above to manage adoption.

Use the Right Surface for the Job

CLI for terminal-heavy local work.
IDE extension for day-to-day coding.
App for parallel project work.
Cloud for background tasks and review.

That is usually the difference between "this is useful" and "this is annoying."

Section 10: Common Workflows and Examples

Here are the workflows most teams will actually use. Each one includes a worked example against the codex-demo repo from Section 4 so you can see the full prompt, the kind of output Codex produces, and what to do with it.

Workflow 1: Fix a Bug Locally

Use when: A test is failing, a behavior is wrong, and the cause is contained to one file or function.

Steps:

Open the repo in your terminal or IDE.
Ask Codex to inspect the failing path.
Request a fix and a test.
Review the diff.
Run the test suite.

Worked example:

In the codex-demo repo, suppose a teammate just reported: "apply_discount is silently returning a negative price when discount_percent is greater than 100." Verify the bug first:

python -c "from pricing import apply_discount; print(apply_discount(100, 150))"
# prints: -50.0    <-- silent negative price, no error raised

Now launch Codex and run:

Bug: apply_discount(100, 150) returns -50.0 instead of raising an error.
Expected: discount_percent values above 100 should raise ValueError with
the message "discount_percent must be between 0 and 100".

Task:
- Add the validation in pricing.py.
- Add a test in test_pricing.py that asserts ValueError is raised for
  discount_percent=150.
- Keep the existing tests passing.
- Run pytest at the end and report the result.

What you get back: a diff that adds if discount_percent > 100: raise ValueError(...) in apply_discount, a new test_invalid_discount_percent_above_100 test, and the pytest output showing all four tests passing. Review with git diff, run python -m pytest yourself to confirm, then git commit -am "Reject discount_percent > 100".

This works best when the bug is bounded and reproducible. If you cannot reproduce it from the command line, Codex usually cannot either.
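For reference, the fix and its test might look roughly like the sketch below. The function body is an assumption about the demo repo; the negative-value message is taken from the sample review in Workflow 2, and the new message comes from the prompt above.

```python
def apply_discount(price: float, discount_percent: float) -> float:
    # Existing check for negative values (assumed wording).
    if discount_percent < 0:
        raise ValueError("discount_percent must be >= 0")
    # New validation requested in the prompt.
    if discount_percent > 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return price * (1 - discount_percent / 100)


def test_invalid_discount_percent_above_100() -> None:
    # Plain try/except so the sketch runs without pytest installed;
    # in the repo, pytest.raises would be the idiomatic form.
    try:
        apply_discount(100, 150)
    except ValueError as exc:
        assert str(exc) == "discount_percent must be between 0 and 100"
    else:
        raise AssertionError("expected ValueError for discount_percent=150")
```

Note that `discount_percent == 100` is still accepted and yields a price of 0, which is the boundary the review in Workflow 2 asks you to lock down with a test.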

Workflow 2: Review a Pull Request

Use when: You (or a teammate) just made a change and want a fast pre-merge sanity check before opening it for human review.

Steps:

Point Codex at the PR or changed files.
Ask for correctness issues, missing tests, and security risks.
Compare the findings against human review.
Use Codex as a pre-filter before the broader team reviews.

Worked example:

After completing Workflow 1 above, ask Codex to review your own change before opening a PR:

Review the change in my last commit (HEAD) — it added validation to
apply_discount in pricing.py.

Look for:
- correctness issues (off-by-one on the boundary, wrong error type, etc.)
- missing tests (boundary cases like exactly 100, exactly 0, NaN, negative zero)
- security or robustness issues
- API consistency with the existing apply_discount validation style

Prioritize findings as CRITICAL / IMPORTANT / NIT and propose a concrete
fix for each. Do not modify any files in this turn.

What you might get back:

IMPORTANT: line 14 — the new validation rejects discount_percent > 100 but
  silently allows discount_percent == 100, which makes the price 0. That is
  technically valid but worth a test to lock the boundary. Add:
    test_apply_discount_at_boundary_100_returns_zero

NIT: the new error message says "between 0 and 100" but the existing check
  for negative values says "must be >= 0". Consider unifying the messages
  for consistency.

You apply the IMPORTANT fix (often by following up with: "apply the IMPORTANT fix from your review"), defer or accept the nit, and re-run tests.

This is one of the highest-leverage team workflows because it catches obvious problems before a human spends review time on them. See Section 9 → Measure What Matters → Review Findings Caught by Codex for how to track its actual value over time.

Workflow 3: Understand a Large Codebase

Use when: You are new to a repo (or returning after months away) and need a map before you can safely make changes.

Steps:

Ask Codex to trace a request flow.
Ask for the key modules and entry points.
Request a map of the code path before editing anything.

Worked example:

The codex-demo repo is too small to need this, so imagine a more realistic case: a teammate's repo with app/, services/, models/, api/, and 80 files you have never seen. Open the repo in Codex and run:

I am new to this codebase. Without modifying anything, give me an
orientation:

1. What is the entry point for the HTTP API?
2. Trace what happens when a POST hits /users — list every file the
   request touches in order, with a one-line description of each.
3. Where is database access centralized? Is there a repository pattern?
4. What test command should I run to verify any change I make?
5. What are the three files I should read first to understand the
   project's conventions?

Output as a structured markdown report.

What you get back: a markdown report you can paste into your notes. Read the recommended files, then start working with Codex on actual changes. The 10 minutes spent on this orientation typically saves an hour of confused refactoring later.

This workflow is particularly useful for new hires. A senior engineer can also use it the first time they touch an unfamiliar service to avoid breaking conventions they cannot see.

Workflow 4: Generate a Feature in Parallel

Use when: A feature naturally splits into independent pieces (API + tests + docs, or UI + backend + migration) that do not block each other.

Steps:

Break the work into subtasks.
Run separate Codex tasks for UI, API, tests, or docs.
Merge the outputs after review.

Worked example:

Add a new "loyalty discount" capability to codex-demo. The work splits into three pieces that do not depend on each other:

Subtask	Surface	Prompt
A. Implementation	CLI in terminal 1	"Add a `loyalty_discount(price, customer_tier)` function to `pricing.py`. Tiers are 'bronze' (0%), 'silver' (5%), 'gold' (10%). Reject unknown tiers with ValueError. Do not change any other function."
B. Tests	Codex Cloud	"Generate exhaustive tests in `test_pricing.py` for a function `loyalty_discount(price, customer_tier)` with tiers bronze/silver/gold. Cover: each tier, unknown tier, negative price, zero price, decimal prices. Do not modify pricing.py — assume the function will exist."
C. Docs	VS Code extension	"Add a section to README.md documenting the new loyalty_discount function: signature, tier table, and one usage example."

The three tasks run in parallel. When all three finish, review each diff independently, then merge them. Typically the implementation goes in first, the tests are then run against it, and the docs are checked against what shipped.

The Codex app and cloud surfaces are especially good for this because they let you launch and monitor multiple tasks without juggling terminal windows. The CLI also supports parallel work, but it benefits from git worktree so each run operates on its own branch checkout.
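
For the worked example above, a minimal sketch of what subtask A might produce, derived only from the prompt in the table. It is not actual Codex output: the `TIER_DISCOUNTS` name, the rounding to two decimals, and the rejection of negative prices are assumptions made for illustration.

```python
# Hypothetical sketch of subtask A's result, based on the prompt text only.
# TIER_DISCOUNTS, the two-decimal rounding, and the negative-price check
# are assumptions, not part of the original prompt.
TIER_DISCOUNTS = {"bronze": 0, "silver": 5, "gold": 10}  # percent off per tier


def loyalty_discount(price, customer_tier):
    """Return the price after applying the loyalty discount for the tier."""
    if price < 0:
        raise ValueError("price must be >= 0")
    if customer_tier not in TIER_DISCOUNTS:
        raise ValueError(f"unknown customer tier: {customer_tier!r}")
    percent_off = TIER_DISCOUNTS[customer_tier]
    return round(price * (100 - percent_off) / 100, 2)


for tier in ("bronze", "silver", "gold"):
    print(tier, loyalty_discount(100, tier))
```

The tests from subtask B would then exercise each tier, the unknown-tier error, and the zero and negative price cases against a function shaped like this one.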

Workflow 5: Use Subagents for Decomposition

Use when: A single task is too large for one Codex run but can be naturally split into investigate / plan / implement phases.

The CLI explicitly supports subagents — one Codex task that spawns child tasks, each with a narrower scope and its own context window.

Worked example:

A bug report says: "Cart totals are sometimes off by a penny for European currencies." You do not yet know if this is a rounding bug, a currency-conversion bug, or a data bug. Run a parent task that decomposes:

A bug report says cart totals are occasionally off by a penny for
European currencies.

Decompose this into three subagent tasks:

1. INVESTIGATE: Read pricing.py and any currency-related code. Identify
   every place where floating-point arithmetic touches a money value.
   Report findings without proposing fixes.

2. REPRODUCE: Write a failing test in test_pricing.py that demonstrates
   a one-cent discrepancy with EUR amounts. Use the smallest possible
   reproduction.

3. PROPOSE: Based on (1) and (2), propose two possible fixes (e.g.,
   switching to Decimal vs. rounding at the boundary) with the trade-offs
   of each. Do not implement either yet.

Wait for me to pick a fix before writing any production code.

Why subagents help: each child task has a clean context, so the investigation findings do not pollute the test-writing context, and the proposal task gets a clean view of both. You also get a natural human checkpoint between investigation and implementation.

That division is often faster than one giant all-purpose run, and dramatically more reviewable.
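
To illustrate the class of bug the INVESTIGATE and PROPOSE steps are looking for, the sketch below shows how binary floating point can lose a cent at a rounding boundary and how `decimal.Decimal` avoids it. The amount 2.675 and the helper names are illustrative assumptions, not code from codex-demo.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_float(amount: float) -> float:
    """Round a float money value to cents (the risky approach)."""
    return round(amount, 2)


def round_decimal(amount: str) -> Decimal:
    """Round a decimal string to cents, with halves rounded up."""
    return Decimal(amount).quantize(CENT, rounding=ROUND_HALF_UP)


# 2.675 has no exact binary representation; the stored float is slightly
# below 2.675, so rounding to two places goes down instead of up.
print(round_float(2.675))      # 2.67
print(round_decimal("2.675"))  # 2.68
```

A failing test for the REPRODUCE step could assert that the two results agree, which they do not for this input.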

Prompt Cookbook

New users often ask for examples because they know the outcome they want but not how to phrase the request. These templates are a good starting point.

Bug Fix Template

Inspect the failing behavior in [file or module].
Identify the root cause.
Patch the smallest safe fix.
Add or update tests.
Summarize what changed and any edge cases I should watch.

Use this when the bug is narrow and you want a disciplined fix, not a redesign.

Refactor Template

Refactor [module] to improve readability and maintain the current behavior.
Keep external APIs stable.
Explain the refactor plan before editing.
Make the smallest set of changes that achieves the goal.

Use this when the code works but is hard to maintain.

Review Template

Review this change for correctness, missing tests, security issues, and maintainability risks.
Prioritize findings by severity.
Call out any behavior changes or ambiguous logic.

Use this when you want Codex to act like a pre-merge reviewer.

Feature Template

Implement [feature] in [file or subsystem].
List the files you expect to touch before changing anything.
Add tests.
Keep the implementation aligned with the current architecture.

Use this when the task spans multiple files and you want visibility into the plan.

Signs You Are Using Codex Well

You usually know the workflow is healthy when:

Codex makes small, reviewable diffs instead of broad rewrites.
The model asks for clarification only when the missing detail matters.
Test coverage improves along with functionality.
New developers can use the tool without needing a custom training session.
The time from prompt to merged change is shorter, but review quality does not drop.

You usually know the workflow is unhealthy when:

Prompts are vague and every result needs heavy rework.
The team treats the first output as final.
Nobody is checking diffs or running tests.
Users keep asking for "make it better" instead of defining a clear target.

Those signals matter more than raw usage counts.

Section 11: Model Specs and Benchmarks (GPT-5.5 Deep Dive)

Section 2 introduced GPT-5.5 as the new general flagship and gave the three-bullet practical takeaway. This section is the deep dive: the published benchmark numbers, what each one actually measures, why it matters for Codex workloads specifically, and how to use those numbers to pick the right model per task.

If you are setting budgets or choosing default models for a team, read this section in full. If you just want to use Codex, you can skim it.

Why Benchmarks Matter for Model Selection

Codex lets you pick the model behind each surface. Picking well is mostly about matching the model's strengths to the task shape:

A bounded local edit (one file, one function) does not benefit much from a frontier model. Codex-specific or Codex-mini variants are usually the right call.
A repository-wide refactor that needs the model to keep many files in working memory benefits enormously from long-context performance.
An agentic cloud task that runs unattended for ten minutes benefits from low hallucination rates and strong tool-use behavior.
A PR review benefits from low hallucination rates above almost everything else — a confident-but-wrong review comment costs more than a missed real issue.

The benchmarks below tell you which model best matches each shape.

GPT-5.5 Performance Highlights

The published benchmarks position GPT-5.5 as a meaningful jump over GPT-5.4, particularly on agentic and long-context work — the workloads most relevant to Codex users.

Knowledge work (GDPval) — 84.9%. GDPval evaluates whether a model can produce well-specified knowledge-work output across 44 occupations. This is the headline general-capability number.
Computer use (OSWorld-Verified) — 78.7%. Measures whether the model can drive a real computer environment end-to-end. Directly relevant to Codex Cloud sandboxes and agentic CLI runs.
Coding (Terminal-Bench 2.0) — 82.7%. A terminal-centric coding benchmark with long-context retrieval and computer-use components. The closest public proxy for Codex CLI workloads.
Customer-service workflows (Tau2-bench Telecom) — 98.0% without prompt tuning. Indicates strong tool-use and policy-adherence behavior straight out of the box.
Long-context retrieval (MRCR v2 at 1M tokens) — 74.0%, up from 36.6% on GPT-5.4. This is the largest single jump in the report and the most important one for repository-scale Codex tasks where the model must keep many files in working memory.
Hallucination rate — independent coverage reports a roughly 60% reduction in hallucinations versus prior generations, which materially changes the trust calculus for review and PR-feedback workflows.

What Each Benchmark Actually Measures

Benchmarks are easy to misread. Quick definitions of the ones cited above:

GDPval — Asks the model to produce specified knowledge-work output across 44 occupations (legal memos, financial summaries, technical documentation, etc.). A high score means the model can produce structured, well-specified output reliably. Use as a general-capability signal, not a coding-specific one.
OSWorld-Verified — Tasks the model with operating a real desktop environment to complete real workflows (open files, navigate UIs, run commands). High scores predict the model will behave well in agentic sandboxes that mimic a developer's desktop.
Terminal-Bench 2.0 — A terminal-driven coding benchmark with long-context retrieval and computer-use components. The closest public proxy for what Codex CLI actually does day to day.
Tau2-bench Telecom — Evaluates complex customer-service-style workflows that require following policies and using tools correctly. A proxy for "does the model do what you told it without going off-script."
MRCR v2 at 1M tokens — A long-context retrieval benchmark. Tests whether the model can find and use information across a full 1M-token context window. The single best predictor of behavior on repository-scale Codex tasks where many files must be kept in working memory.

Practical Guidance for Codex Users

Translate the benchmarks into model choice:

Repository-wide tasks (cross-file refactors, multi-module migrations): GPT-5.5. The MRCR v2 jump is the single best signal that it will behave better on large codebases than GPT-5.4 did.
Cheap, bounded local edits (single function, single test, doc tweak): GPT-5.4 or a Codex-specific model. The cost/latency tradeoff is much better and the capability headroom is wasted on small tasks. Do not default everything to GPT-5.5 just because it is newest.
Agentic cloud tasks (background sandbox runs, multi-step workflows): GPT-5.5. The OSWorld-Verified score and lower hallucination rate are the relevant signals — fewer broken sandbox runs and fewer confidently-wrong outputs.
PR review and code review workflows: GPT-5.5. The reported hallucination reduction of roughly 60% is the most important number for review work; a noisy reviewer trains the team to ignore the reviewer.
Most expensive workloads (anything that approaches GPT-5.5 Pro pricing): keep GPT-5.5 Pro reserved for the small set of tasks where its extra capability is justified — typically deeply novel reasoning or extreme long-context work.
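
The guidance above can be captured as a small routing table that a team keeps alongside its other conventions. This is a sketch under stated assumptions: the `TaskShape` names, the `pick_model` helper, and the model label strings are hypothetical placeholders, not identifiers from any OpenAI API or Codex configuration file.

```python
from enum import Enum


class TaskShape(Enum):
    BOUNDED_EDIT = "bounded_edit"    # single function, single test, doc tweak
    REPO_WIDE = "repo_wide"          # cross-file refactors, migrations
    AGENTIC_CLOUD = "agentic_cloud"  # background sandbox runs
    PR_REVIEW = "pr_review"          # pre-merge review


# Placeholder labels; substitute the model names your workspace exposes.
DEFAULT_MODEL = {
    TaskShape.BOUNDED_EDIT: "gpt-5.4-or-codex-specific",
    TaskShape.REPO_WIDE: "gpt-5.5",
    TaskShape.AGENTIC_CLOUD: "gpt-5.5",
    TaskShape.PR_REVIEW: "gpt-5.5",
}


def pick_model(shape: TaskShape, needs_pro: bool = False) -> str:
    """Return the default model label for a task shape.

    needs_pro is an explicit opt-in, so the most expensive tier
    is never selected by default.
    """
    if needs_pro:
        return "gpt-5.5-pro"
    return DEFAULT_MODEL[shape]


print(pick_model(TaskShape.BOUNDED_EDIT))
print(pick_model(TaskShape.REPO_WIDE))
```

Keeping the Pro tier behind an explicit flag mirrors the advice to reserve it for the small set of tasks that justify the cost.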

For Procurement: Treat GPT-5.5 as a Separate Budget Line

Token consumption on agentic tasks is dominated by output. GPT-5.5 outputs are substantially more expensive than GPT-5.4 outputs. Concretely:

Mixed-model strategies are now the rule, not the exception. Most mature teams route routine work to a Codex-mini model and reserve GPT-5.5 for repository-wide and review-heavy work.
The worked cost example in Section 7 shows the 30-engineer PR-review case across all five model tiers. Read it before approving a budget.
Re-check pricing every quarter. The rate card has changed in the past and will change again.

Verify Before Quoting

The numbers in this section come from OpenAI's launch documentation and contemporaneous press coverage. Before they go into a procurement deck or a public document, verify against the official OpenAI announcement and the model page — see Section 16: Source References. Benchmarks get re-run; numbers shift with eval methodology changes.

Section 12: Troubleshooting

Even good tools fail if the setup is wrong. Here are the most common issues.

"Codex is not installed"

Check:

You ran npm i -g @openai/codex.
You are using a supported shell and runtime.
The binary is on your path.
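
The last check can be scripted. A minimal Python sketch, assuming the installed executable is named `codex` (the command name used for the CLI elsewhere in this handbook); the helper name `find_cli` is made up for illustration.

```python
import shutil


def find_cli(name: str = "codex"):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


location = find_cli()
if location:
    print(f"codex found at {location}")
else:
    print("codex is not on PATH; check where npm installs global binaries")
```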

If Codex is installed but you cannot sign in or use it, check:

Your ChatGPT account has the right plan.
Your workspace allows Codex local or cloud use.
You are signing in with the correct account.

"Windows is behaving badly"

The CLI docs say Windows support is experimental. If you are on Windows, the best supported path is to use WSL for the CLI or use the Codex app where appropriate.

"Cloud task cannot see my repo"

Check:

The GitHub connector is installed.
The repository is allowed in the connector.
Your organization admin has enabled Codex cloud.
You are using a GitHub-hosted repository.

"Codex will not browse the internet"

That is expected by default in cloud mode. Ask your admin whether internet access has been intentionally restricted.

"The result is technically correct but not what I wanted"

Usually this means the prompt was under-specified. Tighten:

The target file or feature.
The acceptance criteria.
The constraints.
The expected output format.
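
One way to make those four items hard to forget is to assemble prompts from named parts. The sketch below is a hypothetical helper, not a Codex feature; `build_prompt` and its field labels are assumptions chosen to match the list above.

```python
def build_prompt(target: str, acceptance: str, constraints: str, output_format: str) -> str:
    """Assemble a prompt and refuse to build one with a missing part."""
    parts = {
        "Target": target,
        "Acceptance criteria": acceptance,
        "Constraints": constraints,
        "Expected output": output_format,
    }
    missing = [label for label, text in parts.items() if not text.strip()]
    if missing:
        raise ValueError("under-specified prompt, missing: " + ", ".join(missing))
    return "\n".join(f"{label}: {text.strip()}" for label, text in parts.items())


print(build_prompt(
    target="loyalty_discount in pricing.py",
    acceptance="unknown tiers raise ValueError; existing tests still pass",
    constraints="do not change any other function",
    output_format="a diff plus a one-paragraph summary",
))
```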

Section 13: FAQ

Is Codex a chat model?

Not exactly. It is a coding agent and product surface built to work on repositories, tests, code review, and multi-step software tasks.

Can I use Codex without switching tools all the time?

Yes. That is one of its strengths. You can use the CLI, IDE extension, or Codex app depending on your workflow.

Do I need the cloud features?

No. Many individual users will get value from the local CLI or IDE extension alone. Cloud tasks become more valuable as soon as you want background execution, parallelism, or automated review.

Is Codex only for professional engineers?

No, but it is most useful when the user can evaluate code changes and understand a repository. It is a developer tool first.

Is Codex the same as GPT-5.4?

No. GPT-5.4 is a model. Codex is the coding product/workflow. Codex may use different models depending on the surface and configuration.

What is the safest way to start?

Use the CLI or IDE extension in a small repo change, keep the approval mode conservative, and review every diff before merging.

Section 14: When NOT to Use Codex

Most of this handbook is affirmative — Codex is good at this, Codex fits here, here is how to set it up. That framing risks creating the impression that Codex is the right tool for any coding-adjacent task. It is not. The fastest way to lose team trust in an AI coding tool is to push it into work it is bad at. The following is an honest list of where Codex is a poor fit today.

Tasks With No Reviewable Output

Codex's value depends on a human reviewing the diff, the test result, or the explanation. If the task produces something nobody will check — a one-off script that touches production data, an exploratory query whose result drives a decision before anyone reads the SQL — the AI's confidence becomes the only quality gate. That is a bad position to be in regardless of model quality. Either add a review step or do the task yourself.

Highly Novel Architecture Decisions

Codex is good at applying patterns. It is much weaker at choosing which pattern fits a problem the team has not solved before. Expect it to confidently generate plausible-but-wrong architecture for genuinely new domains: a new pricing model, a new auth boundary, a new event-sourcing scheme. Use it to prototype options, not to decide between them.

Work That Crosses Org Boundaries

Codex sees the repository it has access to. It does not see the cross-team contracts, the deprecation calendar in the platform team's roadmap, the half-finished migration in another repo, or the political reasons one approach is off-limits. For changes that span multiple teams or services, Codex can implement individual pieces, but a human still needs to own the cross-cutting plan.

Anything Touching Live Production State

Codex Cloud sandboxes are good. They are not a substitute for human approval before a production change. Database migrations, infrastructure-as-code that mutates real resources, secret rotation, customer-data scripts — these need a human in the approval path even if Codex wrote the diff. The fact that Codex can run commands does not mean it should run those commands.

Compliance- and Safety-Critical Code

Code that lives inside a regulated boundary (payments, medical, security primitives, model-evaluation harnesses for safety) has higher review and provenance requirements than typical product code. Codex output is fine as a starting draft, but the review burden is the same as for any third-party-authored code, which usually means the speed advantage shrinks substantially. Plan for that or keep these areas Codex-free.

Tasks Where the Real Bottleneck Is Knowledge, Not Typing

If the team is stuck because nobody understands the legacy system, the failing test, or the weird customer report, generating more code rarely helps. Codex can accelerate the implementation once you know what to do. It cannot replace the discovery and design conversation that should happen first. Teams that skip the discovery step and go straight to "ask Codex" tend to ship the wrong thing fast.

Anything Where Hallucinations Have High Cost

GPT-5.5 reportedly cut hallucination rates by roughly 60% versus prior generations, which is a real improvement, but the rate is not zero. Tasks where a confident-but-wrong output causes real damage (generating regulatory citations, copying API contract details from a doc the model hasn't actually read, asserting facts about an unfamiliar third-party library) still need the same skepticism you would apply to any AI output. Use search-grounded workflows or human verification for these.

Quick Heuristic

If you can answer all four of these with "yes," Codex is likely a good fit:

Can the output be reviewed by someone who would catch a mistake?
Is the task a known pattern, not a novel architecture decision?
Is the blast radius local to one repository or service?
Is the cost of a bad output bounded (e.g., a failed test, a reverted commit) rather than unbounded (e.g., production data loss, regulatory exposure)?

If any of those are "no," either restructure the task to make them "yes" or keep the work outside Codex.
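
The heuristic is simple enough to write down as a checklist function, which can be handy in a team wiki or an intake form. This is an illustrative sketch; the `TaskCheck` fields and `failed_checks` helper are hypothetical names that map one-to-one onto the four questions above.

```python
from dataclasses import dataclass


@dataclass
class TaskCheck:
    reviewable_output: bool   # someone would catch a mistake
    known_pattern: bool       # not a novel architecture decision
    local_blast_radius: bool  # one repository or service
    bounded_cost: bool        # a bad output is cheap to undo


def failed_checks(task: TaskCheck) -> list:
    """Return the names of checks answered "no"; an empty list means a likely fit."""
    return [name for name, ok in vars(task).items() if not ok]


migration = TaskCheck(
    reviewable_output=True,
    known_pattern=True,
    local_blast_radius=False,
    bounded_cost=False,
)
print(failed_checks(migration))  # ['local_blast_radius', 'bounded_cost']
```

Returning the failed checks, rather than a single yes or no, points directly at what would need restructuring before the task goes to Codex.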

Section 15: Final Recommendations

If you are rolling Codex out to new users, I would keep the guidance very simple:

Start with the CLI or IDE extension.
Use one small task to learn the tool.
Review every change before merging.
Move to cloud tasks only after users trust the local workflow.
For teams, separate user access from admin access.
Re-check pricing whenever your plan or workspace changes.

Codex is most valuable when it is treated as a disciplined engineering tool rather than a novelty. If you give it real code, clear constraints, and a review culture, it can accelerate the boring parts of software development and make bigger tasks easier to break down.

Section 16: Source References

Official OpenAI sources used for this handbook:

Press coverage of the GPT-5.5 release referenced in Section 2 and Section 11:

Appendix A: 30-60-90 Day Adoption Plan

If you are introducing Codex to a team, the fastest way to create trust is to phase adoption instead of rolling it out as a big-bang change. A staged plan also helps you discover where the real friction lives: authentication, permissions, prompt quality, review habits, or budget assumptions.

First 30 Days: Prove Value

In the first month, the goal is not maximum usage. The goal is repeatable wins.

Recommended actions:

Pick one or two engineers who are comfortable trying new tools.
Restrict usage to small, low-risk tasks such as bug fixes, test generation, and documentation updates.
Standardize a short prompt template so every request includes task, context, constraints, and expected output.
Require human review for every change.
Track the time it takes to go from prompt to merged diff.
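
The last action, tracking prompt-to-merge time, needs nothing more than two timestamps per change. A minimal sketch, assuming you record them by hand or export them from your own tooling; the sample records below are made up purely to show the calculation.

```python
from datetime import datetime
from statistics import median

# Made-up sample records: (prompt sent, change merged) as ISO timestamps.
SAMPLES = [
    ("2026-05-04T09:00", "2026-05-04T10:30"),
    ("2026-05-05T14:00", "2026-05-05T14:45"),
    ("2026-05-06T11:00", "2026-05-06T15:00"),
]


def minutes_to_merge(prompted: str, merged: str) -> float:
    """Elapsed minutes between sending the prompt and merging the diff."""
    delta = datetime.fromisoformat(merged) - datetime.fromisoformat(prompted)
    return delta.total_seconds() / 60


def median_minutes(records) -> float:
    """Median is used so one slow review does not dominate the number."""
    return median(minutes_to_merge(p, m) for p, m in records)


print(median_minutes(SAMPLES))  # 90.0
```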

What you should learn in this phase:

Does Codex understand your codebase structure?
Are the diffs reviewable?
Does the approval flow slow people down in a useful way, or in a frustrating way?
Which classes of tasks work well, and which ones need more guidance?

If the first month is noisy, do not blame the model first. Usually the issue is task scope, missing context, or unclear acceptance criteria.

Days 31-60: Expand Carefully

Once the tool has proven itself on a handful of tasks, expand to a broader pilot group.

Recommended actions:

Add more developers from different parts of the stack.
Include at least one person who is skeptical, because their feedback will reveal weak spots.
Try the app, CLI, and IDE extension in parallel so people can choose the workflow that matches their habits.
Introduce Codex cloud for one or two background tasks or pull request reviews.
Start documenting prompts that worked well, including examples of high-quality follow-up instructions.

What you should learn in this phase:

Which surfaces are actually sticky for the team?
Where does Codex save the most time?
Do people trust the output enough to delegate real work?
Are you seeing the same mistakes repeatedly?

At this stage, your internal documentation matters. A short "how we use Codex here" page is often more useful than another technical deep dive.

Days 61-90: Operationalize

In the third month, your objective should shift from experimentation to operating practice.

Recommended actions:

Assign ownership for workspace settings, GitHub connector setup, and model access.
Define which tasks should stay local and which can go to cloud sandboxes.
Document your review standards for Codex-generated diffs.
Set budget expectations with the team so no one is surprised by token-heavy tasks.
Add Codex to onboarding for new engineers, starting with one simple flow.

What good looks like at this stage:

New hires can use Codex on day one.
Team members know when to reach for Codex and when to use a different workflow.
Admins can answer access and pricing questions quickly.
The organization has a realistic picture of the tool's strengths and limits.

A Practical Onboarding Script

If you need a ready-made orientation for a new user, use this:

"Install the CLI or extension."
"Open a repository you know well."
"Ask Codex to make one small, safe change."
"Review the diff line by line."
"Run the tests."
"Ask Codex to explain what it changed and why."
"Repeat with a slightly larger task."

That sequence teaches the core loop: context, task, change, review, verify. Once a user understands that loop, the rest of the product family becomes much easier to adopt.

Appendix B: Glossary

Terms used in this handbook, in alphabetical order. The list is intentionally narrow — only terms that appear in the body and are likely to be unfamiliar to a non-engineering reader (procurement, security, leadership) are defined here.

Agent / agentic workflow. Software that can take a goal, plan steps, take actions (read files, run commands, call APIs), observe the result, and iterate. Codex is an agentic coding workflow; a chatbot is not.
Approval mode. A Codex setting that controls how much the agent can do without asking. Stricter modes prompt the human before running shell commands or modifying files; permissive modes let the agent work uninterrupted.
CLI. Command-line interface. The Codex CLI is the terminal-based version of Codex, installed via npm i -g @openai/codex.
Codex Cloud. The hosted, sandboxed execution mode for Codex. Tasks run in isolated environments with the repo and finish with a reviewable diff.
GDPval. A benchmark that scores models on their ability to produce well-specified knowledge-work output across 44 occupations. Used in Section 11 as a general-capability signal.
GitHub Connector. The integration that lets Codex Cloud access GitHub repositories. Required for cloud tasks; uses short-lived, least-privilege tokens.
MCP (Model Context Protocol). An open protocol for connecting models to external data sources and tools. Codex CLI supports MCP, which lets it pull in data from systems beyond the repo.
MRCR v2. A long-context retrieval benchmark that measures whether the model can find and use information across very large input windows. The 1M-token version is cited in the GPT-5.5 section because it predicts behavior on repository-scale tasks.
OSWorld-Verified. A benchmark that measures whether a model can operate a real desktop computer environment to complete tasks. A direct proxy for agentic and computer-use workloads.
PR (pull request). A proposed change to a code repository, hosted on GitHub or similar platforms, where reviewers approve before the change merges.
RBAC (role-based access control). A permission model where users are assigned to roles, and roles have specific permissions. Used by Codex workspace admins to control who can do what.
SCIM (System for Cross-domain Identity Management). A standard for syncing users and groups from an identity provider (Okta, Entra ID, etc.) into another system. Codex supports SCIM-based group sync for enterprise.
Subagent. A Codex CLI feature that splits a task across multiple parallel agent runs, each handling a piece of the work.
Tau2-bench Telecom. A benchmark for complex customer-service workflows with tool use. Cited as a signal for tool-use reliability and policy adherence.
Terminal-Bench 2.0. A coding benchmark focused on terminal-driven workflows, including long-context retrieval and computer use. The closest public proxy for Codex CLI workloads.
Worktree. A git feature that lets multiple branches be checked out simultaneously in different directories. The Codex app uses worktrees so multiple agents can work in parallel without stepping on each other.
WSL (Windows Subsystem for Linux). A compatibility layer that runs Linux binaries natively on Windows. The recommended environment for Codex CLI on Windows, since direct Windows support is experimental.

Appendix C: Admin Security Checklist

For workspace admins setting up Codex for an enterprise. This checklist condenses Section 8 into actionable items. Run through it before broad rollout, then revisit quarterly.

Access

[ ] Decide whether Codex Local, Codex Cloud, or both are enabled at the workspace level.
[ ] Create separate RBAC groups for Codex Admins (policy and governance) and Codex Users (day-to-day developers). Avoid mixing the two.
[ ] Sync user and group membership from your identity provider via SCIM rather than managing users by hand.
[ ] Set a sensible default role for new workspace members. Do not default to admin.

GitHub integration

[ ] Install the ChatGPT GitHub Connector against the correct GitHub organization.
[ ] Allowlist only the repositories Codex Cloud needs. Do not grant org-wide access by default.
[ ] Verify Codex respects existing branch protection rules on protected branches before enabling cloud tasks against them.
[ ] Confirm the GitHub App tokens Codex uses are short-lived and least-privilege.

Network and runtime

[ ] Confirm Codex Cloud runs with no internet access by default. This is the secure default; verify it is on.
[ ] If a workflow requires internet access, define an explicit allowlist (dependency registries, trusted sites) and limit allowed HTTP methods.
[ ] Document which model surfaces are approved for sensitive code (often: local CLI yes, cloud no for the most sensitive repositories).
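
To make the allowlist item concrete, the sketch below shows the shape of such a policy: a set of permitted hosts plus a set of permitted HTTP methods. It is only an illustration of the rule, not Codex's actual configuration format; the host names, method set, and `is_request_allowed` helper are assumptions.

```python
from urllib.parse import urlparse

# Illustrative policy values only; not Codex's real configuration format.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "registry.npmjs.org"}
ALLOWED_METHODS = {"GET", "HEAD"}  # read-only methods


def is_request_allowed(method: str, url: str) -> bool:
    """Allow a request only if both the method and the host are on the allowlist."""
    host = urlparse(url).hostname or ""
    return method.upper() in ALLOWED_METHODS and host in ALLOWED_HOSTS


print(is_request_allowed("GET", "https://pypi.org/simple/requests/"))  # True
print(is_request_allowed("POST", "https://pypi.org/legacy/"))          # False
print(is_request_allowed("GET", "https://example.com/"))               # False
```

Limiting methods as well as hosts means a task can download dependencies from a trusted registry without being able to upload anything to it.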

Data and review

[ ] Document the team's review standard for Codex-generated diffs. At minimum: a human approves every merge.
[ ] Confirm logging and audit trails are configured for Codex actions (model used, prompts, files changed) per your compliance requirements.
[ ] Define which classes of data are off-limits to Codex (PII, customer data, secrets) and how those boundaries are enforced.
[ ] Establish an incident playbook for the case where Codex generates or commits something it should not have.

Budget and ongoing operations

[ ] Set a per-workspace token budget or alert threshold so unexpected spend is caught early.
[ ] Pick a default model per task type (e.g., Codex-mini for routine review, GPT-5.5 for repository-wide refactors) and document the choice.
[ ] Review the Codex pricing page quarterly. The rate card has changed in the past and will change again.
[ ] Re-run this checklist when (a) a major model release lands, (b) the workspace expands to a new team, or (c) Codex adds a new surface or capability.

Appendix D: Changelog

A short, append-only log of substantive revisions to this handbook. Each entry lists the version, date, and a one-line summary of what changed.

v1.3 — 2026-04-30. Made the Table of Contents clickable. Added a new Prerequisites section after the TOC. Restructured the early sections: merged the old "Quick Start" and "How to Set Up Codex" into a single Section 4 walkthrough using a self-contained codex-demo repo readers build themselves. Slimmed Section 2 by moving the GPT-5.5 benchmark deep dive to a new Section 11 (Model Specs and Benchmarks). Added per-surface hyperlinks to Section 3. Rewrote Section 5 (How to Use Codex Effectively) with bad/good examples for every tip and a definition of "bounded change." Rewrote the "Measure What Matters" subsection with concrete computation methods for each metric. Added worked, runnable examples to every workflow in Section 10. Renumbered downstream sections accordingly.
v1.2 — 2026-04-25. Added Appendix E (Working with Codex in VS Code), a detailed step-by-step guide covering the three VS Code entry points — the extension, the CLI in the integrated terminal, and browser Codex at chatgpt.com/codex — with setup instructions, a decision matrix, a combined-workflow pattern, and VS Code-specific troubleshooting. Added a forward-pointer in the setup section.
v1.1 — 2026-04-25. Added GPT-5.5 / GPT-5.5 Pro coverage in Section 2 and Section 7. Added executive summary, comparison matrix in the model-comparison section, worked cost example, "When NOT to use Codex" in Section 14. Added Appendix B (Glossary), Appendix C (Admin Security Checklist), Appendix D (Changelog). Added version stamp and author line. Press coverage sources for GPT-5.5 added in Section 16.
v1.0 — Initial release. Original Codex onboarding handbook covering surfaces, setup, usage, model comparison, pricing, security, team practices, workflows, troubleshooting, FAQ, and the 30-60-90 day adoption plan.

Appendix E: Working with Codex in VS Code

This appendix is a focused, step-by-step guide to using Codex inside Visual Studio Code (and its forks, Cursor and Windsurf).

VS Code is the most common starting surface for new Codex users, and the workflow has three distinct entry points that can be used independently or together. This guide covers each one, when to pick it, and how the three combine into a single fluid workflow.

E.1 Why VS Code Is the Recommended Starting Surface

Most teams start with VS Code rather than the standalone Codex app or pure CLI for a few practical reasons:

The editor is already where engineers spend their day. Adding Codex does not require a context switch.
The extension surface area is small and reviewable. Engineers can try it on a single file before adopting it more broadly.
VS Code's integrated terminal makes the CLI a one-keystroke experience, so the extension and CLI can be combined without leaving the editor.
Cursor and Windsurf, the most popular VS Code forks, both run the same Codex extension. A team that standardizes on the VS Code workflow does not have to retrain people if some engineers prefer a fork.

The downside of starting in VS Code is that you do not get parallel-task management or worktree support out of the box — those are stronger in the Codex app. For most individual contributors, that is not a meaningful loss in the first month.

E.2 The Three Entry Points

Codex shows up in VS Code in three distinct ways, and they are easy to confuse. Each is a separate piece of software with its own install and its own auth handshake, even though they all sign in with the same ChatGPT account.

The Codex VS Code extension — a sidebar UI inside VS Code itself. Installed from the VS Code Marketplace. Best for in-flow editing, quick questions about the open file, and short bounded tasks.
The Codex CLI, run inside VS Code's integrated terminal — the command-line agent (codex) running in the terminal pane that is already attached to your VS Code workspace. Best for multi-step agentic tasks, scripted runs, and anything where you want explicit approval gates.
Browser Codex at chatgpt.com/codex — the web interface to Codex Cloud, where tasks run in isolated sandboxes against your GitHub repository. Best for background work, parallel tasks, and PR-style review.

These are not alternatives to each other in the sense that you must pick one. They are three workflows that target different kinds of work, and most experienced Codex users have all three set up.

E.3 Setting Up the Codex VS Code Extension

This is the entry point most new users meet first.

Install

There are two install paths:

Open the VS Code Marketplace, search for "Codex" or "ChatGPT", and install the extension published by openai. The marketplace identifier is openai.chatgpt.
From a terminal, run:

code --install-extension openai.chatgpt

The CLI install path is useful for scripted dev-environment provisioning, dotfiles repos, and onboarding scripts that bring a new machine up to a known baseline.
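For a provisioning or onboarding script, it can help to make the install idempotent so that re-running it on an already configured machine changes nothing. The following is a minimal sketch, assuming the `code` launcher is on PATH; `ensure_extension` is a hypothetical helper name, not part of VS Code or Codex.

```shell
# Install a VS Code extension only if it is not already listed.
# ensure_extension is a hypothetical helper; --list-extensions and
# --install-extension are flags of the VS Code `code` launcher.
ensure_extension() {
  ext_id="$1"
  if ! command -v code >/dev/null 2>&1; then
    echo "code launcher not found on PATH; skipping $ext_id" >&2
    return 1
  fi
  # -F fixed string, -x whole-line match, -i ignore case, -q quiet
  if code --list-extensions | grep -qixF "$ext_id"; then
    echo "$ext_id already installed"
  else
    code --install-extension "$ext_id"
  fi
}
```

In a dotfiles or onboarding script this would be called as `ensure_extension openai.chatgpt`.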

Sign in

After install, the Codex panel appears in the right sidebar. The first time you open it, you will be prompted to sign in. You have two options:

Sign in with ChatGPT. Recommended for individuals on Plus, Pro, Business, or Enterprise/Edu plans. Usage is charged against your plan's included Codex credits.
Sign in with an API key. Used when you want metered API billing instead of plan-based usage, or when your workspace policy requires it. Get the key from the OpenAI developer console, then paste it into the extension's auth prompt.

If both options are visible and you are unsure which to pick, default to ChatGPT sign-in. It is the path that exercises the same plan-included usage that the rest of your team is on, which makes cost behavior predictable.

First-run sanity check

Once signed in, do a five-minute sanity check before relying on the extension for real work:

Open a small repository you know well.
Open the Codex panel in the right sidebar.
Ask a question about the open file (e.g., "What does this function do?") and confirm the answer matches what you already know.
Ask for a small change (e.g., "Add a docstring to this function") and confirm a reviewable diff appears.
Apply the change, run your tests, and revert if needed.

If any of those steps fails, fix the auth or install before going further. Trying to debug the extension on a real task is much harder than debugging it on a known-good toy task.

Platform notes

macOS and Linux are first-class. The extension and the underlying CLI both work natively.
Windows is experimental for the CLI. The extension itself works, but if you also want to run the CLI inside VS Code's integrated terminal, OpenAI recommends using a WSL workspace. Open the folder via "Reopen in WSL" before installing the CLI.
Cursor and Windsurf run the same extension. Watch for visual or shortcut conflicts with the fork's built-in AI features — see E.9 for specifics.
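On Windows it is easy to lose track of whether a given integrated terminal is a WSL shell or a native one. A small sketch of one common way to check, assuming the usual convention that WSL kernels include "microsoft" in `/proc/version`; `is_wsl_kernel` is an illustrative name.

```shell
# Return success if the given kernel version string looks like WSL.
# WSL kernels typically report "microsoft" (in some letter case) in /proc/version.
is_wsl_kernel() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *microsoft*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read the real kernel string when available; it is empty where /proc is absent.
kernel_info="$(cat /proc/version 2>/dev/null || true)"
if is_wsl_kernel "$kernel_info"; then
  echo "Running inside WSL"
else
  echo "Not running inside WSL"
fi
```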

E.4 Setting Up the Codex CLI Inside VS Code's Integrated Terminal

The CLI is the second entry point. It runs as a normal command-line tool, but inside VS Code's integrated terminal it picks up the active workspace folder automatically, which makes it feel like a native part of the editor.

Install the CLI

From any terminal, including VS Code's integrated terminal:

npm i -g @openai/codex

This installs the codex binary globally. Confirm by running:

codex --version

If the command is not found, the most common cause is that npm's global bin directory is not on your PATH. Either fix the PATH or use a Node version manager (nvm, fnm, volta) that handles it for you.
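To confirm the PATH diagnosis before changing anything, a check along these lines can help. It is a sketch for macOS/Linux that assumes npm's global executables live in the `bin/` subdirectory of the prefix printed by `npm prefix -g`; `path_contains` is a hypothetical helper.

```shell
# Return success if directory $1 is an entry in the colon-separated list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# `npm prefix -g` prints npm's global install prefix; on macOS/Linux the
# global executables are placed in its bin/ subdirectory.
if command -v npm >/dev/null 2>&1; then
  npm_bin="$(npm prefix -g)/bin"
  if path_contains "$npm_bin" "$PATH"; then
    echo "$npm_bin is on PATH"
  else
    echo "$npm_bin is missing from PATH"
  fi
fi
```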

Open the integrated terminal in VS Code

Three ways to open it, pick whichever matches your habits:

The View menu → Terminal.
The keyboard shortcut Ctrl+` (backtick) on Windows/Linux, ⌃` on macOS.
The Command Palette: Terminal: Create New Terminal.

The integrated terminal inherits the active workspace folder as its working directory, which means codex launched from there immediately sees the right repo.

Run Codex

In the terminal, navigate to the repo (if you are not already there) and run:

codex

The first time you run it, you will go through the same auth flow as the extension — sign in with ChatGPT or paste an API key.

Pick an approval mode

The CLI supports several approval modes that govern how much Codex can do without explicit confirmation. For new users, start with the strictest mode (asks before every shell command and every file change), then loosen it once you trust the workflow on your repo. The relevant modes and how to toggle them are described in the CLI docs linked in Section 16.

Where the CLI beats the extension

Multi-step agentic runs that need to read several files, run tests, iterate, and report.
Anything you want to script or invoke from a package.json script, a Makefile, or a CI step.
Subagent decomposition (the CLI explicitly supports splitting a task across multiple parallel agent runs).
MCP-connected tools and custom data sources.
Cloud task launching from the terminal, when you do not want to leave the keyboard.

E.5 Setting Up Browser Codex (chatgpt.com/codex)

The third entry point lives outside VS Code but is essential for the full workflow because it is how you launch and monitor cloud tasks.

Open browser Codex

Navigate to chatgpt.com/codex. You will need to be signed into the same ChatGPT account you used for the extension and CLI. If you are part of an enterprise workspace, your admin must have enabled Codex Cloud at the workspace level — see Section 8.

You can also reach Codex through the sidebar in regular ChatGPT. The browser surface exposes two main verbs:

Code — assign a coding task. Codex spins up a sandbox preloaded with your repository and produces a reviewable diff.
Ask — ask a question about your codebase without changing any code.

Connect a GitHub repository

Cloud tasks need a GitHub-hosted repository. Connect it once:

Open environment settings at chatgpt.com/codex.
Connect your GitHub account through the ChatGPT GitHub Connector.
Grant access to the specific repositories you want Codex to be able to use. Do not grant org-wide access by default — see Appendix C for the security checklist.
Confirm the connector shows the repo as available.

Launch a task

From the Codex web interface:

Pick the repository and (optionally) the branch.
Type a prompt describing the task. Be specific — "Add input validation to the /users POST endpoint and update the matching tests" beats "Improve the API."
Click Code (or Ask for a non-mutating question).
Watch the live logs as Codex works, or close the tab and let it run in the background.
When it finishes, review the diff. From there you can request changes, accept the result, or open a pull request.

Delegate from a GitHub PR comment

A useful shortcut: in any PR on a connected repo, you can post a comment that tags @codex with an instruction (for example, "@codex review this PR for security issues and missing tests"). Codex will pick up the request and respond on the PR. This requires being signed into ChatGPT in the same browser.

Why the browser surface matters even if you live in VS Code

Cloud tasks decouple Codex from your local machine. You can launch a long-running task from the browser, close the laptop, and come back to the diff later. The extension and CLI cannot do this: both run on your machine, so the work stops when the editor window or terminal session does.

E.6 When to Pick Which Entry Point

The three entry points overlap, which causes confusion. This table makes the choice mechanical.

Situation	Best entry point	Why
Quick edit on the file you have open	Extension	Lowest friction, no context switch
"What does this function do?"	Extension	Right-sidebar Q&A is faster than typing it into a terminal
Multi-file refactor with tests	CLI in integrated terminal	Better at multi-step agentic work and approvals
Anything you want to script or wire into a Makefile	CLI	Only the CLI is invokable from other scripts
Long-running task you want to leave running	Browser (cloud)	Decoupled from your laptop
Parallel tasks (e.g., three independent fixes at once)	Browser (cloud)	Cloud sandboxes run in parallel without local resource contention
PR review on a teammate's pull request	Browser, via `@codex` mention in PR	Lives where the review actually happens
Anything touching production credentials or live infra	None of the above without explicit human approval	See Section 14

The pattern that emerges: extension for in-flow editing, CLI for serious local agentic work, browser for anything you want offloaded or shared with the team.

E.7 The Combined VS Code Workflow

The three entry points are most powerful when used together. A representative day looks like this.

Morning, in VS Code:

Open the repo. The Codex extension panel is in the right sidebar.
Use the extension to ask questions about an unfamiliar module before you touch it.
Make small in-line edits — single-function changes, docstrings, type fixes — using the extension's diff-apply flow.

Mid-morning, in the integrated terminal:

Open the integrated terminal (Ctrl+`).
Run codex and start a multi-file task with explicit approval mode: "Refactor the auth middleware to use the new session interface. List the files you intend to touch first, then make the changes in the smallest commits possible."
Approve each shell command and each diff as Codex requests them.
Run the test suite when Codex finishes.

Afternoon, in the browser:

While you are reviewing the morning's CLI changes, open chatgpt.com/codex in another tab.
Launch a cloud task: "Add OpenAPI annotations to every public endpoint in the /api/v2 directory." This will take a while.
Switch back to VS Code and keep working. The cloud task runs in its own sandbox.
When the cloud task finishes, review the diff in the browser, request any tweaks, and open a PR.

End of day, on GitHub:

Tag @codex on a teammate's open PR with "review for correctness and missing tests." The result lands as a comment overnight.

The point of the combined workflow is that each entry point is doing what it is best at simultaneously. The extension keeps in-flow editing fast, the CLI handles local agentic work where you want approval control, and the cloud handles long-running and parallel tasks without consuming your local machine.

E.8 VS Code-Specific Tips

These are small tips that compound over time once you use Codex daily inside VS Code.

Sidebar position. The Codex panel defaults to the right sidebar. If you also have GitHub PR review or another panel there, drag Codex to the secondary side bar or to the bottom panel, whichever keeps it visible without stealing space from the editor.
Keybindings. Bind the most-used Codex commands (open panel, new task, accept diff) to keyboard shortcuts via VS Code's Preferences: Open Keyboard Shortcuts. Reach for the keyboard, not the mouse.
Settings sync. If you use VS Code's Settings Sync, the Codex extension's settings travel with you to other machines. Auth state does not — you sign in again on each machine. This is the right behavior; do not work around it.
Multi-root workspaces. The extension scopes to the active workspace folder. If you open a multi-root workspace, switch the active folder explicitly before asking Codex to make changes, otherwise it may operate against the wrong root.
Integrated terminal profiles. If you use multiple terminal profiles (PowerShell, bash, WSL), set the WSL profile as default on Windows so codex from the integrated terminal always lands in the supported environment.
Source control panel. After Codex applies a change, the VS Code Source Control panel shows the diff. Review there before committing — it gives you the same context as a git diff without leaving the editor.
Don't fight the approval mode. New users often loosen approvals to "auto" too quickly because the prompts feel slow. Resist that for the first week. The approvals are how you build a mental model of what Codex actually does in your repo.
One agent per task. Avoid running the extension and the CLI in the same workspace simultaneously on the same task. Both can touch files, and you will lose track of which one made which change.

E.9 Cursor and Windsurf

The Codex extension explicitly supports Cursor and Windsurf, the two most popular VS Code forks. The install and sign-in flow is identical. The notes worth knowing:

Avoid double-AI confusion. Cursor and Windsurf both ship their own AI features. Engineers using them with Codex sometimes accidentally invoke the fork's built-in AI when they meant to invoke Codex, or vice versa. Pick a primary tool for editing and use the other only when its specific strengths matter.
Auth is independent. The Codex extension's ChatGPT sign-in is separate from Cursor's or Windsurf's own model accounts. Your Codex usage is billed against your ChatGPT plan; Cursor/Windsurf usage against theirs.
Keybinding conflicts. Cursor in particular has heavily customized AI-related keybindings. Audit your bindings after installing the Codex extension to make sure both surfaces are reachable.
Settings sync caveat. Cursor and Windsurf have their own settings sync that diverges from upstream VS Code. Codex extension settings may sync within Cursor or Windsurf separately from your VS Code installs.

For pure Codex-first teams, vanilla VS Code is the simplest baseline. For teams that already standardized on Cursor or Windsurf for other reasons, the Codex extension is a clean addition rather than a replacement.

E.10 Troubleshooting VS Code Specifically

The general troubleshooting list is in Section 12. The issues below are specific to running Codex inside VS Code.

Extension installs but sidebar panel never appears

Reload the window (Command Palette → "Developer: Reload Window"). If that does not fix it, check the Output panel, switch the dropdown to "Codex", and look for the actual error. The most common causes are a corporate proxy blocking the extension's auth handshake, or a conflicting older version of the extension still installed.

"Sign in" keeps looping back to the sign-in prompt

This usually means the redirect from the browser auth flow did not reach the extension. Try signing out completely, closing all VS Code windows, then reopening and signing in fresh. On Windows, verify your default browser is one VS Code can open via the OS handler.

codex command not found in the integrated terminal

The CLI's npm global bin directory is not on PATH. The fastest fix on macOS/Linux is to add $(npm prefix -g)/bin to PATH in your shell profile (.zshrc, .bashrc). The older $(npm bin -g) form does not work on npm 9 and later, which removed the npm bin command. On Windows, restart VS Code after the npm install so the integrated terminal picks up the updated PATH, or switch to a WSL terminal where the install is already on PATH.
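When the fix is scripted, appending to the profile idempotently avoids stacking duplicate lines each time the setup script runs. A minimal sketch; `add_path_line` is a hypothetical helper, and both the profile file and the directory are passed in by the caller.

```shell
# Append an `export PATH=...` line for directory $2 to profile file $1,
# unless that exact line is already present. add_path_line is a hypothetical helper.
add_path_line() {
  profile="$1"
  dir="$2"
  line="export PATH=\"$dir:\$PATH\""
  touch "$profile"
  # -F fixed string, -x whole-line match, -q quiet
  if grep -qxF "$line" "$profile"; then
    echo "already present in $profile"
  else
    printf '%s\n' "$line" >> "$profile"
    echo "added to $profile"
  fi
}
```

A typical call would be `add_path_line "$HOME/.zshrc" "$(npm prefix -g)/bin"`, followed by opening a new terminal so the profile is re-read.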

Cloud task says "no repository connected" even though you connected one

Verify in chatgpt.com/codex environment settings that the specific repository is in the allowlist. The GitHub Connector grants per-repository access; granting access to the org alone is not enough. Also confirm your workspace admin has enabled Codex Cloud — individual users cannot enable it themselves.

Extension and CLI both editing the same file at the same time

Stop one of them. They do not coordinate, and you will get conflicting edits. The simplest discipline: pick one entry point per task, switch between tasks rather than trying to combine within a task.

Extension feels slower than the CLI for the same prompt

Often this is because the extension is using a different default model than your CLI configuration. Check both for the active model — the model picker in the extension panel, and codex --help or the relevant config file for the CLI.

Windows behavior is generally bad

Switch to a WSL workspace. OpenAI's own docs call out Windows as experimental for the CLI; the WSL path is the supported one and clears most issues at once.

AI Tools for Developers – OpenClaw, GitHub Copilot, Claude Code, CodeRabbit, Gemini CLI

Beau Carnes — Wed, 01 Apr 2026 19:26:18 +0000

Using AI tools is an important part of being a software developer.

We just posted a course on the freeCodeCamp.org YouTube channel that will teach you how to use AI tools to become more productive as a developer. I created this course!

In this course, you will master AI pair programming and agentic terminal workflows using top-tier tools like GitHub Copilot, Anthropic's Claude Code, and the Gemini CLI. The course also covers open-source automation with OpenClaw, teaching you how to set up a highly customizable, locally hosted AI assistant for your development environment. Finally, you will learn how to maintain high code quality and streamline your team's workflow by integrating CodeRabbit for automated, AI-driven pull request analysis.

Watch the full course on the freeCodeCamp.org YouTube channel (1.5 hour watch).

How to Deploy Your Own 24x7 AI Agent using OpenClaw

Manish Shivanandhan — Mon, 16 Mar 2026 17:45:14 +0000

OpenClaw is a self-hosted AI assistant designed to run under your control instead of inside a hosted SaaS platform.

It can connect to messaging interfaces, local tools, and model providers while keeping execution and data closer to your own infrastructure.

The project is actively developed, and the current ecosystem revolves around a CLI-driven setup flow, onboarding wizard, and multiple deployment paths ranging from local installs to containerised or cloud-hosted setups.

This article explains how to deploy your own instance of OpenClaw from a practical systems perspective. We'll look at how to deploy it on your local machine as well as a PaaS provider like Sevalla.

The goal is not just to “make it run,” but to understand deployment choices, architecture implications, and operational tradeoffs so you can run a stable instance long term.

Note: It is dangerous to give an AI system full control of your system. Make sure you understand the risks before running it on your machine.

Understanding What You Are Deploying

Before touching installation commands, it helps to understand the runtime model.

OpenClaw is essentially a local-first AI assistant that runs as a service and exposes interaction through chat interfaces and a gateway architecture.

The gateway acts as the operational core, handling communication between messaging platforms, models, and local capabilities.

In practical terms, deploying OpenClaw means deploying three layers.

The first layer is the CLI and runtime, which launches and manages the assistant.

The second layer is configuration and onboarding, where you select model providers and integrations.

The third layer is persistence and execution context, which determines whether OpenClaw runs on your laptop, a VPS, or inside a container.

Because OpenClaw runs with access to local resources, deployment decisions are not only about convenience but also about security boundaries. Treat it as an administrative system, not just a chatbot.

Deploying on a Local Machine

OpenClaw supports multiple deployment approaches, and the right one depends on your goals.

The simplest route is to install it directly on a local machine. This is ideal for experimentation, private workflows, or development because onboarding is fast and maintenance is minimal.

The fastest way to install OpenClaw is the official installer script. It handles environment detection and dependency setup, downloads the CLI, installs it globally through npm, and launches the onboarding wizard automatically. The command below is the Windows Command Prompt form (it uses install.cmd and del).

curl -fsSL https://openclaw.ai/install.cmd -o install.cmd && install.cmd && del install.cmd

This method abstracts away most environmental complexity and is recommended for first-time deployments.

If you already maintain a Node environment, you can install it directly using npm.

npm i -g openclaw

The CLI is then used to run onboarding and optionally install a daemon for persistent background execution. This approach gives you more control over versioning and update cadence.

openclaw onboard

Regardless of installation path, verify that the CLI is discoverable in your shell. Environment path issues are common when global npm packages are installed under custom Node managers.
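A quick way to run that check is to ask the shell where each command resolves. This is a sketch; `report_cmd` is a hypothetical helper name, and `openclaw` is simply the binary name used in the commands above.

```shell
# Print where a command resolves on PATH, or flag it as missing.
# report_cmd is a hypothetical helper built on the standard `command -v`.
report_cmd() {
  if resolved="$(command -v "$1" 2>/dev/null)"; then
    echo "$1 -> $resolved"
  else
    echo "$1 -> not found on PATH"
    return 1
  fi
}
```

Running `report_cmd node`, `report_cmd npm`, and `report_cmd openclaw` in the same shell you plan to use shows whether a Node version manager has placed the global package somewhere your PATH does not reach.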

The Onboarding Process

Once installed, OpenClaw relies heavily on onboarding to bootstrap configuration.

During onboarding you will select an AI provider, configure authentication, and choose how you want to interact with the assistant. This process establishes the core runtime state and generates local configuration files used by the gateway.

Onboarding also allows you to connect messaging channels such as Telegram or Discord. These integrations transform OpenClaw from a local CLI tool into an always-accessible assistant.

From a deployment perspective, this is the moment where availability requirements change. If you connect external chat platforms, your instance must remain online consistently.

You can skip certain onboarding steps and configure integrations later, but for production deployments it's better to complete the initial configuration so you can validate end-to-end functionality immediately.

Once you add an OpenAI API key or Claude key, you can choose to open the web UI.

Go to localhost:18789 to interact with OpenClaw.

Deploying on the Cloud using Sevalla

A second approach is to deploy to a VPS or cloud instance. This model gives you always-on availability and makes it possible to interact with OpenClaw from anywhere.

A third approach is containerised deployment using Docker or similar tooling. This provides reproducibility and cleaner dependency isolation.

Docker setups are particularly useful if you want predictable upgrades or easy migration between machines. OpenClaw’s repository includes scripts and compose configurations that support container execution workflows.

I have set up a custom Docker image to load OpenClaw into a PaaS platform like Sevalla.

Sevalla is a developer-friendly PaaS provider. It offers application hosting, database, object storage, and static site hosting for your projects.

Log in to Sevalla and click “Create application”. Choose “Docker image” as the application source instead of a GitHub repository. Use manishmshiva/openclaw as the Docker image, and it will be pulled automatically from DockerHub.

Click “Create application” and go to the environment variables. Add an environment variable named ANTHROPIC_API_KEY. Then go to “Deployments” and click “Deploy now”.

Once the deployment is successful, you can click “Visit app” and interact with the UI with the Sevalla-provided URL.

Interacting with the Agent

There are many ways to interact with the agent once you set up OpenClaw. For example, you can configure a Telegram bot as the chat interface. The agent will attempt tasks much as a human assistant would, and its capabilities depend on how much access you give it.

You can ask it to clean your inbox, watch a website for new articles, and perform many other tasks. Please note that providing OpenClaw access to your critical apps or files is not ideal or secure. This is still a system in its early stages, and the risk of it making a mistake or exposing your private information is high.

Security and Operational Considerations

Because OpenClaw can execute tasks and access system resources, deployment security is not optional. The safest baseline is to bind services to localhost and access them through secure VPN tunnels when remote control is required. Learn more about VPNs here.

When deploying on a VPS, harden the host like any administrative service. Use non-root users, keep packages updated, restrict inbound ports, and monitor logs. If you're integrating messaging channels, treat tokens and API keys as sensitive secrets and avoid storing them in plaintext configuration where possible.
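Where a plaintext secrets file cannot be avoided, restricting it to its owner is a common baseline. The sketch below is an illustration rather than anything OpenClaw-specific: `file_mode` and `lock_down` are hypothetical helpers, and the file you pass in is whatever holds your tokens.

```shell
# Print the permission bits of a file in octal (for example 600).
# GNU stat uses -c, BSD/macOS stat uses -f, so try both.
file_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Restrict a secrets file to its owner if any group/other bits are set.
lock_down() {
  mode="$(file_mode "$1")"
  case "$mode" in
    *00) echo "$1 already private ($mode)" ;;
    *)   chmod 600 "$1" && echo "$1 tightened from $mode to 600" ;;
  esac
}
```

This narrows who can read the file on the host; it does not encrypt the contents, so a secrets manager or the platform's environment-variable store remains the stronger option.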

Containerization helps isolate dependencies but doesn't eliminate risk. The container still executes code on your host, so network and volume permissions should be carefully scoped.

Updating and Maintaining Your Instance

OpenClaw evolves quickly, with frequent releases and feature changes. Keeping your instance updated is important not only for features but also for stability and compatibility with integrations.

For npm-based installations, updates are straightforward, but you should test upgrades in a staging environment if your assistant handles important workflows. For source-based deployments, pull changes and rebuild consistently rather than mixing old build artifacts with new code.

Monitoring is another overlooked aspect. Even simple log inspection can reveal integration failures early. If your deployment is mission-critical, consider external uptime checks or process supervisors.
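Simple log inspection can be as small as counting suspicious lines. In this sketch the log path and the keywords `error` and `fail` are assumptions; adjust both to whatever your instance actually writes. The helper names are hypothetical.

```shell
# Count lines in a log file that mention errors or failures (case-insensitive).
# grep -c prints 0 and exits non-zero when nothing matches, hence `|| true`.
count_log_errors() {
  grep -ciE 'error|fail' "$1" || true
}

# Print a one-line verdict and return non-zero when suspicious lines exist.
check_log() {
  n="$(count_log_errors "$1")"
  if [ "${n:-0}" -gt 0 ]; then
    echo "ALERT: $n suspicious line(s) in $1"
    return 1
  fi
  echo "OK: no errors in $1"
}
```

Because `check_log` returns non-zero on findings, it can be called from cron or a process supervisor that alerts on failed jobs.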

Conclusion

Deploying your own OpenClaw agent is ultimately about taking control of how your AI assistant works, where it runs, and how it fits into your daily workflows. While the setup process is straightforward, the real value comes from understanding the choices you make along the way, whether you run it locally for privacy, host it in the cloud for constant availability, or use containers for consistency and portability.

As the ecosystem around self-hosted AI continues to evolve, tools like OpenClaw make it possible to move beyond relying entirely on third-party platforms. Running your own agent gives you flexibility, ownership, and the freedom to shape the experience around your needs.

Start small, experiment safely, and gradually build confidence in how your assistant operates. Over time, what begins as a simple deployment can become a dependable, personalized system that works the way you want, under your control.

Hope you enjoyed this article. Learn more about me by visiting my website.

How to Build an AI Social Media Post Scheduler Using Gemini and Late API in Next.js

David Asaolu — Fri, 30 Jan 2026 17:34:55 +0000

Social media has become a vital tool for people and businesses to share ideas, promote products, and connect with their target audience. But creating posts regularly and managing schedules across multiple platforms can be time-consuming and repetitive.

In this tutorial, you’ll learn how to build an AI-powered social media post scheduler using Gemini, Late API, and Next.js.

We’ll use the Gemini API to generate engaging social media content from user prompts, Next.js to handle both the frontend and backend of the application, and the Late API to publish and schedule posts across multiple social media platforms through a single integration.

Prerequisites
Setup and Installation
How to Schedule Social Media Posts with Late
How to Build the Next.js App Interface
How to Integrate Gemini API for Post Generation
How to Use Late API in Next.js
Conclusion

Prerequisites

To follow this tutorial, you need a basic understanding of React or Next.js.

We will use the following tools:

Late API: A social media API that lets you create and schedule posts across 13 social media platforms from a single dashboard.
Next.js: A React framework for building fast, scalable web applications, handling both the frontend and backend.
Google Gemini API: Provides access to Google’s AI models for generating text and other content based on user prompts.

Setup and Installation

Create a new Next.js project by running the following command:

npx create-next-app post-scheduler

Install the project dependencies. We’ll use Day.js to work with JavaScript dates, making it easier to schedule and publish social media posts at the correct time. Its UTC plugin ships with the dayjs package, so no separate install is needed.

npm install @google/genai dayjs

Next, add a .env.local file containing your Gemini API key at the root of your Next.js project:

GEMINI_API_KEY=

Once everything is set up, your Next.js project is ready. Now, let's start building! 🚀

How to Schedule Social Media Posts with Late

Late is an all-in-one social media scheduling platform that allows you to connect your social media accounts and publish posts across multiple platforms. In this section, you’ll learn how to create and schedule social media posts using the Late dashboard.

To get started, create a Late account and sign in.

Create an API key and add it to the .env.local file within your Next.js project.

LATE_API_KEY=

Connect your social media accounts to Late so you can manage and publish posts across all platforms.

After connecting your social media accounts via OAuth, you can start writing, posting, and scheduling content directly to your social media platforms.

Late lets you write your post content and attach media files directly from the dashboard.

You can choose when your content should be published: post immediately, schedule for later, add it to a job queue, or save it as a draft.

Once a post is published, you can view its status and preview it directly in the dashboard using the post link.

🎉 Congratulations! You’ve successfully created your first post using the Late dashboard. In the next sections, you’ll learn how to use the Late API to create and schedule posts directly from your applications.

How to Build the Next.js App Interface

In this section, you’ll build the user interface for the application. The app uses a single-page route with conditional rendering to display recent posts, an AI prompt input field, and a form that allows users to create or schedule posts.

Before we proceed, create a types.d.ts file within your Next.js project and copy the following code snippet into the file:

interface Post {
    _id: string;
    content: string;
    scheduledFor: string;
    status: string;
}

interface AIFormProps {
    handleGeneratePost: (e: React.FormEvent) => void;
    useAI: boolean;
    setUseAI: React.Dispatch<React.SetStateAction<boolean>>;
    prompt: string;
    setPrompt: React.Dispatch<React.SetStateAction<string>>;
    disableBtn: boolean;
}

interface FormProps {
    handlePostSubmit: (e: React.FormEvent) => void;
    content: string;
    setContent: React.Dispatch<React.SetStateAction<string>>;
    date: string;
    setDate: React.Dispatch<React.SetStateAction<string>>;
    disableBtn: boolean;
    setUseAI: React.Dispatch<React.SetStateAction<boolean>>;
    useAI: boolean;
}

The types.d.ts file defines all the data structures and type declarations used throughout the application.

Copy the following code snippet into the app/page.tsx file:

"use client";
import Nav from "./components/Nav";
import { useState } from "react";
import NewPost from "./components/NewPost";
import PostsQueue from "./components/PostsQueue";

export default function Page() {
    const [showPostQueue, setShowPostQueue] = useState<boolean>(false);
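    return (
        <div className='w-full h-screen'>
            <Nav showPostQueue={showPostQueue} setShowPostQueue={setShowPostQueue} />
            {showPostQueue ? <PostsQueue /> : <NewPost />}
        </div>
    );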
}

The Page component renders the Nav component and uses conditional rendering to display either the PostsQueue or NewPost component based on the value of the showPostQueue state.

Create a components folder to store the page components used in the application.

cd app
mkdir components && cd components
touch Nav.tsx NewPost.tsx PostElement.tsx PostsQueue.tsx

Add the code snippet below to the Nav.tsx file:

export default function Nav({
    showPostQueue,
    setShowPostQueue,
}: {
    showPostQueue: boolean;
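    setShowPostQueue: React.Dispatch<React.SetStateAction<boolean>>;
}) {
    return (
        <nav>
            <h2>Post Scheduler</h2>
            {/* Toggles between the queue and the new-post form; the label text is illustrative */}
            <button onClick={() => setShowPostQueue(!showPostQueue)}>
                {showPostQueue ? "New Post" : "Scheduled Posts"}
            </button>
        </nav>
    );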
}

Copy the following code snippet into the PostsQueue.tsx file:

"use client";
import { useEffect, useState, useCallback } from "react";
import PostElement from "./PostElement";

export default function PostsQueue() {
    const [posts, setPosts] = useState<Post[]>([]);
    const [loading, setLoading] = useState<boolean>(true);

    return (
        <div className='p-4'>
            <h2 className='text-xl font-bold'>Scheduled Posts</h2>

            {loading ? (
                <p className='text-sm'>Loading scheduled posts...</p>
            ) : (
                <div className='mt-4'>
                    {posts.length > 0 ? (
                        posts.map((post) => <PostElement key={post._id} post={post} />)
                    ) : (
                        <p>No scheduled posts available.</p>
                    )}
                </div>
            )}
        </div>
    );
}

The PostsQueue.tsx component displays a list of previously created posts along with their current status, showing whether each post has been published or scheduled for a later time. While the data is being loaded, it shows a loading message, and once loaded, it renders each post using the PostElement component.

Add the following to the PostElement.tsx component:

// Formats an ISO 8601 timestamp in the viewer's locale and time zone
export const formatReadableTime = (isoString: string) => {
    const date = new Date(isoString); // ISO strings ending in "Z" are parsed as UTC
    return date.toLocaleString(undefined, {
        year: "numeric",
        month: "short",
        day: "numeric",
        hour: "2-digit",
        minute: "2-digit",
        second: "2-digit",
        hour12: true, // set to false for 24h format
    });
};

export default function PostElement({ post }: { post: Post }) {
    return (
        <div className='p-4 border flex items-center justify-between  space-x-4 rounded mb-2 hover:bg-gray-100 cursor-pointer'>
            <div>
                <p className='font-semibold text-sm'>{post.content.slice(0, 100)}</p>
                <p className='text-blue-400 text-xs'>
                    Scheduled for: {formatReadableTime(post.scheduledFor)}
                </p>
            </div>

            <p className='text-sm text-red-500'>{post.status}</p>
        </div>
    );
}
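
Because toLocaleString(undefined, ...) uses the viewer's locale and time zone, the same post can render differently for different users. A minimal variation, assuming you want to pin both, passes them explicitly. The formatPostTime name, the "en-US" and "UTC" defaults, and the fallback text are illustrative choices, not part of the original project.

```typescript
// Hypothetical variant of formatReadableTime with an explicit locale and time zone.
function formatPostTime(
    isoString: string,
    locale: string = "en-US",
    timeZone: string = "UTC",
): string {
    const date = new Date(isoString);
    // An unparseable string yields an Invalid Date; return a fallback label instead.
    if (isNaN(date.getTime())) return "Unknown time";
    return date.toLocaleString(locale, {
        year: "numeric",
        month: "short",
        day: "numeric",
        hour: "2-digit",
        minute: "2-digit",
        hour12: true,
        timeZone,
    });
}
```

The guard matters for posts that have no scheduledFor value yet, since the component would otherwise display "Invalid Date".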

Finally, copy the following code snippet into the NewPost.tsx file:

"use client";
import { useState } from "react";

export default function NewPost() {
 const [disableBtn, setDisableBtn] = useState<boolean>(false);
 const [useAI, setUseAI] = useState<boolean>(false);
 const [content, setContent] = useState<string>("");
 const [prompt, setPrompt] = useState<string>("");
 const [date, setDate] = useState<string>("");

 //👇🏻 generates post content
 const handleGeneratePost = async (e: React.FormEvent) => {
  e.preventDefault();
  setDisableBtn(true);
 };

 //👇🏻 create/schedule post
 const handlePostSubmit = async (e: React.FormEvent) => {
  e.preventDefault();
 };

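 return (
  <div className='w-full p-4  h-[90vh] flex flex-col items-center justify-center border-t'>
   <h3 className='text-xl font-bold'>New Post</h3>

   {useAI ? (
    <AIPromptForm
     handleGeneratePost={handleGeneratePost}
     useAI={useAI}
     setUseAI={setUseAI}
     prompt={prompt}
     setPrompt={setPrompt}
     disableBtn={disableBtn}
    />
   ) : (
    <PostForm
     handlePostSubmit={handlePostSubmit}
     content={content}
     setContent={setContent}
     date={date}
     setDate={setDate}
     disableBtn={disableBtn}
     setUseAI={setUseAI}
     useAI={useAI}
    />
   )}
  </div>
 );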
}

The NewPost component conditionally renders the AIPromptForm and the PostForm. When a user chooses to generate content using AI, the AIPromptForm component is displayed to collect the prompt. Once the content is generated, the PostForm component is shown, allowing the user to edit, create, or schedule the post.

Add the components below inside the NewPost.tsx file:

<pre><code class="lang-typescript">export const AIPromptForm = ({
    handleGeneratePost,
    useAI,
    setUseAI,
    prompt,
    setPrompt,
    disableBtn,
}: AIFormProps) => {
    return (
        <form onSubmit={handleGeneratePost}>
            <p onClick={() => setUseAI(!useAI)}>Exit AI </p>
            <textarea
                rows={3}
                required
                value={prompt}
                onChange={<span class="hljs-function">(<span class="hljs-params">e</span>) =></span> setPrompt(e.target.value)}
                placeholder=<span class="hljs-string">'Enter prompt...'</span>
            />
            <button <span class="hljs-keyword">type</span>=<span class="hljs-string">'submit'</span> disabled={disableBtn}>
                {disableBtn ? <span class="hljs-string">"Generating..."</span> : <span class="hljs-string">"Generate Post with AI"</span>}
            </button>
        </form>
    );
};

<span class="hljs-comment">// 👇🏻 Post Form component</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> PostForm = <span class="hljs-function">(<span class="hljs-params">{
    handlePostSubmit,
    content,
    setContent,
    date,
    setDate,
    disableBtn,
    setUseAI,
    useAI,
}: FormProps</span>) =></span> {
    <span class="hljs-keyword">const</span> getNowForDatetimeLocal = <span class="hljs-function">() =></span> {
        <span class="hljs-keyword">const</span> now = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>();
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(now.getTime() - now.getTimezoneOffset() * <span class="hljs-number">60000</span>)
            .toISOString()
            .slice(<span class="hljs-number">0</span>, <span class="hljs-number">16</span>);
    };

    <span class="hljs-keyword">return</span> (
        <form onSubmit={handlePostSubmit}>
            <p onClick={<span class="hljs-function">() =></span> setUseAI(!useAI)}>Generate posts <span class="hljs-keyword">with</span> AI </p>
            <textarea
                value={content}
                onChange={<span class="hljs-function">(<span class="hljs-params">e</span>) =></span> setContent(e.target.value)}
                rows={<span class="hljs-number">4</span>}
                placeholder=<span class="hljs-string">"What's happening?"</span>
                required
                maxLength={<span class="hljs-number">280</span>}
            />
            <input
                <span class="hljs-keyword">type</span>=<span class="hljs-string">'datetime-local'</span>
                min={getNowForDatetimeLocal()}
                value={date}
                onChange={<span class="hljs-function">(<span class="hljs-params">e</span>) =></span> setDate(e.target.value)}
            />
            <button disabled={disableBtn} <span class="hljs-keyword">type</span>=<span class="hljs-string">'submit'</span>>
                {disableBtn ? <span class="hljs-string">"Posting..."</span> : <span class="hljs-string">"Create post"</span>}
            </button>
        </form>
    );
};
</code></pre>
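<p>The <code>getNowForDatetimeLocal</code> helper relies on a small trick: <code>toISOString()</code> always reports UTC, so the code first shifts the timestamp by the local offset and then trims the string to the <code>YYYY-MM-DDTHH:mm</code> shape that <code>datetime-local</code> inputs expect. The sketch below isolates that step. The function name and the explicit <code>offsetMinutes</code> parameter are assumptions added here so the behaviour can be checked without depending on the machine's time zone.</p>

```typescript
// Hypothetical helper: formats a Date as a datetime-local value ("YYYY-MM-DDTHH:mm").
// offsetMinutes follows Date.prototype.getTimezoneOffset(): UTC minus local time, in minutes.
function toDatetimeLocalValue(
    date: Date,
    offsetMinutes: number = date.getTimezoneOffset(),
): string {
    // Shift the timestamp so the UTC fields of the new Date equal the local wall-clock time.
    const shifted = new Date(date.getTime() - offsetMinutes * 60000);
    // toISOString() gives "YYYY-MM-DDTHH:mm:ss.sssZ"; keep only the first 16 characters.
    return shifted.toISOString().slice(0, 16);
}

// 17:30 UTC seen from a zone one hour ahead of UTC (offset -60) is 18:30 local time.
const localValue = toDatetimeLocalValue(new Date("2026-01-30T17:30:00Z"), -60);
```

<p>Without the shift, the <code>min</code> attribute would hold the UTC time, and users outside UTC would see a minimum that is hours off from their own clock.</p>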
<p>Congratulations! You've completed the application interface.</p>
<h2 id="heading-how-to-integrate-gemini-api-for-post-generation">How to Integrate Gemini API for Post Generation</h2>
<p>Here, you will learn how to generate post content from the user's prompt using the Gemini API.</p>
<p>Before we proceed, make sure you have copied your API key from the <a target="_blank" href="https://ai.google.dev/gemini-api/docs/api-key">Google AI Studio</a>.</p>
<p>Create an <code>api</code> folder inside the Next.js <code>app</code> directory. This folder will contain the API routes used to generate AI content and create or schedule posts using the Late API.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> app && mkdir api
</code></pre>
<p>Next, create a <code>generate</code> folder inside the <code>api</code> directory and add a <code>route.ts</code> file. Copy the following code into the file:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇🏻 In api/generate/route.ts file</span>
<span class="hljs-keyword">import</span> { NextRequest, NextResponse } <span class="hljs-keyword">from</span> <span class="hljs-string">"next/server"</span>;
<span class="hljs-keyword">import</span> { GoogleGenAI } <span class="hljs-keyword">from</span> <span class="hljs-string">"@google/genai"</span>;

<span class="hljs-keyword">const</span> ai = <span class="hljs-keyword">new</span> GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">POST</span>(<span class="hljs-params">req: NextRequest</span>) </span>{
    <span class="hljs-keyword">const</span> { prompt } = <span class="hljs-keyword">await</span> req.json();

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> ai.models.generateContent({
            model: <span class="hljs-string">"gemini-3-flash-preview"</span>,
            contents: <span class="hljs-string">`
    You are a social media post generator, very efficient in generating engaging posts for Twitter (X). Given a topic, generate a creative and engaging post that captures attention and encourages interaction. The post must always stay within the character limit of X (Twitter), which is 280 characters, including any hashtags or mentions, spaces, punctuation, and emojis.

    The user will provide a topic or theme, and you will generate a post based on that input.
    Here is the instruction from the user:
    "<span class="hljs-subst">${prompt}</span>"`</span>,
        });
        <span class="hljs-keyword">if</span> (!response.text) {
            <span class="hljs-keyword">return</span> NextResponse.json(
                {
                    message: <span class="hljs-string">"Encountered an error generating the post."</span>,
                    success: <span class="hljs-literal">false</span>,
                },
                { status: <span class="hljs-number">400</span> },
            );
        }

        <span class="hljs-keyword">return</span> NextResponse.json(
            { message: response.text, success: <span class="hljs-literal">true</span> },
            { status: <span class="hljs-number">200</span> },
        );
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-keyword">return</span> NextResponse.json(
            { message: <span class="hljs-string">"Error generating post."</span>, success: <span class="hljs-literal">false</span> },
            { status: <span class="hljs-number">500</span> },
        );
    }
}
</code></pre>
<p>The <code>api/generate</code> endpoint accepts the user's prompt and generates post content using the <a target="_blank" href="https://ai.google.dev/gemini-api/docs/quickstart#javascript_1">Gemini API.</a></p>
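<p>The prompt asks the model to stay within 280 characters, but a model can still overshoot. A defensive sketch, assuming you would rather trim than reject, could run on <code>response.text</code> before it is returned. The helper name and the ellipsis behaviour are illustrative choices, and the count is in Unicode code points, which only approximates how X weighs characters.</p>

```typescript
const POST_CHARACTER_LIMIT = 280;

// Hypothetical helper: trims generated text to the limit, counting Unicode code points
// so that an emoji stored as a surrogate pair is not cut in half.
function clampPostLength(text: string, limit: number = POST_CHARACTER_LIMIT): string {
    const characters = Array.from(text.trim());
    if (characters.length <= limit) return characters.join("");
    // Reserve one position for the ellipsis that marks the cut.
    return characters.slice(0, limit - 1).join("") + "…";
}
```

<p>The same limit is already enforced in the form through <code>maxLength={280}</code> on the textarea, so trimming on the server keeps the two sides consistent.</p>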
<p>Now you can send a request to the newly created <code>/api/generate</code> endpoint from the <code>NewPost</code> component. Update the <code>handleGeneratePost</code> function as shown below:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> handleGeneratePost = <span class="hljs-keyword">async</span> (e: React.FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    setDisableBtn(<span class="hljs-literal">true</span>);
    <span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"/api/generate"</span>, {
        method: <span class="hljs-string">"POST"</span>,
        headers: {
            <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
        },
        body: <span class="hljs-built_in">JSON</span>.stringify({ prompt }),
    });

    <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> result.json();
    <span class="hljs-keyword">if</span> (data.success) {
        setUseAI(<span class="hljs-literal">false</span>);
        setContent(data.message);
        setPrompt(<span class="hljs-string">""</span>);
    }
    setDisableBtn(<span class="hljs-literal">false</span>);
};
</code></pre>
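<p>The <code>handleGeneratePost</code> function sends the user's prompt to the <code>/api/generate</code> endpoint and, on success, stores the AI-generated content in the <code>content</code> state and switches back to the post form.</p>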
<h2 id="heading-how-to-use-late-api-in-nextjs">How to Use Late API in Next.js</h2>
<p><a target="_blank" href="https://docs.getlate.dev/core/posts#create-a-draft-scheduled-or-immediate-post">Late</a> provides API endpoints that let you create, schedule, and manage posts programmatically. This allows you to integrate social media posting directly into your applications or automation workflows.</p>
<p>To get started, copy your Late API key and the account ID of your social media platforms into the <code>.env.local</code> file:</p>
<pre><code class="lang-bash">LATE_API_KEY=<Late_API_key>
ACCOUNT_ID=<social_media_acct_id>

<span class="hljs-comment"># Gemini API key</span>
GEMINI_API_KEY=<gemini_API_key>
</code></pre>
<p><strong>Note:</strong> In this tutorial, we will be using Twitter (X) as the social media platform for scheduling posts. You can adapt the same workflow to other platforms supported by Late API by updating the platform and accountId values in your API requests.</p>
<p>Create an <code>api/post</code> endpoint to accept post content and schedule or publish posts using the Late API.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> api
mkdir post && <span class="hljs-built_in">cd</span> post
touch route.ts
</code></pre>
<p>Then, add the following POST method to <code>post/route.ts</code>:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { NextRequest, NextResponse } <span class="hljs-keyword">from</span> <span class="hljs-string">"next/server"</span>;
<span class="hljs-keyword">import</span> utc <span class="hljs-keyword">from</span> <span class="hljs-string">"dayjs/plugin/utc"</span>;
<span class="hljs-keyword">import</span> dayjs <span class="hljs-keyword">from</span> <span class="hljs-string">"dayjs"</span>;

dayjs.extend(utc);

<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">POST</span>(<span class="hljs-params">req: NextRequest</span>) </span>{
    <span class="hljs-keyword">const</span> { content, publishAt } = <span class="hljs-keyword">await</span> req.json();

    <span class="hljs-comment">// Determine if the post should be scheduled or published immediately</span>
    <span class="hljs-keyword">const</span> nowUTC = publishAt ? dayjs(publishAt).utc() : <span class="hljs-literal">null</span>;
    <span class="hljs-keyword">const</span> publishAtUTC = nowUTC ? nowUTC.format(<span class="hljs-string">"YYYY-MM-DDTHH:mm"</span>) : <span class="hljs-literal">null</span>;

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"https://getlate.dev/api/v1/posts"</span>, {
            method: <span class="hljs-string">"POST"</span>,
            headers: {
                Authorization: <span class="hljs-string">`Bearer <span class="hljs-subst">${process.env.LATE_API_KEY}</span>`</span>,
                <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
            },
            body: <span class="hljs-built_in">JSON</span>.stringify({
                content,
                platforms: [
                    {
                        platform: <span class="hljs-string">"twitter"</span>,
                        accountId: process.env.ACCOUNT_ID!,
                    },
                ],
                publishNow: !publishAt,
                scheduledFor: publishAtUTC,
            }),
        });

        <span class="hljs-keyword">const</span> { post, message } = <span class="hljs-keyword">await</span> response.json();

        <span class="hljs-keyword">if</span> (post?._id) {
            <span class="hljs-keyword">return</span> NextResponse.json({ message, success: <span class="hljs-literal">true</span> }, { status: <span class="hljs-number">201</span> });
        }

        <span class="hljs-keyword">return</span> NextResponse.json({ message: <span class="hljs-string">"Error occurred"</span>, success: <span class="hljs-literal">false</span> }, { status: <span class="hljs-number">500</span> });
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-keyword">return</span> NextResponse.json({ message: <span class="hljs-string">"Error scheduling post."</span>, success: <span class="hljs-literal">false</span> }, { status: <span class="hljs-number">500</span> });
    }
}
</code></pre>
<p>From the code snippet above:</p>
<ul>
<li><p>The <code>api/post</code> endpoint accepts the post’s content and an optional <code>publishAt</code> time.</p>
</li>
<li><p>If <code>publishAt</code> is <code>null</code>, the post is published immediately. Otherwise, the time is converted to UTC for scheduling.</p>
</li>
<li><p>It then sends a request to the Late API using your API key and the account ID to create or schedule the post on the selected social media platform.</p>
</li>
</ul>
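<p>One detail worth checking: a <code>datetime-local</code> value such as <code>2026-02-01T09:30</code> carries no time zone, so <code>dayjs(publishAt)</code> interprets it in the server's zone, not the user's. A possible adjustment, sketched below with the native <code>Date</code> API, assumes the client sends a full ISO string with an offset and lets the server reduce it to UTC. The function name and return shape are assumptions that mirror the <code>publishNow</code> and <code>scheduledFor</code> fields in the request above, and the output keeps the same <code>YYYY-MM-DDTHH:mm</code> format as the route.</p>

```typescript
interface ScheduleFields {
    publishNow: boolean;
    scheduledFor: string | null;
}

// Hypothetical helper: turns an optional ISO timestamp into the two scheduling fields.
function toScheduleFields(publishAt: string | null): ScheduleFields {
    // No timestamp means the post goes out immediately.
    if (!publishAt) return { publishNow: true, scheduledFor: null };
    const parsed = new Date(publishAt);
    // Reject unparseable input instead of silently publishing or scheduling a bad date.
    if (isNaN(parsed.getTime())) {
        throw new RangeError("publishAt is not a valid date: " + publishAt);
    }
    // toISOString() is always UTC; the slice keeps the "YYYY-MM-DDTHH:mm" part.
    return { publishNow: false, scheduledFor: parsed.toISOString().slice(0, 16) };
}
```

<p>For example, <code>toScheduleFields("2026-02-01T09:30:00+01:00")</code> yields a <code>scheduledFor</code> of <code>2026-02-01T08:30</code>, whichever zone the server runs in.</p>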
<p>You can also add a <strong>GET</strong> method to the <code>/api/post</code> endpoint to retrieve posts that have already been created or scheduled:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">GET</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(
            <span class="hljs-string">"https://getlate.dev/api/v1/posts?platform=twitter"</span>,
            {
                method: <span class="hljs-string">"GET"</span>,
                headers: {
                    Authorization: <span class="hljs-string">`Bearer <span class="hljs-subst">${process.env.LATE_API_KEY}</span>`</span>,
                    <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
                },
            },
        );

        <span class="hljs-keyword">const</span> { posts } = <span class="hljs-keyword">await</span> response.json();

        <span class="hljs-keyword">return</span> NextResponse.json({ posts }, { status: <span class="hljs-number">200</span> });
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-keyword">return</span> NextResponse.json(
            { message: <span class="hljs-string">"Error fetching posts."</span>, success: <span class="hljs-literal">false</span> },
            { status: <span class="hljs-number">500</span> },
        );
    }
}
</code></pre>
<p>Next, update the <code>handlePostSubmit</code> function in <code>NewPost.tsx</code> to send a POST request to <code>/api/post</code>. This will create or schedule the post and notify the user of the result:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> handlePostSubmit = <span class="hljs-keyword">async</span> (e: React.FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    setDisableBtn(<span class="hljs-literal">true</span>);

    <span class="hljs-keyword">const</span> now = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>();
    <span class="hljs-keyword">const</span> selected = date ? <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(date) : <span class="hljs-literal">null</span>;
    <span class="hljs-keyword">const</span> publishAt = !selected || selected <= now ? <span class="hljs-literal">null</span> : date;

    <span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"/api/post"</span>, {
        method: <span class="hljs-string">"POST"</span>,
        headers: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> },
        body: <span class="hljs-built_in">JSON</span>.stringify({ content, publishAt }),
    });

    <span class="hljs-keyword">const</span> { message, success } = <span class="hljs-keyword">await</span> result.json();

    <span class="hljs-keyword">if</span> (success) {
        setContent(<span class="hljs-string">""</span>);
        setDate(<span class="hljs-string">""</span>);
        alert(<span class="hljs-string">"Success: "</span> + message);
    } <span class="hljs-keyword">else</span> {
        alert(<span class="hljs-string">"Error: "</span> + message);
    }

    setDisableBtn(<span class="hljs-literal">false</span>);
};
</code></pre>
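<p>The scheduling decision in <code>handlePostSubmit</code> is easier to reason about as a pure function: an empty or past date means "publish now", and only a future date is forwarded. The sketch below restates that rule with the current time passed in as a parameter. The <code>resolvePublishAt</code> name is an assumption, and unlike the handler above it returns a UTC ISO string instead of the raw input value.</p>

```typescript
// Hypothetical helper: returns the time to schedule for, or null to publish immediately.
function resolvePublishAt(date: string, now: Date = new Date()): string | null {
    // An empty input means the user did not pick a time.
    if (!date) return null;
    const selected = new Date(date);
    // Invalid or past times fall back to immediate publishing.
    if (isNaN(selected.getTime()) || selected.getTime() <= now.getTime()) return null;
    return selected.toISOString();
}
```

<p>Passing <code>now</code> explicitly makes the rule straightforward to unit test, because the result no longer depends on when the test runs.</p>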
<p>Finally, fetch all scheduled or published posts and render them in the <code>PostsQueue</code> component:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> fetchScheduledPosts = useCallback(<span class="hljs-keyword">async</span> () => {
    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"/api/post"</span>, {
            method: <span class="hljs-string">"GET"</span>,
            headers: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> },
        });
        <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> response.json();
        setPosts(data.posts);
        setLoading(<span class="hljs-literal">false</span>);
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error fetching scheduled posts:"</span>, error);
        setLoading(<span class="hljs-literal">false</span>);
    }
}, []);

useEffect(<span class="hljs-function">() =></span> {
    fetchScheduledPosts();
}, [fetchScheduledPosts]);
</code></pre>
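<p>If the request fails, <code>data.posts</code> may be undefined, and reading <code>posts.length</code> would then throw during render. A small guard, sketched here under the assumption that the response matches the <code>Post</code> interface from <code>types.d.ts</code>, keeps the state an array in every case. The <code>normalizePosts</code> name is illustrative.</p>

```typescript
interface Post {
    _id: string;
    content: string;
    scheduledFor: string;
    status: string;
}

// Hypothetical guard: returns the posts array from a response body, or [] when it is missing.
function normalizePosts(data: unknown): Post[] {
    if (typeof data !== "object" || data === null) return [];
    const posts = (data as { posts?: unknown }).posts;
    if (!Array.isArray(posts)) return [];
    // Keep only entries that carry a string _id and string content.
    return posts.filter(
        (p): p is Post =>
            typeof p === "object" &&
            p !== null &&
            typeof (p as Post)._id === "string" &&
            typeof (p as Post).content === "string",
    );
}
```

<p>In <code>fetchScheduledPosts</code>, this would mean calling <code>setPosts(normalizePosts(data))</code> in place of <code>setPosts(data.posts)</code>.</p>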
<p>🎉 Congratulations! You’ve successfully built an AI-powered social media post scheduler using Next.js, Gemini API, and Late API.</p>
<p>The source code for this tutorial is available on <a target="_blank" href="https://github.com/dha-stix/ai-post-scheduler">GitHub</a>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you’ve learnt how to create and schedule social media posts across multiple platforms using a single scheduling platform, Late, and how to generate AI content using the Gemini API.</p>
<p>The <a target="_blank" href="https://getlate.dev/">Late API</a> is a powerful tool for automating social media tasks, posting at specific intervals, managing multiple accounts, and tracking analytics – all from one platform. By combining it with generative AI models like Gemini and automation tools like n8n or Zapier, you can build automated workflows that keep your audience engaged with minimal effort.</p>
<p>The <a target="_blank" href="https://ai.google.dev/gemini-api/docs/quickstart">Gemini API</a> also makes it easy to integrate AI-powered text, image, or code generation directly into your applications, opening up a wide range of creative possibilities.</p>
<p>Thank you for reading! 🎉</p>
 
</article>
<article>
<h1> How to Build an AI-Powered Flutter App with Google Antigravity: A Hands-On Tutorial </h1>
<p>Anna Muzykina — Wed, 07 Jan 2026 17:26:39 +0000</p>
 <p>As a Flutter developer who’s building a cloud-based ecosystem for digital media lifecycle management, I’m constantly looking for ways to speed up the transition from idea to prototype.</p>
<p>In November 2025, Google launched <a target="_blank" href="https://antigravity.google/blog/introducing-google-antigravity"><strong>Antigravity</strong></a>, a new interactive coding platform that has fundamentally shifted my workflow.</p>
<p>Antigravity has completely <a target="_blank" href="https://www.youtube.com/watch?v=SVCBA-pBgt0&t=2s">changed how fast you can prototype</a> and iterate on projects. Instead of writing boilerplate code or spending hours searching through documentation, you can describe your needs in natural language, review the plan, and let AI agents create, test, and even run the code.</p>
<p>This "coding in the air" approach creates the feeling of working with a very capable junior developer who never tires.</p>
<p>Based on my positive experience, I decided to share my first steps and thoughts about Antigravity. In this hands-on tutorial, we’ll create Water Tracker, a beautiful, modern Flutter app that helps users track their water intake with smart progress visualization and gentle reminders.</p>
<p>We’ll use Antigravity to let AI agents plan, write, test, and show video walkthroughs of your app. This “<a target="_blank" href="https://www.freecodecamp.org/news/how-to-use-vibe-coding-effectively-as-a-dev/">vibe coding</a>” style means that you describe what you want, review plans, and approve changes – all while agents handle the heavy lifting.</p>
<p>The app will feature a <a target="_blank" href="https://www.freecodecamp.org/news/glassmorphism-how-to-create-a-glass-card-in-figma/">glassmorphism</a> design: frosted glass cards, blurred backgrounds, subtle borders, and soft translucency. This will give the app a premium, modern feel that’s both elegant and calming.</p>
<h2 id="heading-heres-what-well-cover">Here's what we'll cover:</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-antigravity-engine">Understanding the Antigravity Engine</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prompts-the-key-to-successful-vibe-coding-in-antigravity">Prompts - the Key to Successful Vibe Coding in Antigravity</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-open-antigravity-and-create-a-workspace">Step 1: Open Antigravity and Create a Workspace</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-mastering-the-art-of-agentic-prompting">Step 2: Mastering the Art of Agentic Prompting</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-implement-the-glassmorphic-main-screen">Step 3: Implement the Glassmorphic Main Screen</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-add-persistence-and-daily-logic">Step 4: Add Persistence and Daily Logic</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-add-on-device-smart-reminders-with-gemma">Step 5: Add On-Device Smart Reminders with Gemma</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-final-polish">Step 6: Final Polish</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-antigravity-quota-limit">Antigravity quota limit</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<p>In this tutorial, we will build <strong>Water Tracker</strong>: a modern Flutter app featuring an attractive glassmorphism design. We will use Antigravity’s agentic workflow to handle the heavy lifting, including a circular progress visualization and on-device smart reminders powered by Gemma.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along, you’ll need:</p>
<ul>
<li><p>Flutter SDK installed (<code>flutter doctor</code> should report no issues)</p>
</li>
<li><p>Android emulator or physical device</p>
</li>
<li><p>Google Antigravity installed (you can use the free public preview from <a target="_blank" href="https://antigravity.google/">antigravity.google</a>)</p>
</li>
</ul>
<h2 id="heading-understanding-the-antigravity-engine">Understanding the Antigravity Engine</h2>
<p>Before we dive into the code, it is important to understand what is happening under the hood. Unlike standard LLM chat interfaces that simply provide code snippets for you to copy-paste, Google Antigravity is an agentic development platform.</p>
<p>At its core, Antigravity offers a familiar AI-powered IDE experience built on Google’s models. On top of that, it moves the IDE toward an agent-first workflow with browser control capabilities and asynchronous interaction patterns. Together, these let agents autonomously plan and execute complex, end-to-end software tasks.</p>
<p>It connects to powerful Large Language Models, but it also has "tools" or "skills" that allow it to interact with your file system, run terminal commands like <code>flutter create</code>, and even execute the app in an emulator.</p>
<p>When you send a prompt, the system doesn't just guess the next word. It uses a reasoning loop to plan actions, execute them, and verify the output.</p>
<p>Because these agents can make autonomous decisions, your role shifts from "writer" to "editor and supervisor." You must verify the agents’ plans to ensure they follow best practices and don't introduce security or performance regressions.</p>
<h2 id="heading-prompts-the-key-to-successful-vibe-coding-in-antigravity">Prompts: the Key to Successful Vibe Coding in Antigravity</h2>
<p>While building my platform, I learned that prompts for Google Antigravity are completely different from regular AI chats or code completers.</p>
<p>Antigravity is <strong>agentic</strong>. This means its AI agents can run commands, create files, launch apps, and test everything autonomously. Because of this power, prompts must be structured like detailed instructions to a very capable junior developer, not short requests.</p>
<p>That's why every prompt in this tutorial follows the same pattern:</p>
<ul>
<li><p><strong>High-level goal + vibe</strong>: I describe the feature and the desired feel (for example, glassmorphism with soft blues, premium and calming).</p>
</li>
<li><p><strong>Detailed requirements in bullets</strong>: Functionality, UX, design, performance, accessibility – everything the agent needs to deliver quality on the first try.</p>
</li>
<li><p><strong>Plan-first safety</strong>: Always include something like, "Before any commands/code: generate a detailed plan artifact (folder tree, dependencies, steps) and ask for approval." This forces the agent to think first and lets me review/correct before anything changes.</p>
</li>
<li><p><strong>Verification request</strong>: Ask for screenshots and video walkthrough artifacts so I can visually check the result.</p>
</li>
<li><p><strong>No roles or fluff</strong>: Use direct, natural language. Agents don't need "You are an expert..." to work well.</p>
</li>
</ul>
<p>This style helps prevent mistakes (so agents can't run wild), ensures consistent premium quality (glassmorphism done right), and creates the relaxed "vibe coding" flow: you can focus on vision, approve plans, and get polished features fast.</p>
<p>Without this structure, agents might skip steps or produce basic results. With it, you get the beautiful, functional app we're about to build together. In this tutorial, I’ll share the prompts I used.</p>
<h2 id="heading-step-1-open-antigravity-and-create-a-workspace">Step 1: Open Antigravity and Create a Workspace</h2>
<p>To get started, you’ll need to download and install the Antigravity IDE. It’s important to note that Antigravity is a standalone application, not a plugin or extension for your existing editor.</p>
<p>It’s built as a <strong>fork of Visual Studio Code</strong>, which makes the setup straightforward. If you have ever used VS Code, the interface will be instantly familiar, and you can even bring over your favorite shortcuts and themes. It integrates the editor, terminal, and AI agents into a single window.</p>
<p>Next, open the Agent Manager by clicking “Open Agent Manager” (either the button at the top or in the center of the screen, as you can see below):</p>
<p></p>
<p>The panel on the left has a “+ Open Workspace” button. Click it to create a new workspace:</p>
<p></p>
<p>Then go ahead and click “Open New Workspace”:</p>
<p></p>
<p>Then just name it <code>water_tracker</code> and create it:</p>
<p></p>
<p>Now you have a clean workspace ready for prompts:</p>
<p></p>
<p>This gives the AI agent a dedicated workspace where it can manage your project files without touching your other projects. Your AI agents can now build and test your Flutter code there.</p>
<h2 id="heading-step-2-mastering-the-art-of-agentic-prompting">Step 2: Mastering the Art of Agentic Prompting</h2>
<p>In Antigravity, your success depends on how you communicate with your agents. Good prompts are detailed and always require a plan first. To create an effective prompt, you should think like a Project Manager: define the scope, set technical constraints, and establish a "checkpoint" before the agent executes any code.</p>
<h3 id="heading-the-anatomy-of-a-perfect-prompt">The Anatomy of a Perfect Prompt</h3>
<p>As we briefly discussed above, a strong prompt follows a clear structure: <strong>Context + Goal + Constraints + Verification</strong>. By explicitly asking for a plan, you prevent the agent from making assumptions about your architecture or UI style that might be difficult to undo later.</p>
<p>Copy and paste the following prompt into the Agent Manager:</p>
<pre><code class="lang-markdown">Create a new Flutter project named <span class="hljs-code">`water_tracker`</span>.

Design requirements:
<span class="hljs-bullet">-</span> Glassmorphism style throughout: frosted glass cards, blurred backgrounds, subtle borders, translucency
<span class="hljs-bullet">-</span> Soft color palette: light blues, whites, gentle gradients
<span class="hljs-bullet">-</span> Modern, premium feel with depth and elegance

Before any commands:
<span class="hljs-bullet">1.</span> Generate a detailed project plan artifact including:
<span class="hljs-bullet">   -</span> Full folder structure tree
<span class="hljs-bullet">   -</span> Recommended dependencies (e.g., shared_preferences, glassmorphism package if available)
<span class="hljs-bullet">   -</span> High-level architecture (simple state management to start)
<span class="hljs-bullet">2.</span> Ask for my explicit approval.

After approval:
<span class="hljs-bullet">-</span> Run <span class="hljs-code">`flutter create water_tracker`</span>
<span class="hljs-bullet">-</span> Add dependencies
<span class="hljs-bullet">-</span> Launch the blank app
<span class="hljs-bullet">-</span> Provide screenshots and video walkthrough artifact.
</code></pre>
<p></p>
<h3 id="heading-analyzing-the-prompt-strategy">Analyzing the Prompt Strategy</h3>
<p>I crafted this prompt with specific "hooks" to ensure high-quality output:</p>
<ol>
<li><p>First, the <strong>Design requirements</strong> block uses sensory language ("frosted," "soft," "depth") to guide the agent's aesthetic choice.</p>
</li>
<li><p>Second, the <strong>"Before any commands"</strong> section is the most important element as it creates a safety gate. This forces the agent into a "Plan-First" mode, where it must present its logic as a readable document (an Artifact) before touching your file system.</p>
</li>
<li><p>Finally, the <strong>Verification</strong> requests (screenshots/video) ensure the agent is responsible for proving the setup was successful.</p>
</li>
</ol>
<h3 id="heading-reviewing-the-implementation-plan">Reviewing the Implementation Plan</h3>
<p>After you run this prompt, the agent will give you a plan to review. Scroll down and read everything carefully, making sure the plan looks solid. If it does, click the <strong>"Proceed"</strong> button:</p>
<p></p>
<h3 id="heading-authorizing-commands-in-the-agent-manager">Authorizing Commands in the Agent Manager</h3>
<p>After you proceed with the plan, the agent will begin the <strong>Initializing Project</strong> phase. In Antigravity, agents do not run terminal commands in the background without your knowledge. Instead, they present the specific command for your authorization.</p>
<p></p>
<p>As shown in the above screenshot, the agent will ask to run: <code>flutter pub add provider shared_preferences intl flutter_animate google_fonts</code>.</p>
<p>Clicking <strong>"Accept"</strong> gives the agent permission to execute the command in your workspace. From this point on, the agent makes real changes: dependencies are added and the initial folder structure is generated.</p>
<p></p>
<h3 id="heading-managing-commands-and-folders">Managing Commands and Folders</h3>
<p>The "Step Requires Input" gate ensures you maintain full control over what’s being installed on your machine.</p>
<p>Before any directories are actually made, the agent displays the exact <code>mkdir</code> command it plans to run. You’ll need to review this proposed folder structure and click the blue <strong>"Accept"</strong> button to authorize the agent to physically create those paths in your workspace.</p>
<p></p>
<h3 id="heading-verifying-the-emulator-launch">Verifying the Emulator Launch</h3>
<p>Before launching the app on the emulator, the agent will ask for your permission:</p>
<p></p>
<p>The agent will then initialize the project and show you the running app in the integrated emulator:</p>
<p></p>
<p>Also, the agent tests the app and records a few seconds of video to demonstrate that all buttons are working:</p>
<p></p>
<h2 id="heading-switching-to-the-editor-for-verification">Switching to the Editor for Verification</h2>
<p>Once the agent has finished initializing the project and building the directory structure, you’ll want to see the results of its work.</p>
<p>Because Antigravity is an agentic IDE, it often keeps the focus on the <strong>Agent Manager</strong> while the agent runs the commands you approved and generates code. To switch from the agent's log to the actual source code, click the “Open editor” button (the <code>< ></code> icon) located at the top right of the interface.</p>
<p></p>
<p>Clicking this button reveals the <strong>Explorer</strong> view on the left, where you can now see the newly created <code>water_tracker</code> project. You should explore the <code>lib/</code> directory to verify that the agent successfully created <code>main.dart</code> and organized your files into the <code>core</code>, <code>data</code>, and <code>ui</code> folders as proposed in its earlier plan.</p>
<p>This is your chance to perform a sanity check on the code itself. Open <code>main.dart</code> to ensure the agent correctly set up the <code>WaterTrackerApp</code> and initialized your theme before you proceed to the next stage of development.</p>
<p></p>
<h3 id="heading-understanding-orchestration-vs-verification">Understanding Orchestration vs. Verification</h3>
<p>To clarify, switching between the Agent Manager and the editor (via the <code>< ></code> icon) represents a shift from <strong>orchestration</strong> to <strong>verification</strong>:</p>
<ul>
<li><p><strong>The Agent Manager View (Orchestration)</strong>: When you click <strong>Open Agent Manager</strong>, you’re looking at the "command center" for the AI agents. In this view, you see a terminal-like interface where the agent proposes actions.<br>  For example, as seen in the earlier screenshot, the agent shows a "Step Requires Input" and waits for you to click Accept on a terminal command like <code>flutter pub add</code>. You can’t edit code here – you can only approve or reject the agent's planned terminal operations.</p>
</li>
<li><p><strong>The Editor View (Verification)</strong>: When you click the <strong>'Open editor'</strong> button (the <code>< ></code> icon) in the top right, the IDE reveals the standard VS Code-style workspace. This is where the physical files (like <code>main.dart</code> and the folder structure you just authorized) actually appear.<br>  While the Agent Manager shows you the <em>log</em> of what the agent did, the Editor View allows you to open those files to verify that the code follows your standards and is ready for production.</p>
</li>
</ul>
<h3 id="heading-summary-of-workflow">Summary of Workflow</h3>
<p>In short: you use the Agent Manager to authorize the agent to run terminal commands and create folders, and you click the 'Open editor' button to actually see, explore, and edit the resulting files.</p>
<h2 id="heading-step-3-implement-the-glassmorphic-main-screen">Step 3: Implement the Glassmorphic Main Screen</h2>
<p>Now it’s time to create the beautiful UI. Glassmorphism relies on the <code>BackdropFilter</code> widget and <code>ClipRRect</code> to create that "frosted glass" effect. We want a central progress ring that shows how much water we’ve had and that feels physical and tactile.</p>
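<p>To make the agent's plan easier to review, here is a minimal sketch of how these two widgets are usually combined. The <code>GlassCard</code> name, the blur strength, and the colors are illustrative assumptions, not the code the agent will generate:</p>
<pre><code class="lang-dart">import 'dart:ui';

import 'package:flutter/material.dart';

/// Hypothetical helper widget (not from the generated project):
/// a frosted-glass card built from ClipRRect and BackdropFilter.
class GlassCard extends StatelessWidget {
  const GlassCard({super.key, required this.child});

  final Widget child;

  @override
  Widget build(BuildContext context) {
    // ClipRRect limits the blur to the card's rounded rectangle.
    return ClipRRect(
      borderRadius: BorderRadius.circular(24),
      // BackdropFilter blurs whatever is painted behind the card.
      child: BackdropFilter(
        filter: ImageFilter.blur(sigmaX: 12.0, sigmaY: 12.0),
        child: Container(
          padding: const EdgeInsets.all(20),
          decoration: BoxDecoration(
            // A translucent fill and a faint border give the glass look.
            color: const Color.fromRGBO(255, 255, 255, 0.15),
            borderRadius: BorderRadius.circular(24),
            border: Border.all(
              color: const Color.fromRGBO(255, 255, 255, 0.3),
            ),
          ),
          child: child,
        ),
      ),
    );
  }
}
</code></pre>
<p>The effect only shows when something colorful sits behind the card, so a widget like this is typically placed in a <code>Stack</code> above a gradient or image background.</p>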
<p>Paste in the following prompt:</p>
<pre><code class="lang-markdown">Implement the main water tracking screen with glassmorphism design.

Detailed requirements:
<span class="hljs-bullet">-</span> Large central circular progress ring (frosted glass style, blurred background visible through it)
<span class="hljs-bullet">-</span> Big floating "+" button with glass effect and subtle glow on tap
<span class="hljs-bullet">-</span> Current intake text in large, elegant font
<span class="hljs-bullet">-</span> Glassmorphic card below showing "X glasses · Y ml of 2000 ml"
<span class="hljs-bullet">-</span> Scrollable history list in frosted cards
<span class="hljs-bullet">-</span> Empty state with calming illustration/text
<span class="hljs-bullet">-</span> Smooth fill animation on progress ring when adding water

Before coding:
<span class="hljs-bullet">1.</span> Plan artifact with:
<span class="hljs-bullet">   -</span> Glassmorphism implementation approach (BackdropFilter, ClipRRect, etc.)
<span class="hljs-bullet">   -</span> Widget hierarchy
<span class="hljs-bullet">   -</span> Animation details
<span class="hljs-bullet">2.</span> Ask approval.

After approval:
<span class="hljs-bullet">-</span> Generate code
<span class="hljs-bullet">-</span> Hot reload
<span class="hljs-bullet">-</span> Provide video walkthrough showing:
<span class="hljs-bullet">   -</span> Adding water multiple times
<span class="hljs-bullet">   -</span> Progress ring filling with glass effect
<span class="hljs-bullet">   -</span> History cards appearing
</code></pre>
<p>If all looks good, approve the plan. The agent should build a stunning glassmorphic interface. Tap “+” and watch the ring fill with a silky animation through the frosted glass.</p>
<p></p>
<h2 id="heading-step-4-add-persistence-and-daily-logic">Step 4: Add Persistence and Daily Logic</h2>
<p>An app is only useful if it remembers your data. We’ll use <code>shared_preferences</code> for simple local storage. We also need logic that checks the current date and resets the counter to zero at midnight.</p>
<p>We’ll ask the agent to save the intake and the last reset date, and to explain how the midnight reset check will be triggered before it implements anything.</p>
<p>Use this prompt:</p>
<pre><code class="lang-markdown">Add persistence and daily reset.

Requirements:
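<span class="hljs-bullet">-</span> Use shared_preferences to save intake and last reset date
<span class="hljs-bullet">-</span> Auto-reset to 0 ml at midnight
<span class="hljs-bullet">-</span> Preserve today's history until reset
<span class="hljs-bullet">-</span> Simple settings dialog to change daily goal

Before changes:
<span class="hljs-bullet">1.</span> Plan with storage and reset logic
<span class="hljs-bullet">2.</span> Ask approval.

After:
<span class="hljs-bullet">-</span> Implement
<span class="hljs-bullet">-</span> Test app close/reopen
<span class="hljs-bullet">-</span> Video: add water → close → reopen → data persists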
</code></pre>
<p>Review the agent’s logic for the reset. A common pitfall is only checking the date when the app first opens – so make sure that the agent accounts for the app staying open in the background overnight.</p>
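<p>When reviewing the plan, it helps to know what a correct check looks like. The sketch below is one possible approach in plain Dart, with hypothetical function names: it compares calendar days instead of relying on the app being launched after midnight.</p>
<pre><code class="lang-dart">/// Returns true when [now] falls on a later calendar day than [lastReset].
/// Hypothetical helper; the name is illustrative.
bool isNewDay(DateTime lastReset, DateTime now) {
  // Strip the time of day so only the dates are compared.
  final lastDay = DateTime(lastReset.year, lastReset.month, lastReset.day);
  final today = DateTime(now.year, now.month, now.day);
  return today.isAfter(lastDay);
}

/// Returns the intake to keep: zero after a day change, otherwise unchanged.
int intakeAfterResetCheck(int intakeMl, DateTime lastReset, DateTime now) {
  if (isNewDay(lastReset, now)) {
    return 0;
  }
  return intakeMl;
}
</code></pre>
<p>To cover the overnight case, a check like this would need to run not only at startup but also whenever the app returns to the foreground (for example, from <code>WidgetsBindingObserver.didChangeAppLifecycleState</code>) and before each new entry is saved. Where exactly the agent wires it in is something to confirm in its plan.</p>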
<p>Your progress now survives closing and reopening the app:</p>
<p></p>
<h2 id="heading-step-5-add-on-device-smart-reminders-with-gemma">Step 5: Add On-Device Smart Reminders with Gemma</h2>
<p>The most advanced feature of our Water Tracker is the smart reminder system powered by <strong>Gemma 3n</strong>. Unlike traditional reminders that use static, repetitive text, these reminders are generated dynamically to keep the user engaged and motivated. The primary goal of these reminders is to track the user's progress against their daily hydration goal and provide personalized nudges that ensure they stay on schedule throughout the day.</p>
<p>To achieve this, we’ll use Gemma 3n, which is a specialized variant of Google’s open-weight model family designed specifically for on-device performance. Gemma 3n acts as our AI Hydration Coach by analyzing the user's current intake status. For example, it notices if a user has only consumed 500ml out of their 2000ml goal by mid-afternoon. It then uses this context to generate a friendly, unique message.</p>
<p>We’re using Gemma 3n here for several critical reasons:</p>
<ul>
<li><p><strong>Privacy and data sovereignty</strong>: Because Gemma 3n runs fully locally on the user's phone, no personal health data or daily habits ever leave the device, providing a "privacy-first" experience where no data leaks to the cloud.</p>
</li>
<li><p><strong>Next-generation architecture</strong>: Gemma 3n shares its architecture with the next generation of Gemini Nano, which is designed for fast, efficient inference with a small footprint on the device's battery and memory.</p>
</li>
<li><p><strong>Native multimodal support</strong>: Gemma 3n accepts audio input natively. While we currently use it only for text notifications, this leaves the door open for voice-based logging and interaction later.</p>
</li>
</ul>
<p>Copy and paste in this prompt:</p>
<pre><code class="lang-markdown">Add on-device hydration reminders using Gemma.

Requirements:
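<span class="hljs-bullet">-</span> Use flutter_gemma or similar 2025 package for Gemma 3n
<span class="hljs-bullet">-</span> Every 2 hours, check progress
<span class="hljs-bullet">-</span> If behind schedule, show local notification: friendly, motivational message like "Time for a refreshing glass! You've had X of Y ml today."
<span class="hljs-bullet">-</span> Use simple on-device prompt for variety
<span class="hljs-bullet">-</span> Toggle in settings
<span class="hljs-bullet">-</span> Privacy badge: "Reminders powered locally"

Before implementation:
<span class="hljs-bullet">1.</span> Plan with package, notification setup, and timing logic
<span class="hljs-bullet">2.</span> Ask approval.

After:
<span class="hljs-bullet">-</span> Implement
<span class="hljs-bullet">-</span> Test (simulate time or wait)
<span class="hljs-bullet">-</span> Video showing notification appearing.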
</code></pre>
<p>You should verify that the agent is not making frequent, battery-draining calls to the model. The reminders should be scheduled efficiently using background tasks.</p>
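<p>One way to keep model calls rare is to decide in plain Dart whether the user is behind schedule, and only ask the model for a message when they are. The sketch below assumes the daily goal is spread evenly between 8:00 and 22:00. Those hours and all function names are illustrative assumptions, and the actual model and notification calls are left out because they depend on the package the agent picks:</p>
<pre><code class="lang-dart">/// Expected intake in ml at [now], assuming the goal is spread evenly
/// between [wakeHour] and [sleepHour] (both are assumed defaults).
int expectedIntakeMl(DateTime now, int goalMl,
    {int wakeHour = 8, int sleepHour = 22}) {
  final minutesAwake = (sleepHour - wakeHour) * 60;
  final minutesSinceWake = (now.hour - wakeHour) * 60 + now.minute;
  // Before wake-up the expectation is 0; after bedtime it is the full goal.
  final elapsed = minutesSinceWake.clamp(0, minutesAwake);
  return (goalMl * elapsed / minutesAwake).round();
}

/// True when the logged intake is below the expected intake at [now].
bool isBehindSchedule(int intakeMl, DateTime now, int goalMl) {
  return (intakeMl - expectedIntakeMl(now, goalMl)).isNegative;
}

/// Builds the short text prompt that would be sent to the on-device model.
String buildCoachPrompt(int intakeMl, int goalMl) {
  return 'Write one short, friendly hydration reminder. '
      'The user has had $intakeMl of $goalMl ml today.';
}
</code></pre>
<p>With the example from earlier, a user at 500 ml of a 2000 ml goal at 15:00 is expected to be at 1000 ml, so <code>isBehindSchedule</code> returns true and a single prompt is built for that reminder.</p>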
<p></p>
<p>To test the AI Hydration Coach:</p>
<ol>
<li><p>Go to Settings (gear icon).</p>
</li>
<li><p>Enable the "AI Hydration Coach" toggle.</p>
</li>
<li><p>You should receive a simulated notification immediately with a motivational message like: <em>"Hydration Buddy 💧: Stay hydrated! You're at X% of your daily goal."</em></p>
</li>
</ol>
<h2 id="heading-step-6-final-polish">Step 6: Final Polish</h2>
<p>To finish the app, we will add micro-interactions – the small details that make an app feel premium. This includes a confetti celebration when the daily goal is met and a wave animation for the empty state.</p>
<p>Use this prompt:</p>
<pre><code class="lang-markdown">Add final polish:
<span class="hljs-bullet">-</span> Confetti explosion when reaching 100% goal
<span class="hljs-bullet">-</span> Glassmorphic settings screen
<span class="hljs-bullet">-</span> Better empty state with subtle wave animation
<span class="hljs-bullet">-</span> Optimize performance

Implement one at a time with quick video updates.
</code></pre>
<p>Run the app on your phone. Add water throughout the day and enjoy the glassmorphic beauty, gentle reminders, and celebration when you hit your goal.</p>
<p></p>
<p>Then go back to your app and tap the ‘+’ button until you reach 100% of your daily goal. At that point, the confetti will appear:</p>
<p></p>
<h3 id="heading-reviewing-the-final-changes">Reviewing the Final Changes</h3>
<p>As the agent works, use the <strong>'Open editor'</strong> button (the <code>< ></code> icon) to inspect the new animations. When checking the performance optimization, look for the agent's use of <code>RepaintBoundary</code> around the glassmorphic layers. This is a good sign that the agent considered rendering cost instead of just writing the simplest code that works.</p>
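<p>For orientation, this is roughly what such a boundary looks like in code. <code>RepaintBoundary</code> gives its child a separate display list, so an animated child can repaint without forcing its surroundings to repaint. The <code>IsolatedRing</code> widget and its sizes are hypothetical, and whether the boundary actually helps is something to measure in Flutter DevTools:</p>
<pre><code class="lang-dart">import 'package:flutter/material.dart';

/// Hypothetical wrapper: keeps the animated ring in its own repaint layer.
class IsolatedRing extends StatelessWidget {
  const IsolatedRing({super.key, required this.progress});

  /// Fraction of the daily goal reached, from 0.0 to 1.0.
  final double progress;

  @override
  Widget build(BuildContext context) {
    // Repaints of the ring stay inside this boundary.
    return RepaintBoundary(
      child: SizedBox(
        width: 220,
        height: 220,
        child: CircularProgressIndicator(
          value: progress,
          strokeWidth: 14,
        ),
      ),
    );
  }
}
</code></pre>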
<p>Once every micro-interaction is verified, your Water Tracker is ready for daily use, combining a glassmorphic interface, privacy-first reminders, and a celebration when you reach your goal.</p>
<h2 id="heading-antigravity-quota-limit">Antigravity Quota Limit</h2>
<p>If your favorite model notifies you that it has reached its quota limit, you can switch to another model until the limit resets. As you can see in my screenshot, my favorite, Gemini 3 Pro, won’t be available until 8:26 PM, so I'll select another model from the drop-down menu to use until then.</p>
<p></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you built a helpful habit-tracking app using <strong>agentic development</strong>.</p>
<p>You learned about:</p>
<ul>
<li><p>Managing workspaces in Antigravity</p>
</li>
<li><p>Writing detailed, plan-first prompts</p>
</li>
<li><p>Creating glassmorphism designs</p>
</li>
<li><p>Integrating on-device AI with Gemma</p>
</li>
<li><p>Rapid, high-quality prototyping</p>
</li>
</ul>
<p>This is how modern Flutter development feels: focused on creativity, not boilerplate.</p>
<p>Happy vibe coding!</p>
 
</article>
<article>
<h1> Figma MCP vs Kombai: Cloning the Front End from Figma with AI Tools </h1>
<p>Shrijal Acharya — Mon, 08 Dec 2025 16:58:51 +0000</p>
 <p>Frontend automation is moving fast. Tools like Figma MCP and Kombai can read design context and generate working UI code. I wanted to see what you actually get in practice, so I decided to compare them.</p>
<p>Figma MCP exposes design metadata to AI clients, while Kombai is a frontend-first agent that integrates with editors and existing stacks.</p>
<p>In this article, we’ll feed the same two Figma files into both tools, review how close the output is to the designs, and look at the code structure in a real editor.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-whats-the-deal">What's the Deal?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-meet-the-tools">Meet the Tools</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-kombaihttpskombaicom">Kombai</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-figma-mcphttpswwwfigmacomblogintroducing-figma-mcp-server">Figma MCP</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-frontend-comparison-with-figma">Frontend Comparison with Figma</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-test-1-simple-portfolio-design">Test 1: Simple Portfolio Design</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-figma-mcp">Figma MCP</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-kombai">Kombai</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-test-2-complex-learning-dashboard">Test 2: Complex Learning Dashboard</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-figma-mcp-1">Figma MCP</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-kombai-1">Kombai</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-you-should-know-before-using-these-tools">What You Should Know Before Using These Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-verdict-and-whats-next">Final Verdict and What's Next?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-whats-the-deal">What's the Deal?</h2>
<p>Cloning complex Figma designs by hand isn’t fun, and neither is writing your CSS line by line to match them exactly.</p>
<p>And sure, you can attach a screenshot or whatever to GPT, but it often ends up with something that barely looks like your design. That's where Kombai or the Figma MCP come in.</p>
<p>They actually get your Figma design metadata and give you frontend code that's super close to the real thing.</p>
<p>So now, instead of spending hours rebuilding what's already in your design file, you can focus more on small tweaks and what actually matters.</p>
<h2 id="heading-meet-the-tools">Meet the Tools</h2>
<h3 id="heading-kombaihttpskombaicom"><a target="_blank" href="https://kombai.com/">Kombai</a></h3>
<p></p>
<p>Kombai is an AI agent designed for frontend work. It takes input such as Figma designs, text, images, or your existing code, understands your stack, and converts it into clean, production-ready UI.</p>
<p>💡 It’s made specifically for frontend work, so you can expect it to be very good at that (unlike more generic tools like ChatGPT or Claude).</p>
<p>Kombai also handles large repositories easily. It doesn't just convert Figma designs into code. It actually understands your entire frontend codebase, even if it's huge.</p>
<p>So, whether you're working on a small side project or a very large production app, it can read, change, and write code that fits into your existing project.</p>
<p><strong>Note:</strong> You can chat with Kombai like GPT, but it already knows your frontend. It can help refactor code, clean things up, or make changes. It never touches your backend code, so your business logic can't be changed by mistake.</p>
<p>Pretty handy, right?</p>
<p>You can also add Kombai right inside your editor. It works with VS Code, Cursor, Windsurf, and Trae. Just grab it from the extension marketplace, launch it, and you’re ready to go.</p>
<p>With Kombai, you can:</p>
<ul>
<li><p>Turn Figma designs into code (React, HTML, CSS, and so on) using the component library your project already uses.</p>
</li>
<li><p>Work with a frontend-smart engine that understands 30+ libraries including Next.js, MUI, and Chakra UI.</p>
</li>
<li><p>Stay in your editor, follow your own conventions, and ship faster with good accuracy.</p>
</li>
<li><p>And most importantly, preview the changes in a sandbox so you can approve or reject the change before committing it to the files.</p>
</li>
</ul>
<p>You can be up and running in under a minute. Here are the steps to get started:</p>
<ul>
<li><p>Install the extension for your editor</p>
</li>
<li><p>Sign in and connect your project</p>
</li>
<li><p>Paste a Figma link or describe what you want to build</p>
</li>
<li><p>Review the output and commit your code</p>
</li>
</ul>
<p>You can find it in the Extension marketplace of your IDE.</p>
<p></p>
<p>Using it is as simple as opening it from the left sidebar and chatting with it, much like you would with ChatGPT. (Optionally, you can specify your tech stack, but Kombai detects it automatically.)</p>
<p></p>
<p>Head to the <a target="_blank" href="https://docs.kombai.com/get-started/welcome">docs</a> to get started and find the setup for your editor.</p>
<p><strong>Pricing Note</strong>: Kombai is a paid tool but gives you a free plan with 300 credits per month, which is great for personal projects. For more advanced workflows, you can move up to the Pro plan or the Enterprise plan.</p>
<p>If you spend most of your time on the frontend, Kombai may be a good fit.</p>
<h3 id="heading-figma-mcphttpswwwfigmacomblogintroducing-figma-mcp-server"><a target="_blank" href="https://www.figma.com/blog/introducing-figma-mcp-server/">Figma MCP</a></h3>
<p>Figma MCP (Model Context Protocol) lets AI agents connect directly to your Figma files. It closes the gap between your designs and your AI tools by giving them structured access to real design data instead of relying on screenshots or rough estimates.</p>
<p>It works by exposing your design's node tree, styles, layout rules, and component structure so the model can build the UI with actual design data.</p>
<p>That means tools like Claude Code, Gemini CLI, Cursor, and VS Code can actually <strong>read your designs</strong>, including layers, components, colors, spacing, and text, and use that context to generate accurate, production-ready code or design updates.</p>
<p>With Figma MCP, you can:</p>
<ul>
<li><p>Let AI tools pull live data from your Figma files, so your code suggestions always match your latest designs</p>
</li>
<li><p>Ask your AI assistant to inspect components, layouts, or styles directly from Figma</p>
</li>
<li><p>Generate UI code that reflects real design and structure instead of guessing from an image</p>
</li>
<li><p>Keep designers and developers in sync without constantly sending files back and forth</p>
</li>
</ul>
<p>Setting it up is simple:</p>
<ul>
<li><p>Run the Figma MCP server locally</p>
</li>
<li><p>Authorize your Figma workspace</p>
</li>
<li><p>Connect your editor or AI tool (Cursor, Claude Code, Gemini CLI, and so on)</p>
</li>
</ul>
<p>For this test, I'll be using Figma MCP inside Claude Code on Linux. Setting it up is as simple as adding the following JSON to your Claude configuration file, <code>~/.claude.json</code>:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"mcpServers"</span>: {
    <span class="hljs-attr">"Framelink MCP for Figma"</span>: {
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"npx"</span>,
      <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"-y"</span>, <span class="hljs-string">"figma-developer-mcp"</span>, <span class="hljs-string">"--figma-api-key=YOUR-KEY"</span>, <span class="hljs-string">"--stdio"</span>]
    }
  }
}
</code></pre>
<p>For Windows users:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"mcpServers"</span>: {
    <span class="hljs-attr">"Framelink MCP for Figma"</span>: {
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"cmd"</span>,
      <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"/c"</span>, <span class="hljs-string">"npx"</span>, <span class="hljs-string">"-y"</span>, <span class="hljs-string">"figma-developer-mcp"</span>, <span class="hljs-string">"--figma-api-key=YOUR-KEY"</span>, <span class="hljs-string">"--stdio"</span>]
    }
  }
}
</code></pre>
<p><strong>Pricing Note</strong>: To use Figma MCP, you need to have a paid Figma plan, either Professional, Organization, or Enterprise. But there's a community-maintained open-source MCP server, <a target="_blank" href="https://github.com/GLips/Figma-Context-MCP">Figma-Context-MCP</a>, that you can test out for free – which I'll be using for this test.</p>
<p>Once it’s running, any MCP-supported tool can understand your design files, making frontend development much more accurate.</p>
<p>Check the <a target="_blank" href="https://help.figma.com/hc/en-us/articles/32132100833559-Guide-to-the-Figma-MCP-server">Figma MCP Guide</a> to get started.</p>
<h2 id="heading-frontend-comparison-with-figma">Frontend Comparison with Figma</h2>
<p>For this test, we'll be comparing Kombai with Figma MCP using two Figma designs: one is a simple portfolio design, and the other is a more complex learner dashboard.</p>
<p><strong>NOTE:</strong> For this test with Figma MCP, I'll be using Sonnet 4, which, in my experience, has been the best model for coding the frontend. I've also tested with the recent GPT-5 and Opus 4, but Sonnet 4 seems to be the best for frontend work. If you want to try other models, feel free to do so and see if you notice much difference in the results.</p>
<blockquote>
<p>💁 <strong>Prompt</strong>: Clone this Figma design from this Figma frame link attached. Write clean, maintainable, and responsive code that matches the design closely. Keep components simple, reusable, and production-ready.</p>
</blockquote>
<p><strong>Quick note about the videos in the next section:</strong> The demo recordings are pretty long because I kept them raw. The idea is to show how the tools behave in real time. If you only care about the final output, feel free to skip to the end of each video.</p>
<h2 id="heading-test-1-simple-portfolio-design">Test 1: Simple Portfolio Design</h2>
<p>Let's start with a simpler design that doesn't have much going on in the UI.</p>
<p>You can find the Figma design template here: <a target="_blank" href="https://www.figma.com/design/ikqgqDYKWsM6OXwdz1IFCp/Personal-Portfolio-Website-Template--Community---Copy-?node-id=0-1&t=HBdIdagaA7tSxpoV-1">Personal Portfolio Template</a></p>
<h3 id="heading-figma-mcp">Figma MCP</h3>
<p>Here's the response from Figma MCP:</p>
<div class="embed-wrapper">
        </div>
<p> </p>
<p>This is pretty decent. The overall UI looks good, and the colors and fonts are all accurate. The biggest visual issues are with the hero image and a few icon placements, which are a bit off compared to the original Figma file.</p>
<p>The overall implementation took just about 5 minutes of coding and achieved this entire result in one go, as you see in the video demo. The time it takes isn't really dependent on the MCP itself but mostly on the model, so the timings will vary based on the model you choose to work with. The timing is something you can simply ignore here.</p>
<p>The whole page is split into sensible components (<code>Header</code>, <code>Hero</code>, <code>Projects</code>, <code>ProjectCard</code>, <code>Footer</code>) and composed in a clean <code>page.tsx</code>.</p>
<pre><code class="lang-tsx">export default function Home() {
  return (
    <div className="min-h-screen bg-bg-gray">
      <Header />
      <main>
        <Hero />
        <Projects />
      </main>
      <Footer />
    </div>
  );
}
</code></pre>
<p>That is a nice, readable starting point for a Next.js app.</p>
<p>You can find the code it generated <a target="_blank" href="https://gist.github.com/shricodev/285295e78ebc41db37d0b65277abbe09">here</a>.</p>
<p>But here are some issues I noticed right away:</p>
<ol>
<li>The hero decoration is positioned with pretty brittle absolute values:</li>
</ol>
<pre><code class="lang-tsx"><div className="hidden lg:block absolute right-0 top-0 w-[720px] h-[629px] pointer-events-none">
  <div className="relative w-full h-full">
    <div className="absolute left-0 top-0 w-[777px] h-[877px] -translate-y-[248px] bg-brand-yellow" />
    <div className="absolute left-0 top-0 w-full h-full">
      
    </div>
  </div>
</div>
</code></pre>
<p>This achieves the desired look at one screen size, but it can easily become misaligned when you resize. When compared side by side with the Figma frame, the hero image and yellow shape do not align as they should.</p>
<ol start="2">
<li>Fixed Header</li>
</ol>
<p>For a simple portfolio page with a short hero, a fixed header is not always worth the complexity.</p>
<p>The problem here is that since the header is fixed to the top, the rest of the content also starts from the top. On smaller devices, this might cover parts of the content when scrolling.</p>
<pre><code class="lang-tsx">return (
  <header className="fixed top-0 left-0 right-0 bg-bg-gray z-50 h-14">
    {/* ... */}
    <button
      onClick={() => scrollToSection("about")}
      className="font-raleway ..."
    >
      About
    </button>
    {/* more buttons */}
  </header>
);
</code></pre>
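<p>One way to soften the fixed-header problem is to reserve space for the header (for example, top padding on <code>main</code> equal to the header height) and to subtract that height when scrolling to a section. Here's a minimal sketch of the second part. The helper name <code>scrollTargetY</code> and the 56px constant are my own assumptions (Tailwind's <code>h-14</code> is 3.5rem, which is 56px at the default root font size). None of this comes from the generated code.</p>

```typescript
// Height of the fixed header in pixels (assumes Tailwind's h-14 at a 16px root font size).
const HEADER_HEIGHT_PX = 56;

// Pure helper: the scroll position at which a section starts just below the fixed header.
// `sectionTop` is the section's distance from the top of the document, in pixels.
function scrollTargetY(sectionTop: number, headerHeight: number = HEADER_HEIGHT_PX): number {
  // Never scroll to a negative position when the section sits above the header height.
  return Math.max(0, sectionTop - headerHeight);
}

// Hypothetical browser-side usage inside a scrollToSection handler:
// const el = document.getElementById("about");
// if (el) {
//   const top = el.getBoundingClientRect().top + window.scrollY;
//   window.scrollTo({ top: scrollTargetY(top), behavior: "smooth" });
// }
```

<p>Keeping the calculation in a pure function makes it easy to unit test without a browser, and the header height lives in one place if the design changes.</p>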
<p>This is still a great head start, though it is not quite at the level where I would add it to a production repo without first tidying up some of these layout issues.</p>
<h3 id="heading-kombai">Kombai</h3>
<p>Here's the response from Kombai:</p>
<div class="embed-wrapper">
        </div>
<p> </p>
<p>Visually, this one is extremely close to the Figma template. Apart from the hero image being slightly off from the Figma design, I see no other differences. It actually feels like the design is exactly copy-pasted.</p>
<p>Notice that the font, images, and icons are exactly the same, which to me is insane.</p>
<p>You can find the code it generated <a target="_blank" href="https://gist.github.com/shricodev/41fdf0596f312573e0efd44a30b5b36b">here</a>.</p>
<p>Here are the specific things it does better in this simple example.</p>
<ol>
<li>It mirrors the Figma typography and colors as real tokens</li>
</ol>
<p>Kombai sets up <code>globals.css</code> with Figma-like tokens and even defines utility classes for the text styles:</p>
<pre><code class="lang-css"><span class="hljs-selector-pseudo">:root</span> {
  <span class="hljs-comment">/* ... */</span>
}

<span class="hljs-keyword">@theme</span> inline {
  <span class="hljs-comment">/* ... */</span>
}

<span class="hljs-keyword">@utility</span> text-heading-large {
  <span class="hljs-comment">/* ... */</span>
}

<span class="hljs-keyword">@utility</span> text-subtitle {
  <span class="hljs-comment">/* ... */</span>
}
</code></pre>
<p>That is very similar to how a designer would set up styles in Figma, and it means you can reuse these utilities in new screens instead of retyping Tailwind font sizes everywhere.</p>
<ol start="2">
<li>Components are cleaner and more reusable</li>
</ol>
<p>All the other components, like <code>Hero</code> or some smaller button components, use the same styles set up in <code>globals.css</code>.</p>
<pre><code class="lang-tsx">const baseClasses =
  "text-button px-6 py-3 rounded-sm transition-all hover:opacity-90";

const variantClasses =
  variant === "primary"
    ? "bg-(--primary-yellow) text-(--foreground)"
    : "bg-transparent border-2 border-(--foreground) text-(--foreground) hover:bg-(--foreground) hover:text-white";
</code></pre>
<p>The footer pulls each icon into its own component:</p>
<pre><code class="lang-tsx">import InstagramIcon from "./icons/InstagramIcon";
import LinkedInIcon from "./icons/LinkedInIcon";
import MailIcon from "./icons/MailIcon";
</code></pre>
<p>In practice, that means if the designer swaps the mail icon or tweaks the size, there is a single place to update it.</p>
<p>So for this simple test, Kombai’s output is both closer to the visual design and a bit nicer structurally for a real project. I would still tweak naming and some minor details, but I would happily keep most of this as is. How crazy is that?</p>
<h2 id="heading-test-2-complex-learner-dashboard">Test 2: Complex Learner Dashboard</h2>
<p>For the second test, let's try a more complex design with a lot more happening in the UI.</p>
<p>You can find the Figma design template here: <a target="_blank" href="https://www.figma.com/design/hATPCahjQRzz0dXao2QH1U/Dashboard---Online-Learning-Profile--Community-?node-id=10-1626&t=sn9rVXVzXlzzdusd-0">Learning Dashboard</a></p>
<h3 id="heading-figma-mcp-1">Figma MCP</h3>
<p>Here's the response from Figma MCP:</p>
<div class="embed-wrapper">
        </div>
<p> </p>
<p>This is good, considering the complexity of the design. It’s able to put all the images and assets in place. This is much better than what I expected. But there's a slight inconsistency in the placement of images between the original design and the implementation, as you can see for yourself.</p>
<p>If I compare the time, this got it done super fast, in just about <strong>8 minutes</strong>, whereas Kombai took over 15 minutes to get it done (but with a better result).</p>
<p>You can find the code it generated <a target="_blank" href="https://gist.github.com/shricodev/a15cbff76f4256a20fa098d69f5b4661">here</a>.</p>
<p>Here's what I like and dislike about the output:</p>
<ol>
<li>Great smaller components, but everything is still quite page-centric</li>
</ol>
<p>It does break things into logical components like <code>Sidebar</code>, <code>Input</code>, <code>Button</code>, <code>StatCard</code>, <code>CourseCard</code>, and <code>Icons</code>. The main page then stitches them together:</p>
<pre><code class="lang-tsx">export default function Home() {
  const mentors = [
    {
      id: 1,
      name: "John Doe",
      subject: "UI/UX Design",
      color: "bg-purple-500",
    },
    // ...
  ];

  return (
    <div className="flex items-center gap-8 w-full max-w-[1440px] h-[933px] bg-white rounded-[20px] mx-auto overflow-hidden">
      {/* Sidebar */}
      <Sidebar />

      {/* Main content */}
      <main className="flex flex-col items-center gap-6 pt-5 pb-0 flex-1 h-full overflow-hidden">
        {/* Search, hero, cards, mentor table */}
      </main>
    </div>
  );
}
</code></pre>
<p>The separation into components is nice, but everything is still wired directly inside one big page component with inline mock data. For a real app, I would want that data in its own module, ideally typed, so it is not mixed with layout logic.</p>
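<p>To make that concrete, here's a rough sketch of how the inline <code>mentors</code> array could move into its own typed module. The file name, the <code>Mentor</code> interface, and the <code>findMentorById</code> helper are my own assumptions for illustration, not something the MCP output contains.</p>

```typescript
// mentors.ts (hypothetical file name): mock data kept apart from layout code.

// Shape of one mentor row, inferred from the inline array in the generated page.
interface Mentor {
  id: number;
  name: string;
  subject: string;
  color: string; // Tailwind background class, e.g. "bg-purple-500"
}

// Read-only so components cannot mutate the shared mock data by accident.
const mentors: readonly Mentor[] = [
  { id: 1, name: "John Doe", subject: "UI/UX Design", color: "bg-purple-500" },
];

// Small lookup helper (hypothetical) so pages do not search the array inline.
function findMentorById(id: number): Mentor | undefined {
  return mentors.find((mentor) => mentor.id === id);
}
```

<p>The page component would then import <code>mentors</code> and only deal with layout, and swapping the mock data for an API call later touches one file.</p>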
<ol start="2">
<li>Hard-coded dimensions tied to the original frame</li>
</ol>
<p>The outer container is pinned to a specific height:</p>
<pre><code class="lang-tsx"><div className="flex items-center gap-8 w-full max-w-[1440px] h-[933px] bg-white rounded-[20px] mx-auto overflow-hidden">
</code></pre>
<p>That’s fine if you are literally recreating a 1440 by 933 frame for a screenshot, but in a live app, it means:</p>
<ul>
<li><p>You get weird empty space on taller screens.</p>
</li>
<li><p>Anything that grows vertically (longer course titles, more mentors) will either overflow or get clipped.</p>
</li>
</ul>
<p>The hero banner has the same kind of pixel-exact positioning:</p>
<pre><code class="lang-tsx"><div className="relative w-full h-[181px] bg-primary rounded-[20px] overflow-hidden">
  <Image
    src="/images/star1.svg"
    alt="Star"
    width={80}
    height={80}
    className="absolute top-[45px] left-[683px] opacity-25"
  />
  {/* four more star images with fixed top/left */}
</div>
</code></pre>
<p>This is great for matching the specific Figma design, but as soon as the width changes, these positions stop lining up perfectly.</p>
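<p>One way to make decorations like these less brittle is to express their offsets relative to the container instead of in fixed pixels. The sketch below converts a pixel offset from a Figma frame into a percentage string. The function name <code>toPercentOffset</code> and the example numbers are hypothetical. I haven't measured the banner's real dimensions in the Figma file.</p>

```typescript
// Convert a pixel offset inside a design frame into a CSS percentage string.
// `frameSizePx` is the frame's width (for left/right) or height (for top/bottom).
function toPercentOffset(offsetPx: number, frameSizePx: number): string {
  if (frameSizePx <= 0) {
    throw new RangeError("frameSizePx must be greater than 0");
  }
  const percent = (offsetPx / frameSizePx) * 100;
  // Round to two decimals to keep the generated styles readable.
  const rounded = Math.round(percent * 100) / 100;
  return `${rounded}%`;
}

// Made-up example: a star 45px from the top of a 180px-tall banner.
const starTop = toPercentOffset(45, 180); // "25%"

// Hypothetical usage in JSX: <Image style={{ top: starTop }} ... />
```

<p>Percentages keep the stars in roughly the same relative spot as the banner resizes, at the cost of no longer being pixel-identical to the original frame.</p>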
<p>So overall, I would call this result surprisingly good for a single prompt, but a bit rigid and template-like once you start thinking about real data and using it in production.</p>
<h3 id="heading-kombai-1">Kombai</h3>
<p>Here's the response from Kombai:</p>
<div class="embed-wrapper">
        </div>
<p> </p>
<p>You will see in the video that I had to fix a small error with an extra prompt, but after that, it produced a fully working dashboard. The visual match is very strong, given how complex the layout is.</p>
<p>You can find the code it generated <a target="_blank" href="https://gist.github.com/shricodev/bc86951ed09c2b3ef6500cc40f3c0b0b">here</a>.</p>
<p>Here is what stands out compared to the MCP output.</p>
<ol>
<li>It treats the Figma file like a real product, not just a static screen.</li>
</ol>
<p>Instead of wiring everything in a single page with inline arrays, Kombai creates proper domain types and a <code>mock-data.ts</code>:</p>
<pre><code class="lang-tsx">import { UserProfile, Friend, Course, ProgressCard, Mentor } from "./types";

export const courses: Course[] = [
  {
    id: "1",
    title: "Beginner's Guide to becoming a professional frontend developer",
    category: "Frontend",
    thumbnail: "/images/course-coding.jpg",
    instructor: {
      name: "Prashant Kumar singh",
      role: "software Developer",
      avatar: "/images/avatar-prashant.jpg",
    },
  },
  // ...
];
</code></pre>
<p>That looks much closer to what you would expect in a production codebase: clear types, data separated from layout, and a page component that just composes everything.</p>
<ol start="2">
<li>Better mapping of the smaller UI pieces</li>
</ol>
<p>The course card is similar to the MCP one, but now it is fully driven by a <code>Course</code> object:</p>
<pre><code class="lang-tsx">export function CourseCard({ course }: { course: Course }) {
  return (
    <div className="flex flex-col gap-2.5 rounded-[20px] bg-white shadow-[0px_14px_42px_rgba(8,15,52,0.06)] overflow-hidden min-w-[268px]">
      <div className="relative">
        <Image
          src={course.thumbnail}
          alt={course.title}
          width={244}
          height={113}
          className="w-full h-28 object-cover rounded-t-xl"
        />
        <button className="absolute top-3 right-3 w-2 h-2 bg-white rounded-full" />
      </div>
      <div className="px-3 pb-4 flex flex-col gap-2.5">
        <span className="text-[8px] font-normal uppercase text-primary px-3 py-1 bg-purple-50 rounded w-fit">
          {course.category}
        </span>
        <p className="text-[14px] font-medium text-text-primary leading-tight">
          {course.title}
        </p>
        <div className="w-full h-1.5 bg-gray-100 rounded-full overflow-hidden">
          <div
            className="h-full bg-primary rounded-full"
            style={{ width: "60%" }}
          />
        </div>
        {/* instructor avatar and name */}
      </div>
    </div>
  );
}
</code></pre>
<p>The structure and text styles are very close to the original design, and because the card is fully data-driven, you can plug in real data without touching the JSX.</p>
<ol start="3">
<li>Design tokens and typography utilities again</li>
</ol>
<p>Just like in the portfolio example, Kombai sets up a proper token layer for the dashboard:</p>
<pre><code class="lang-css"><span class="hljs-selector-pseudo">:root</span> {
  <span class="hljs-comment">/* ... */</span>
}

<span class="hljs-keyword">@utility</span> heading-section {
  <span class="hljs-comment">/* ... */</span>
}

<span class="hljs-keyword">@utility</span> text-caption {
  <span class="hljs-comment">/* ... */</span>
}
</code></pre>
<p>The components then reuse these utilities, which keeps the code close to the design system instead of scattering font sizes and colors everywhere.</p>
<ol start="4">
<li>Things I would still tweak</li>
</ol>
<p>It is not perfect:</p>
<ul>
<li><p>The Next <code>layout.tsx</code> is still using the default Geist fonts and “Create Next App” metadata, so you would want to align that with the Inter font and real app title.</p>
</li>
<li><p>Some of the mock data has inconsistent casing in names and roles, which you would clean up in a real project.</p>
</li>
<li><p>The play button on the course card is just a white dot button for now, so you would still plug in the real icon.</p>
</li>
</ul>
<p>But even with those issues, it is very close to something I would actually keep in a production repo after a quick pass.</p>
<p>Now, this is not as clean as the previous Kombai implementation, and it did run into an error along the way. But considering how complex this design is, with multiple cards, images, and other assets, it's still really impressive to me.</p>
<p>For this one, it took a bit longer to code, but in my opinion, the extra time was worth it.</p>
<p>Imagine you're building something similar and get a response this good already. Then it's not that big of a deal to iterate a little bit, right? You don't have to start from scratch. Just make a few changes if required, and you're done.</p>
<h2 id="heading-what-you-should-know-before-using-these-tools">What You Should Know Before Using These Tools</h2>
<p>As good as these tools are, they’re not something you can just trust blindly. They’ll get you off to a solid start, but you’ll still need to tweak a few things before calling it production-ready.</p>
<p><strong>Kombai</strong> does a great job cloning Figma designs and writing clean, modular code. It breaks components into smaller files and generally follows good structure.</p>
<p>The only issue I noticed is that it sometimes slips on naming conventions. Since it scans your entire codebase to stay consistent with your setup, it can be a bit slower to generate code, but that’s also what makes it smarter. You’re not just getting a Figma cloner, you’re getting an assistant that actually understands your frontend.</p>
<p><strong>Figma MCP</strong> is fast and does a decent job matching the UI, although the results depend a lot on the model you use for generation. If your main goal is to clone Figma designs quickly and you don’t mind refining the output, it’s a good option.</p>
<p>In short, both tools can save you a ton of time, but they’re not plug-and-play replacements for a frontend workflow. Treat them as part of your toolkit, and you’ll get the best results.</p>
<h2 id="heading-final-verdict-and-whats-next">Final Verdict, and What's Next?</h2>
<p>Now that you’ve got the gist of what these tools can do, go ahead and try them out. You can turn your Figma designs into working frontends in just a few minutes without endlessly fiddling with CSS.</p>
<p>To sum up, here’s the quick rundown:</p>
<ul>
<li><p>If you want production-ready code that actually looks like your Figma design and you mostly live in VS Code, Cursor, or any GUI IDE, go with Kombai. It nails the details and even understands your codebase, which is completely missing in Figma MCP.</p>
</li>
<li><p>If you just want to clone a Figma design quickly and don’t mind if things are <em>slightly</em> off, Figma MCP is totally fine. It gets the job done pretty well.</p>
</li>
</ul>
<p>Basically, choose Kombai if you care about precision and code quality with codebase understanding.</p>
<p>Choose Figma MCP if you want something quick that <em>works</em> and looks decent enough. 🤷‍♂️</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>So, what do you think? Pretty cool, right? This was a fun little experiment to see how close tools like Figma MCP and Kombai can get to cloning real frontends straight from Figma.</p>
<p>If you’re into building frontends and want to save yourself a few hours of CSS pain, definitely give them a try. Just don’t expect them to be perfect in one try – their output still needs review and likely a little refining.</p>
<p>That’s all for this one. Thank you for reading! ✌️</p>
 
</article>
<article>
<h1> How the Model Context Protocol Works </h1>
<p>Manish Shivanandhan — Fri, 24 Oct 2025 19:48:54 +0000</p>
 <p>The world of artificial intelligence is moving fast. Every week, it seems like there’s a new tool, framework, or model that promises to make AI better.</p>
<p>But as developers build more AI applications, one big problem keeps showing up: the lack of context.</p>
<p>Each tool works on its own. Each model has its own memory, its own data, and its own way of understanding the world. This makes it hard for different parts of an AI system to talk to each other.</p>
<p>That’s where Model Context Protocol, or MCP, comes in.</p>
<p>It is a new standard for how AI tools share context and communicate. It allows large language models and <a target="_blank" href="https://www.turingtalks.ai/p/how-an-ai-agent-works">AI agents</a> to connect with external data sources, apps, and tools in a structured way.</p>
<p>MCP is like the missing piece that helps AI systems work together instead of apart.</p>
<p>MCP is becoming one of the most important ideas in modern AI development. In this article, you’ll learn how MCP connects AI tools and data sources, making modern AI apps smarter, faster, and far easier to build.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-problem-with-disconnected-ai-tools">The Problem with Disconnected AI Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-model-context-protocol">What is Model Context Protocol</a>?</p>
</li>
<li><p><a class="post-section-overview" href="#heading-from-plugins-to-protocols">From Plugins to Protocols</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-making-ai-apps-smarter">Making AI Apps Smarter</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-making-ai-apps-faster-and-simpler">Making AI Apps Faster (and Simpler)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-bigger-picture">The Bigger Picture</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-the-problem-with-disconnected-ai-tools"><strong>The Problem with Disconnected AI Tools</strong></h2>
<p>Imagine you’re building a customer support chatbot using a large language model like GPT. The model can generate great responses, but it doesn’t know anything about your actual customers.</p>
<p>To make it useful, you connect it to your CRM so it can look up customer records. Then you connect it to your ticketing system to see open cases. You might also connect it to a knowledge base for reference.</p>
<p>Each of these integrations is a separate task. You write custom API calls, format responses, manage authentication, and handle errors. Every new data source means more glue code. The LLM doesn’t naturally know how to interact with these systems.</p>
<p>Now imagine you have five or ten such tools: your AI assistant, your search engine, your summarization tool, and a few automation scripts. Each one stores information in a different way.</p>
<p>None of them share context. If one model learns something about a user’s intent, the others can’t use it. You end up with silos of intelligence instead of a connected ecosystem.</p>
<p>This is the problem that MCP was built to solve.</p>
<h2 id="heading-what-is-model-context-protocol"><strong>What is Model Context Protocol?</strong></h2>
<p>Model Context Protocol is a standard that defines how AI systems should exchange context. It was introduced to make it easier for models, tools, and environments to communicate in a predictable way. You can think of it as an “API for AI context.”</p>
<p>At its core, MCP allows three types of communication:</p>
<ol>
<li><p>Models can request context from external tools or data sources.</p>
</li>
<li><p>Tools can send updates or new information back to the model.</p>
</li>
<li><p>Both can share metadata about what they know and how they can help.</p>
</li>
</ol>
<p>This sounds technical, but the outcome is simple. It makes AI apps more aware of their environment.</p>
<p>Instead of manually wiring integrations, developers can rely on a shared protocol that defines how everything fits together.</p>
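<p>To give a feel for what that shared protocol looks like on the wire: in the published specification, MCP messages are JSON-RPC 2.0 objects, and a tool invocation uses the <code>tools/call</code> method. The sketch below builds one such request. The tool name <code>get_order_status</code> and its argument are hypothetical, and the types are a simplified illustration rather than the official SDK's definitions.</p>

```typescript
// Simplified shape of a JSON-RPC 2.0 request.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: P;
}

// Parameters of a tool invocation: which tool to run and with what arguments.
interface ToolCallParams {
  name: string;
  arguments: Record<string, unknown>;
}

// Build a tools/call request for the given tool.
function makeToolCallRequest(
  id: number,
  name: string,
  args: Record<string, unknown>
): JsonRpcRequest<ToolCallParams> {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical tool and argument, for illustration only.
const toolCallRequest = makeToolCallRequest(1, "get_order_status", { order_id: 42 });
const toolCallWire = JSON.stringify(toolCallRequest);
```

<p>Because every tool is called through the same envelope, a client only needs to learn a tool's name and argument schema, not a new transport or authentication flow for each integration.</p>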
<h2 id="heading-from-plugins-to-protocols"><strong>From Plugins to Protocols</strong></h2>
<p>To understand MCP, it helps to look back at how OpenAI handled this problem before.</p>
<p>When <a target="_blank" href="https://openai.com/index/chatgpt-plugins/">ChatGPT Plugins</a> were introduced, they allowed GPT models to access external APIs, for example, to book a flight, get weather updates, or search the web. Each plugin had its own schema that described what data it could handle and what actions it could perform.</p>
<p>MCP takes that idea further. Instead of plugins designed only for ChatGPT, MCP defines a universal language that any AI system can use. It’s like moving from private integrations to an open standard.</p>
<p>If you’ve ever worked with APIs, you can think of MCP as doing for AI what HTTP did for the web. HTTP allowed browsers and servers to communicate using shared rules. MCP allows models and tools to share context consistently.</p>
<p>Below is a pseudocode example showing how you might build a Model Context Protocol (MCP) server that exposes a SQL database as a context source to AI models.</p>
<p>This is conceptual pseudocode. It captures the flow, not specific syntax, and assumes an MCP-compatible environment where LLMs can request data from external tools via a standard interface.</p>
<p>The goal is to expose your SQL database (for example, a <code>customers</code> or <code>orders</code> table) through an MCP server so an AI model can query and understand its contents contextually. For example, you could ask, “Show me all pending orders.”</p>
<pre><code class="lang-plaintext">// MCP SQL Context Server Pseudocode
---

// Step 1: Initialize server and dependencies
MCPServer = new MCPServer(name="SQLContextServer")

Database = connect_to_sql(
    host="localhost",
    user="admin",
    password="password",
    database="ecommerce"
)

// Step 2: Define available context schemas
// These describe what data the server can provide
MCPServer.register_context_schema("orders", {
    "order_id": "integer",
    "customer_name": "string",
    "status": "string",
    "amount": "float",
    "created_at": "datetime"
})

// Step 3: Define request handler for context queries
MCPServer.on_context_request("orders", function(queryParams):
    sql_query = build_sql_query(
        table="orders",
        filters=queryParams.filters,
        limit=queryParams.limit or 50
    )
    results = Database.execute(sql_query)
    return MCPResponse(data=results)
)

// Step 4: Define actions (optional)
// Allows the model to perform updates, inserts, etc.
MCPServer.register_action("update_order_status", {
    "order_id": "integer",
    "new_status": "string"
}, function(args):
    Database.execute("UPDATE orders SET status = ? WHERE order_id = ?", 
                     [args.new_status, args.order_id])
    return MCPResponse(message="Order updated successfully")
)

// Step 5: Start the MCP server and listen for model requests
MCPServer.start(port=8080)
log("MCP SQL Context Server is running on port 8080")

// Example of how a model might call this server:
//
// Model -> MCPServer:
//   RequestContext("orders", filters={"status": "pending"})
//
// MCPServer -> Model:
//   [{"order_id": 42, "customer_name": "John Doe", "status": "pending", "amount": 199.99}]
</code></pre>
<p>How it works:</p>
<ol>
<li><p>The model sends a request via MCP, asking for context like <code>orders where status = 'pending'</code>.</p>
</li>
<li><p>The server translates this into a SQL query, fetches the data, and returns it as structured context.</p>
</li>
<li><p>The model now uses this context to give accurate answers, automate workflows, or make decisions (like “Send a refund email to pending orders older than 5 days”).</p>
</li>
<li><p>Optional MCP actions let the model perform safe updates, enabling bi-directional workflows (context in, actions out).</p>
</li>
</ol>
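<p>The same flow can be sketched as runnable TypeScript. This is a deliberately simplified, in-memory imitation: there is no real database, network transport, or MCP SDK involved, and every name in it (<code>OrderRecord</code>, <code>InMemoryOrderServer</code>, <code>requestContext</code>, <code>updateOrderStatus</code>) is an assumption made for illustration.</p>

```typescript
// One row of the hypothetical orders table.
interface OrderRecord {
  order_id: number;
  customer_name: string;
  status: string;
  amount: number;
}

// Filters are exact-match values for any subset of the order fields.
type OrderFilters = Partial<OrderRecord>;

class InMemoryOrderServer {
  private orders: OrderRecord[];

  constructor(orders: OrderRecord[]) {
    this.orders = orders;
  }

  // Context request: return copies of the orders matching every filter, capped by `limit`.
  requestContext(filters: OrderFilters = {}, limit: number = 50): OrderRecord[] {
    const keys = Object.keys(filters) as (keyof OrderRecord)[];
    return this.orders
      .filter((order) => keys.every((key) => order[key] === filters[key]))
      .slice(0, limit)
      .map((order) => ({ ...order }));
  }

  // Action: update one order's status and report whether the order existed.
  updateOrderStatus(orderId: number, newStatus: string): boolean {
    const order = this.orders.find((o) => o.order_id === orderId);
    if (!order) return false;
    order.status = newStatus;
    return true;
  }
}

// Sample data; the second row is made up for the example.
const orderServer = new InMemoryOrderServer([
  { order_id: 42, customer_name: "John Doe", status: "pending", amount: 199.99 },
  { order_id: 43, customer_name: "Jane Roe", status: "shipped", amount: 49.5 },
]);

// "Show me all pending orders."
const pendingOrders = orderServer.requestContext({ status: "pending" });
```

<p>Returning copies from <code>requestContext</code> keeps reads and writes separate: the model can only change data through the explicit <code>updateOrderStatus</code> action, which mirrors the "context in, actions out" split described above.</p>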
<h2 id="heading-making-ai-apps-smarter"><strong>Making AI Apps Smarter</strong></h2>
<p>Smartness in AI doesn’t only come from the size of the model. It also comes from how much relevant context the model has.</p>
<p>A small model with rich context can outperform a large one that’s unaware of its surroundings. With MCP, a model can access the right context at the right time.</p>
<p>For example, let’s say a customer support bot gets a message saying,</p>
<blockquote>
<p><strong><em>“I’m still waiting for my refund.”</em></strong></p>
</blockquote>
<p>Normally, the model might respond with a generic apology. But with MCP, it can pull the customer’s order history from a connected tool, check refund status, and reply with something like,</p>
<blockquote>
<p><em>“<strong><strong>Your refund for Order #1423 has been processed and should reach your account by Tuesday.</strong></strong>”</em></p>
</blockquote>
<p>This is possible because MCP lets the model request information from external sources using structured calls. It no longer works blindly. It works with context, making the response more relevant and accurate.</p>
<p>As more tools adopt MCP, models will become context-aware across multiple domains, from finance and healthcare to software development and education.</p>
<h2 id="heading-making-ai-apps-faster-and-simpler"><strong>Making AI Apps Faster (and Simpler)</strong></h2>
<p>Speed in AI applications isn’t just about how quickly a model generates text. True speed comes from how efficiently the system gathers, processes, and applies information.</p>
<p>Without MCP, AI systems waste time doing repetitive work like fetching data from different sources, cleaning it, and converting it into compatible formats.</p>
<p>Every new integration adds latency. Developers often build caching layers, write adapters, or batch process data just to make things run smoothly. All of this adds complexity and slows down development.</p>
<p>MCP removes much of this overhead. Because it defines a shared structure for context, models and tools can exchange data seamlessly. There’s no need to translate or reformat information, since everything speaks the same language. The result is lower latency, faster responses, and cleaner architecture.</p>
<p>Consider an example: you’re building an AI coding assistant. Without MCP, you’d need to manually connect to your file system, your Git repository, and your IDE, each requiring a different integration.</p>
<p>With MCP, all three can communicate through a single shared protocol. The assistant instantly understands where your code lives, what files have changed, and what actions it can perform.</p>
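<p>The idea of "one protocol, many tools" can be sketched as a small registry in which every integration implements the same interface. This is a toy model rather than MCP itself: the <code>AssistantTool</code> interface, the <code>ToolRegistry</code> class, the tool names, and the canned return values are all hypothetical.</p>

```typescript
// Every integration exposes the same shape, whatever it wraps.
interface AssistantTool {
  name: string;
  description: string;
  run(args: Record<string, string>): string;
}

class ToolRegistry {
  private readonly tools = new Map<string, AssistantTool>();

  // Add a tool; duplicate names are rejected so lookups stay unambiguous.
  register(tool: AssistantTool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // Names of all registered tools, in registration order.
  list(): string[] {
    return Array.from(this.tools.keys());
  }

  // Dispatch a call to the named tool through the shared interface.
  call(name: string, args: Record<string, string> = {}): string {
    const tool = this.tools.get(name);
    if (!tool) {
      throw new Error(`Unknown tool: ${name}`);
    }
    return tool.run(args);
  }
}

// Stub tools standing in for a file system, a Git repository, and an IDE.
const registry = new ToolRegistry();
registry.register({ name: "fs.read", description: "Read a file", run: (args) => `contents of ${args.path}` });
registry.register({ name: "git.status", description: "List changed files", run: () => "2 files changed" });
registry.register({ name: "ide.open", description: "Open a file in the editor", run: (args) => `opened ${args.path}` });
```

<p>The assistant only ever talks to <code>registry.call</code>, so adding a fourth integration means registering one more object rather than writing a new client.</p>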
<p>This simplicity benefits not just developers but also users. With MCP, your context (your preferences, recent work, and open projects) can travel with you across different apps. It’s like having a portable memory layer for the AI world, one that keeps every tool aware of what you’re doing no matter where you go.</p>
<h2 id="heading-the-bigger-picture"><strong>The Bigger Picture</strong></h2>
<p>The rise of MCP points to a shift in how we think about AI systems. We’re moving from isolated models to connected ecosystems.</p>
<p>In the early days of the web, each site was its own island. Then came standards like HTTP and HTML, which made everything interoperable. That’s when the web truly exploded.</p>
<p>AI is at a similar point. Right now, every company is building its own stack, its own integrations, prompts, and memory systems. But that approach doesn’t scale. MCP could be the layer that connects them all.</p>
<p>Once context becomes shareable and portable, AI apps can collaborate in new ways. A writing assistant could talk to your research tool. A design bot could work with your file system. A coding assistant could coordinate with your deployment manager.</p>
<p>This kind of shared intelligence is what makes AI truly useful. It’s no longer about one model doing everything. It’s about many specialized models working together seamlessly.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>MCP is still new, but the idea behind it is powerful. By creating a shared protocol for context, it lowers the barrier for innovation.</p>
<p>Developers can focus on what their AI does, not how it connects. Companies can build products that play well with others instead of locking users into closed systems.</p>
<p>In the long run, this could lead to an open AI ecosystem, where models, tools, and data sources interact freely, much like websites do today. You could mix and match capabilities without friction.</p>
<p>The goal is not just smarter AI, but simpler AI. AI that understands what’s happening around it, reacts in real time, and works naturally with the tools you already use.</p>
<p>Model Context Protocol is a big step toward that future. It’s the bridge between intelligence and context, and it’s what will make tomorrow’s AI systems faster, more reliable, and far more human in how they understand the world.</p>
<p><em>Hope you enjoyed this article. Sign up for my free AI newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><strong><em>visit my website</em></strong></a><em>.</em></p>
 
</article>
<article>
<h1> AI in Finance: Transforming Investments and Banking in the Digital Age </h1>
<p>Tatev Aslanyan — Fri, 01 Aug 2025 22:28:15 +0000</p>
 <p>Artificial Intelligence (AI) is rapidly reshaping the financial sector. As models become more powerful and infrastructure more scalable, AI has evolved from an emerging technology into a fundamental force driving competitive advantage.</p>
<p>From fraud prevention to real-time payments and smart investing, AI is unlocking major opportunities across finance. Machine learning models help identify suspicious activity faster than ever before, while also enabling hyper-personalized customer experiences. AI-driven payment systems improve transaction speed, reduce friction, and make financial services more accessible worldwide.</p>
<p>In investing and trading, predictive analytics and NLP help firms uncover market insights, assess risk, and automate decision-making. From hedge funds to robo-advisors, AI is enhancing performance and democratizing access to financial tools.</p>
<p>Globally, AI is also strengthening cross-border collaboration and compliance. Through APIs, real-time data sharing, and regulatory tech, financial institutions are creating more transparent and agile systems that operate across jurisdictions.</p>
<p>This handbook explores how AI is driving the next era of finance. Whether you're a bank executive, fintech innovator, or policy leader, you’ll find practical insights and tools to guide your organization into a smarter, data-driven future.</p>
<blockquote>
<p><strong>“You are not going to lose your job to AI, but you are going to lose your job to a developer who uses AI.”</strong></p>
<p>– Jensen Huang, CEO @NVIDIA</p>
</blockquote>
<h2 id="heading-table-of-contents">Table of Contents:</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-chapter-1-why-ai-in-finance-is-a-necessity-not-just-hype">Chapter 1: Why AI in Finance Is a Necessity – Not Just Hype</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-2-ai-in-finance-today-where-are-we-in-ai-and-innovation">Chapter 2: AI in Finance Today – Where Are We in AI and Innovation?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-3-global-use-cases-and-case-studies-of-ai-in-finance">Chapter 3: Global Use Cases and Case Studies of AI in Finance</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-4-data-management-in-finance-navigating-data-lakes-real-time-ingestion-security-and-cloud-platforms">Chapter 4: Data Management in Finance – Navigating Data Lakes, Real-Time Ingestion, Security, and Cloud Platforms</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-5-the-science-behind-the-models-ml-nlp-and-predictive-analytics">Chapter 5: The Science Behind the Models – ML, NLP, and Predictive Analytics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-6-training-the-workforce-upskilling-executives-technical-and-non-technical-teams-in-fintech">Chapter 6: Training the Workforce – Upskilling Executives, Technical, and Non-Technical Teams in FinTech</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-7-ai-for-executives-ai-education-amp-enablement-in-finance-workshops-tools-services-and-training-resources">Chapter 7: Resources for Finance Executives – AI Education & Enablement in Finance: Workshops, Tools, Services, and Training Resources</a></p>
</li>
</ol>
<p>You can download the PDF Version of the eBook <a target="_blank" href="http://www.lunartech.ai/download/ai-in-finance">here</a>.</p>
<h2 id="heading-chapter-1-why-ai-in-finance-is-a-necessity-not-just-hype">Chapter 1: Why AI in Finance Is a Necessity – Not Just Hype</h2>
<p>The financial sector has long prided itself on being ahead of the curve when it comes to adopting new technologies. From early mainframe systems to real-time trading platforms, banks, hedge funds, and payment providers have historically been quick to embrace tools that promise greater speed, efficiency, and insight.</p>
<p>But the world has changed – and fast.</p>
<p>Today, Artificial Intelligence (AI) and data-driven technologies are redefining what innovation means in finance. From predictive risk modeling to hyper-personalized customer experiences, AI isn’t a buzzword or a future luxury. It’s a present-day requirement for survival.</p>
<h3 id="heading-the-innovation-gap-perception-vs-reality">The Innovation Gap: Perception vs. Reality</h3>
<p>It may surprise you that even in some of the world’s most digitally advanced regions, many financial institutions still rely heavily on legacy systems. Core banking infrastructure often runs on outdated technologies. Manual compliance checks, fragmented data storage, and lack of real-time analytics are still common.</p>
<p>In countries with strong financial histories, legacy often gets in the way of progress. While fintech startups sprint ahead with cloud-native, AI-first approaches, traditional banks and insurers are struggling to digitize core services, let alone lead with data.</p>
<p>This isn’t just a minor gap – it’s a growing risk. Institutions that delay digital transformation fall behind not only in customer service but in risk mitigation, fraud prevention, and investment performance.</p>
<h3 id="heading-where-innovation-is-needed">Where Innovation Is Needed</h3>
<p>AI isn’t a one-size-fits-all solution. But it offers specific, actionable advantages across nearly every domain of finance:</p>
<ul>
<li><p><strong>Retail Banking</strong>: AI improves customer service, personalizes offerings, detects fraud in real-time, and enables better credit decisions using alternative data.</p>
</li>
<li><p><strong>Investment & Asset Management</strong>: Predictive analytics help portfolio managers spot trends early. Robo-advisors offer scalable, custom investment advice. NLP tools turn earnings calls and market chatter into structured insight.</p>
</li>
<li><p><strong>Payments & Fintech</strong>: Machine learning models reduce fraud, optimize payment routing, and improve KYC/AML compliance with far greater accuracy.</p>
</li>
<li><p><strong>Insurance & Risk</strong>: AI models assess risk in real-time, automate underwriting, and help insurers respond to claims with minimal manual effort.</p>
</li>
<li><p><strong>Trading & Hedge Funds</strong>: From quant strategies using reinforcement learning to sentiment-based trading algorithms, AI has already reshaped trading floors.</p>
</li>
<li><p><strong>Compliance & Security</strong>: Natural Language Processing (NLP) automates the review of regulatory documents. Anomaly detection finds suspicious transactions that human analysts might miss.</p>
</li>
</ul>
<p>In short: AI is not a tool to consider "someday." It’s an operational backbone for today and tomorrow.</p>
<h3 id="heading-its-about-roi-not-just-technology">It’s About ROI – Not Just Technology</h3>
<p>With every AI buzzword, there comes hype – and with hype, hesitation. This is healthy. Financial leaders need to see <strong>measurable ROI</strong>, not just a list of features.</p>
<p>Smart AI adoption focuses on:</p>
<ul>
<li><p><strong>Solving real business problems</strong> (for example, reducing loan processing time by 60%)</p>
</li>
<li><p><strong>Improving customer KPIs</strong> (for example, 20% higher retention from personalized financial advice)</p>
</li>
<li><p><strong>Cutting operational costs</strong> (for example, automating reconciliation processes)</p>
</li>
<li><p><strong>Enhancing security and compliance</strong> in increasingly hostile threat environments</p>
</li>
</ul>
<p>This handbook is about moving past the hype and into real value.</p>
<h3 id="heading-who-should-read-this-handbook">Who Should Read This Handbook</h3>
<p>This is a handbook written for decision-makers – executives, investors, and operators who shape the future of financial services:</p>
<ul>
<li><p>Bank executives and managers who want to transform operations and customer experience</p>
</li>
<li><p>Fintech founders and product teams building next-gen platforms</p>
</li>
<li><p>CTOs and CIOs tasked with modernizing infrastructure</p>
</li>
<li><p>Investors – VCs, PEs, GPs, LPs – looking to evaluate scalable fintech and AI plays</p>
</li>
<li><p>Leaders in asset management, hedge funds, and trading who want a performance edge</p>
</li>
<li><p>Insurance and payment companies navigating digital acceleration</p>
</li>
</ul>
<h3 id="heading-what-to-expect">What to Expect</h3>
<p>This handbook dives deep into how AI and data are being applied across the financial world – not in theory, but in practice. We'll explore global case studies from Singapore to New York, Tokyo to Amsterdam that show exactly how leading firms are deploying AI to solve real-world challenges.</p>
<p>We’ll break down the ecosystem into the most relevant financial verticals and explain:</p>
<ul>
<li><p>What problems AI solves</p>
</li>
<li><p>How data infrastructure plays a role</p>
</li>
<li><p>What tools and platforms are available</p>
</li>
<li><p>How organizations can upskill their teams</p>
</li>
<li><p>What successful case studies teach us</p>
</li>
</ul>
<p>By the end of this handbook, you’ll walk away with a roadmap – not just for “adopting AI,” but for <strong>building a sustainable, data-driven financial institution</strong> that stays ahead of the curve.</p>
<h2 id="heading-chapter-2-ai-in-finance-today-where-are-we-in-ai-and-innovation">Chapter 2: AI in Finance Today – Where Are We in AI and Innovation?</h2>
<p>At its core, <strong>finance</strong> is the science and business of managing money – how it’s earned, saved, invested, insured, borrowed, and spent. That definition hasn’t changed. But the methods, expectations, and technologies that drive modern finance have radically transformed.</p>
<p>In today’s financial ecosystem, institutions are no longer judged solely on interest rates or product offerings. Instead, they are measured by:</p>
<ul>
<li><p>How fast they can deliver services</p>
</li>
<li><p>How well they personalize customer experiences</p>
</li>
<li><p>How securely they protect data and infrastructure</p>
</li>
<li><p>How intelligently they manage risk and capital allocation</p>
</li>
</ul>
<p>And most importantly, by <strong>how effectively they use data.</strong></p>
<h3 id="heading-finance-in-2025-data-centric-and-ai-driven">Finance in 2025: Data-Centric and AI-Driven</h3>
<p>Every financial activity – be it a retail transaction, a cross-border payment, an IPO, or a wealth management advisory session – generates a <strong>digital footprint</strong>. What sets the leaders apart is how well they can capture, structure, analyze, and act on that data.</p>
<p>AI is the natural engine of this transformation. But today, we’re at a mixed adoption stage globally.</p>
<h4 id="heading-where-finance-is-excelling-in-ai">Where Finance Is Excelling in AI</h4>
<p>Many large financial players have already implemented AI with impressive results. Here are a few standout areas:</p>
<ul>
<li><p><strong>Fraud Detection and Risk Management</strong>: AI models can now detect fraud in milliseconds by analyzing real-time patterns and anomalies (for example, Mastercard and Visa use ML to detect fraudulent transactions before they’re completed).</p>
</li>
<li><p><strong>Algorithmic and Quantitative Trading</strong>: Hedge funds like Renaissance Technologies and Two Sigma use machine learning for predictive modeling based on vast data sources, including alternative data like satellite imagery.</p>
</li>
<li><p><strong>Robo-Advisors and Personal Finance</strong>: Platforms like Betterment and Wealthfront use AI to provide automated, personalized investment strategies at scale.</p>
</li>
<li><p><strong>Customer Service</strong>: Chatbots and AI-powered assistants are now handling millions of interactions across banks like Bank of America (Erica) and HSBC, significantly reducing customer support costs.</p>
</li>
</ul>
<p>These are just the beginning. In many of these cases, AI has not just improved performance – it has become a core competitive advantage.</p>
<h4 id="heading-where-the-gaps-are">Where the Gaps Are</h4>
<p>Despite high-profile innovation, many financial institutions – especially traditional banks and insurers in Western Europe, Southeast Asia, and Latin America – are lagging behind.</p>
<p>Common challenges include:</p>
<ul>
<li><p><strong>Legacy Core Systems</strong>: Older, monolithic infrastructures make data integration and automation difficult.</p>
</li>
<li><p><strong>Siloed Data</strong>: Without centralized data warehouses or lakes, advanced AI modeling is almost impossible.</p>
</li>
<li><p><strong>Shortage of AI Talent</strong>: Many banks lack in-house AI engineers or data scientists, leading to reliance on generic third-party tools.</p>
</li>
<li><p><strong>Regulatory Fear</strong>: Concerns over compliance and data privacy (GDPR, AML, Basel III) often slow down innovation, even when AI can help meet those very obligations.</p>
</li>
</ul>
<p>A 2023 report by the World Economic Forum noted that while 85% of financial executives see AI as “essential” to future growth, fewer than 35% have deployed it at scale within core operations.</p>
<p>This means we are still in the early innings – especially for those outside of major innovation hubs like New York, London, or Hong Kong.</p>
<h3 id="heading-finance-is-becoming-fintech-by-default">Finance Is Becoming Fintech by Default</h3>
<p>One important shift: the line between traditional finance and fintech is vanishing.</p>
<p>Any company that provides financial services must now think like a tech company. This includes retail banks, wealth managers, insurers, private equity firms, and central banks. Whether they like it or not, they are becoming data companies.</p>
<ul>
<li><p>Payments are being reinvented by APIs and machine learning optimization (Stripe, Adyen, Square).</p>
</li>
<li><p>Lending is now algorithmic, with startups like Upstart and Kabbage approving loans in seconds using AI-based credit scoring.</p>
</li>
<li><p>Investment analysis is real-time, with platforms scanning global news, earnings reports, and social media sentiment 24/7.</p>
</li>
<li><p>Insurtechs are pricing risk more accurately than ever with real-time data from connected devices and behavioral scoring.</p>
</li>
</ul>
<p>Legacy institutions that resist this shift risk being leapfrogged by more agile, AI-first challengers.</p>
<h3 id="heading-the-global-landscape-an-uneven-map">The Global Landscape: An Uneven Map</h3>
<p>Innovation levels vary widely across regions:</p>
<ul>
<li><p><strong>United States</strong>: Leading in AI-driven trading, wealth tech, and regtech. Heavy investment in AI research and startup ecosystems.</p>
</li>
<li><p><strong>United Kingdom</strong>: Strong fintech sector in London, but traditional banks remain cautious. Regulation-friendly for experimentation (for example, FCA sandbox).</p>
</li>
<li><p><strong>Netherlands & Germany</strong>: Wealth of talent and infrastructure, but legacy banking institutions are slow to adopt AI internally.</p>
</li>
<li><p><strong>Singapore & Hong Kong</strong>: Government-backed innovation hubs, strong adoption in wealth management and regulatory tech.</p>
</li>
<li><p><strong>China</strong>: AI-first approach in consumer finance and mobile payments, led by Ant Group and Tencent.</p>
</li>
<li><p><strong>Canada & Australia</strong>: Focused on ethical AI and compliance automation. Slower in retail innovation but strong in institutional tech.</p>
</li>
<li><p><strong>Japan</strong>: Conservative innovation pace in traditional banks, but increasing AI use in investment and manufacturing finance.</p>
</li>
</ul>
<p>This variance opens the door for learning across borders – and for competitive advantage in under-served regions.</p>
<p>Finance today is not just about managing capital. It's about managing data, speed, trust, and intelligence. AI is no longer the edge. It is becoming the foundation.</p>
<p>In the next section, we’ll go beyond definitions and into real-world examples of how top institutions, from Goldman Sachs to Revolut to Ant Group, are applying AI in ways that are changing the game.</p>
<h2 id="heading-chapter-3-global-use-cases-and-case-studies-of-ai-in-finance">Chapter 3: Global Use Cases and Case Studies of AI in Finance</h2>
<p>AI is no longer experimental in finance – it's operational. From Wall Street to Shanghai, leading institutions are deploying machine learning, natural language processing (NLP), and generative AI not just to optimize processes but to redefine them.</p>
<p>In this section, we explore real-world case studies of how AI is already transforming financial services across banking, investing, payments, compliance, and customer experience. These examples span a global spectrum – from the U.S. to Asia to Europe – offering a comprehensive view of how AI is being leveraged across different financial sectors worldwide.</p>
<h3 id="heading-jpmorgan-chase-coin-contract-intelligence-platform">JPMorgan Chase – COiN (Contract Intelligence Platform)</h3>
<p><strong>Country:</strong> United States<br><strong>Function:</strong> Legal automation and document review<br><strong>AI Applications:</strong> NLP and Machine Learning<br><strong>Impact:</strong> Reportedly replaced about 360,000 hours of manual review work per year</p>
<p>JPMorgan’s <strong>COiN</strong> (Contract Intelligence) platform is a pioneer in AI for legal and compliance processes. Using Natural Language Processing (NLP), COiN automates the review of legal documents, particularly complex credit agreements. This process, which used to take hundreds of thousands of hours of manual work, is now completed in a fraction of the time, significantly enhancing operational efficiency.</p>
<ul>
<li><p><strong>Risk Analysis:</strong> COiN scans documents to identify key terms, obligations, and risks associated with legal contracts. This allows compliance officers to focus on the high-risk contracts and flag potential issues early on.</p>
</li>
<li><p><strong>Operational Cost Savings:</strong> The automation provided by COiN reduces reliance on manual labor and minimizes the risk of human error, ultimately saving the bank time and money.</p>
</li>
<li><p><strong>Compliance and Speed:</strong> COiN helps JPMorgan comply with complex regulatory requirements by making the review process quicker and more accurate, reducing compliance risk.</p>
</li>
</ul>
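<p>To make the idea of scanning documents for key terms more concrete, here is a minimal Python sketch that tags a few clause types in a contract snippet. It is not how COiN works internally: the regular expressions, the sample text, and the <code>extract_terms</code> helper are all hypothetical, and a production system would rely on trained NLP models rather than hand-written patterns.</p>

```python
import re

# Hypothetical clause patterns for illustration only; a real system would
# learn these from labeled contracts instead of hard-coding them.
PATTERNS = {
    "interest_rate": re.compile(r"(\d+(?:\.\d+)?)\s*%\s*per\s+annum", re.IGNORECASE),
    "maturity_date": re.compile(
        r"matur(?:es|ity)\s+(?:on|date\s+of)\s+(\d{4}-\d{2}-\d{2})", re.IGNORECASE
    ),
    "governing_law": re.compile(r"governed\s+by\s+the\s+laws\s+of\s+([A-Z][A-Za-z ]+?)[.,]"),
}


def extract_terms(text: str) -> dict[str, list[str]]:
    """Return every captured value for each named pattern found in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


# Made-up contract snippet, not taken from any real agreement.
SAMPLE = (
    "The Loan bears interest at 4.25% per annum and matures on 2030-06-30. "
    "This Agreement is governed by the laws of New York."
)

if __name__ == "__main__":
    for name, values in extract_terms(SAMPLE).items():
        print(name, values)
```

<p>Even this toy version shows the workflow: structured fields come out of free text, so a reviewer can sort or filter contracts by rate, maturity, or jurisdiction instead of reading each one in full.</p>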
<p>COiN is a clear example of how AI can disrupt back-office operations, providing banks and financial institutions with tools that significantly improve productivity and legal oversight.</p>
<h3 id="heading-blackrock-aladdin-asset-liability-debt-amp-derivative-investment-network">BlackRock – Aladdin (Asset, Liability, Debt & Derivative Investment Network)</h3>
<p><strong>Country:</strong> United States (Global deployment)<br><strong>Function:</strong> Risk management, portfolio construction, investment operations<br><strong>AI Applications:</strong> Predictive analytics, real-time risk modeling<br><strong>Impact:</strong> Reportedly supports about $21 trillion in client assets managed on the platform</p>
<p><strong>Aladdin</strong>, BlackRock’s AI-powered risk management platform, is one of the most influential tools in the investment management space. Aladdin leverages predictive analytics and real-time data to help asset managers assess risk, build portfolios, and manage their investment operations.</p>
<ul>
<li><p><strong>Scenario Analysis:</strong> Aladdin simulates various market scenarios (such as changes in interest rates or economic downturns) to help portfolio managers identify potential vulnerabilities and optimize portfolio performance accordingly.</p>
</li>
<li><p><strong>Market Prediction:</strong> Aladdin uses AI to forecast asset performance by analyzing <strong>both historical and real-time data</strong>, allowing asset managers to make data-driven decisions that improve returns while managing risk.</p>
</li>
<li><p><strong>Operational Risk:</strong> The platform can quickly identify potential gaps in the operational side of portfolio management, providing actionable insights to reduce risks.</p>
</li>
</ul>
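<p>The scenario analysis described above can be sketched in a few lines of Python. This is only an illustration of the general technique (repricing a portfolio under parallel interest-rate shocks), not Aladdin's actual methodology. The two bonds, the flat 5% base yield, and the function names are assumptions made up for the example.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    face: float         # principal repaid at maturity
    coupon_rate: float  # annual coupon as a fraction of face value
    years: int          # whole years to maturity


def price(bond: Bond, yield_rate: float) -> float:
    """Present value of annual coupons plus principal at a flat yield."""
    coupon = bond.face * bond.coupon_rate
    pv_coupons = sum(coupon / (1 + yield_rate) ** t for t in range(1, bond.years + 1))
    pv_face = bond.face / (1 + yield_rate) ** bond.years
    return pv_coupons + pv_face


def portfolio_value(bonds: list[Bond], yield_rate: float) -> float:
    """Sum of bond prices at a single flat yield."""
    return sum(price(b, yield_rate) for b in bonds)


def run_scenarios(bonds: list[Bond], base_yield: float, shocks: list[float]) -> dict[float, float]:
    """Map each parallel yield shock to the resulting change in portfolio value."""
    base = portfolio_value(bonds, base_yield)
    return {shock: portfolio_value(bonds, base_yield + shock) - base for shock in shocks}


# Hypothetical portfolio: one short-dated and one long-dated bond.
PORTFOLIO = [Bond(1000.0, 0.05, 2), Bond(1000.0, 0.05, 10)]

if __name__ == "__main__":
    for shock, pnl in run_scenarios(PORTFOLIO, 0.05, [-0.01, 0.0, 0.01]).items():
        print(f"{shock:+.2%} yield shift -> P&L {pnl:+.2f}")
```

<p>Running the scenarios shows the expected pattern: rising rates reduce the portfolio's value, and the longer-dated bond accounts for most of the loss. Real platforms apply the same idea to far richer scenarios and instrument types.</p>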
<p>Aladdin is used by financial institutions around the world, including large asset managers, insurers, and sovereign wealth funds. By licensing its technology, BlackRock has turned into not just an asset management firm, but a technology provider as well.</p>
<p>Here’s a <a target="_blank" href="https://www.blackrock.com/aladdin/">BlackRock Aladdin overview</a> if you want to read more.</p>
<h3 id="heading-goldman-sachs-marcus-amp-ai-powered-consumer-finance">Goldman Sachs – Marcus & AI-Powered Consumer Finance</h3>
<p><strong>Country:</strong> United States<br><strong>Function:</strong> Consumer banking, digital lending<br><strong>AI Applications:</strong> Behavioral analytics, NLP, personalization<br><strong>Impact:</strong> Over $100B in deposits managed via AI-augmented digital channels</p>
<p>Goldman Sachs entered the consumer banking space with <strong>Marcus</strong>, a digital platform offering savings accounts and personal loans. Powered by AI, Marcus has revolutionized how the bank approaches credit decisioning, personalized financial advice, and customer onboarding.</p>
<ul>
<li><p><strong>Credit Decisioning:</strong> Goldman Sachs uses AI to assess creditworthiness by analyzing alternative data sources, such as transaction history and social behavior, instead of just traditional credit scores. This allows Marcus to extend credit to a wider customer base, especially those underserved by traditional banks.</p>
</li>
<li><p><strong>Personalization:</strong> AI-driven algorithms create tailored financial solutions for individual customers, such as personalized savings plans or investment recommendations, enhancing user experience.</p>
</li>
<li><p><strong>Automated Onboarding:</strong> The AI engine speeds up the verification process, reducing manual input and allowing customers to open accounts in a matter of minutes, rather than days.</p>
</li>
</ul>
<p>Goldman Sachs’ move into the digital consumer finance space underscores how even traditional investment banks can innovate and compete with fintech disruptors by leveraging AI to improve user experience and streamline operations.</p>
<p>You can read more about <a target="_blank" href="https://www.marcus.com/">Marcus by Goldman Sachs</a> if you’re curious.</p>
<h3 id="heading-ant-group-ai-for-superapp-finance">Ant Group – AI for SuperApp Finance</h3>
<p><strong>Country:</strong> China<br><strong>Function:</strong> Mobile payments, credit, insurance, wealth<br><strong>AI Applications:</strong> Deep learning, behavior-based credit scoring, fraud detection<br><strong>Impact:</strong> Over 1 billion users served by AI-driven services</p>
<p>Ant Group, the parent company of <strong>Alipay</strong>, integrates AI throughout its extensive ecosystem, offering mobile payments, credit, insurance, and wealth management services. The scale at which Ant operates – with over 1 billion users – makes its AI deployment incredibly sophisticated.</p>
<ul>
<li><p><strong>Zhima Credit (Sesame Credit):</strong> This AI-powered credit scoring system uses behavioral data to evaluate creditworthiness. By analyzing transaction history, utility bill payments, and even social behavior, Ant Group can offer personalized loans and financial products to users who may lack traditional credit histories.</p>
</li>
<li><p><strong>Fraud Detection:</strong> Real-time anomaly detection systems continuously monitor billions of transactions to flag suspicious activity, preventing fraud before it happens. This has greatly improved trust in digital financial transactions, particularly in regions where traditional banking infrastructure is lacking.</p>
</li>
<li><p><strong>Smart Customer Support:</strong> Ant's NLP-powered chatbots resolve over 95% of customer queries autonomously, ensuring users receive timely assistance.</p>
</li>
</ul>
<p>Ant Group’s AI-driven platform enables massive scalability and efficiency, allowing the company to offer an array of services without the need for extensive physical infrastructure.</p>
<h3 id="heading-revolut-real-time-fraud-detection-and-personalization">Revolut – Real-Time Fraud Detection and Personalization</h3>
<p><strong>Country:</strong> United Kingdom<br><strong>Function:</strong> Neobank, payments, FX, crypto<br><strong>AI Applications:</strong> Real-time anomaly detection, personalization engines<br><strong>Impact:</strong> 35M+ users, AI flags >95% of fraud in real time</p>
<p><strong>Revolut</strong> uses AI extensively to enhance both customer experience and security across its neobanking platform. By leveraging machine learning, Revolut is able to detect fraud in real time and personalize financial services for each user.</p>
<ul>
<li><p><strong>Fraud Detection:</strong> Revolut’s AI models analyze behavioral patterns – such as location, transaction frequency, and device fingerprinting – to identify potentially fraudulent activities in real time. This allows the system to immediately flag suspicious transactions, ensuring a high level of security for its global user base.</p>
</li>
<li><p><strong>Personalization:</strong> Revolut’s AI engine provides users with customized budgeting tips, spending insights, and even recommends financial products such as loans and insurance, based on individual transaction data.</p>
</li>
<li><p><strong>Scalability:</strong> Revolut’s AI stack is designed to handle the massive scale of over 35 million users spread across 200+ countries, all while maintaining high standards of personalization.</p>
</li>
</ul>
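<p>As a rough sketch of how behavioral signals like these can be combined, the Python example below counts simple red flags for a transaction: an unusual amount, an unknown device, and an unexpected country. It is not Revolut's model. The thresholds, field names, and sample data are hypothetical, and real systems use trained machine learning models over many more features.</p>

```python
from statistics import mean, stdev


def amount_zscore(history: list[float], amount: float) -> float:
    """How many standard deviations the new amount sits from the user's past amounts."""
    spread = stdev(history)  # needs at least two past transactions
    return 0.0 if spread == 0 else (amount - mean(history)) / spread


def risk_score(history: list[float], known_devices: set[str], home_country: str, txn: dict) -> int:
    """Count simple red flags; the cutoff of 3 standard deviations is an arbitrary illustration."""
    flags = 0
    if abs(amount_zscore(history, txn["amount"])) > 3:
        flags += 1  # amount far outside the user's usual range
    if txn["device_id"] not in known_devices:
        flags += 1  # device never seen for this user
    if txn["country"] != home_country:
        flags += 1  # transaction from an unexpected location
    return flags


# Made-up user profile and transactions for illustration.
HISTORY = [20.0, 35.0, 25.0, 30.0, 40.0]
KNOWN_DEVICES = {"phone-1"}
HOME_COUNTRY = "GB"
NORMAL = {"amount": 32.0, "device_id": "phone-1", "country": "GB"}
SUSPICIOUS = {"amount": 900.0, "device_id": "laptop-9", "country": "BR"}

if __name__ == "__main__":
    print("normal:", risk_score(HISTORY, KNOWN_DEVICES, HOME_COUNTRY, NORMAL))
    print("suspicious:", risk_score(HISTORY, KNOWN_DEVICES, HOME_COUNTRY, SUSPICIOUS))
```

<p>A transaction that trips several flags at once would be held for review, while one that matches the user's usual pattern passes straight through.</p>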
<p>Revolut’s success lies in balancing cutting-edge AI with a streamlined, user-friendly experience, proving that AI is not just a tool for large banks but also for nimble fintech startups.</p>
<p>You can read more about <a target="_blank" href="https://www.revolut.com/">Revolut’s AI-driven approach here</a>.</p>
<h3 id="heading-renaissance-technologies-predictive-quant-trading">Renaissance Technologies – Predictive Quant Trading</h3>
<p><strong>Country:</strong> United States<br><strong>Function:</strong> Hedge fund<br><strong>AI Applications:</strong> Machine learning, alternative data modeling, signal extraction<br><strong>Impact:</strong> Arguably the most profitable quant firm in history</p>
<p><strong>Renaissance Technologies</strong>, the legendary hedge fund, is known for its AI-powered and data-driven investment strategies. The firm employs some of the most advanced machine learning techniques and data models to predict price movements, gaining a significant edge in the market.</p>
<ul>
<li><p><strong>Alternative Data Analysis:</strong> Renaissance uses unconventional data sources such as satellite imagery, weather data, and even social sentiment from social media platforms to build predictive models. For instance, they may analyze the number of cars in the parking lot of a retail chain using satellite images to forecast quarterly earnings.</p>
</li>
<li><p><strong>Machine Learning Models:</strong> Renaissance Technologies uses machine learning models to identify patterns and signals that human analysts may miss, making their trading decisions faster and more accurate.</p>
</li>
<li><p><strong>Consistent Returns:</strong> The firm’s flagship Medallion Fund has reportedly averaged roughly 66% a year before fees (about 39% net of fees) over three decades, a remarkable feat in the investment world, thanks to its reliance on AI to optimize every aspect of its trading strategy.</p>
</li>
</ul>
<p>Renaissance’s success story is a perfect example of how AI, combined with alternative data, can produce extraordinary financial returns.</p>
<h3 id="heading-generative-ai-for-internal-automation-and-client-interaction">Generative AI for Internal Automation and Client Interaction</h3>
<p><strong>Used Globally</strong><br><strong>Function:</strong> Customer service, internal productivity, compliance<br><strong>AI Applications:</strong> LLMs (like ChatGPT), GPT-powered copilots<br><strong>Impact:</strong> Reduces response time, boosts compliance, increases advisor efficiency</p>
<p>Generative AI is being rapidly adopted across the finance industry for internal automation and client interaction. AI tools like ChatGPT and similar Large Language Models (LLMs) have found applications across multiple facets of financial institutions:</p>
<ul>
<li><p><strong>Customer Service Automation:</strong> Banks and financial institutions are using generative AI to power chatbots and virtual assistants that handle common customer inquiries, reducing the need for human intervention and significantly improving response times.</p>
</li>
<li><p><strong>Internal Productivity:</strong> AI copilots, like those tested by Morgan Stanley and UBS, help financial advisors quickly retrieve research, analyze market trends, and generate custom reports. This allows advisors to focus on more valuable, higher-level tasks like client engagement.</p>
</li>
<li><p><strong>Compliance Assistance:</strong> Generative AI is also being deployed to automate risk documentation, summarize compliance reports, and assist in the generation of legal documents, ensuring that the vast array of regulatory requirements is met with greater accuracy and efficiency.</p>
</li>
</ul>
<p>Here are some examples:</p>
<ul>
<li><p><strong>Morgan Stanley</strong> uses OpenAI’s GPT to help financial advisors access research instantly.</p>
</li>
<li><p><strong>UBS</strong> is testing AI copilots to assist relationship managers and client-facing bankers.</p>
</li>
<li><p><strong>ING</strong> uses AI to streamline internal processes like writing credit memos and risk assessments.</p>
</li>
</ul>
<p>Generative AI is transforming how financial firms deliver customer service, assist employees, and maintain compliance.</p>
<h2 id="heading-chapter-4-data-management-in-finance-navigating-data-lakes-real-time-ingestion-security-and-cloud-platforms">Chapter 4: Data Management in Finance – Navigating Data Lakes, Real-Time Ingestion, Security, and Cloud Platforms</h2>
<p>In the digital age, data has become the lifeblood of the financial industry. From risk management to customer service and predictive analytics, financial institutions are increasingly relying on vast amounts of data to make informed decisions.</p>
<p>But handling this data requires advanced infrastructure, as well as a deep understanding of how different technologies can be leveraged to optimize data usage.</p>
<p>In this section, we’ll explore the critical components of data management in finance, including data lakes vs. data warehouses, real-time data ingestion, data security and compliance, and the role of cloud platforms like AWS, GCP, and Azure in managing financial data.</p>
<h3 id="heading-data-lakes-vs-data-warehouses-the-foundation-of-financial-data-management">Data Lakes vs. Data Warehouses: The Foundation of Financial Data Management</h3>
<p>When dealing with large volumes of data, teams and companies must decide how best to store, manage, and utilize that data. This decision often comes down to two key technologies: <strong>data lakes</strong> and <strong>data warehouses</strong>. While they may seem similar, they serve different purposes and have distinct advantages depending on the needs of the organization.</p>
<h4 id="heading-data-lakes-flexible-and-scalable-for-big-data">Data Lakes: Flexible and Scalable for Big Data</h4>
<p>A <strong>data lake</strong> is a centralized repository that allows financial institutions to store vast amounts of structured, semi-structured, and unstructured data at scale. The key advantage of a data lake is its flexibility – it can accommodate data from a variety of sources without requiring any preprocessing or transformation.</p>
<p>In finance, data lakes are ideal for storing massive datasets such as transaction logs, market data, social media feeds, and customer interactions. By consolidating this data in one place, organizations can perform exploratory data analysis, conduct advanced analytics, and implement machine learning models.</p>
<p><strong>Advantages:</strong></p>
<ul>
<li><p><strong>Scalability:</strong> Data lakes can handle petabytes of data with ease.</p>
</li>
<li><p><strong>Cost-Effective:</strong> They are often built on low-cost storage solutions, which makes them a cost-effective way to store large amounts of data.</p>
</li>
<li><p><strong>Data Variety:</strong> They can store data in its raw form, including structured data (like customer demographics), semi-structured data (like transaction logs), and unstructured data (like customer service chat logs or social media feeds).</p>
</li>
</ul>
<p><strong>Challenges:</strong></p>
<ul>
<li><p><strong>Data Quality:</strong> Since data in a lake is often stored in its raw form, ensuring the quality of the data can be challenging.</p>
</li>
<li><p><strong>Data Governance:</strong> Proper governance frameworks need to be in place to manage who has access to the data, and how it can be used securely and ethically.</p>
</li>
</ul>
<h4 id="heading-data-warehouses-structured-and-optimized-for-analytics">Data Warehouses: Structured and Optimized for Analytics</h4>
<p>A <strong>data warehouse</strong>, on the other hand, is designed for structured data that is preprocessed and optimized for analytics. It usually stores historical data, transformed into a format that is easy to query and analyze. In financial institutions, data warehouses are used for business intelligence, reporting, and making strategic decisions based on historical trends.</p>
<p>Banks and asset management firms often rely on data warehouses for financial reporting, risk management, fraud detection, and compliance tracking. A warehouse gives them a clean, structured dataset that is ready for analysis.</p>
<p><strong>Advantages:</strong></p>
<ul>
<li><p><strong>Performance:</strong> Data warehouses are highly optimized for complex queries and fast analytics.</p>
</li>
<li><p><strong>Data Integrity:</strong> The data stored in warehouses is usually cleaned and transformed, ensuring a high degree of accuracy and consistency.</p>
</li>
<li><p><strong>Business Intelligence:</strong> They support advanced business intelligence tools and reporting features, helping executives make informed decisions.</p>
</li>
</ul>
<p><strong>Challenges:</strong></p>
<ul>
<li><p><strong>Cost:</strong> Data warehouses typically require more expensive storage and computing resources due to their structured nature.</p>
</li>
<li><p><strong>Rigidity:</strong> Unlike data lakes, data warehouses are less flexible when it comes to accommodating unstructured data or rapidly changing datasets.</p>
</li>
</ul>
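<p>The contrast between the two storage styles can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the record fields (<code>account</code>, <code>amount</code>, <code>note</code>) and the in-memory SQLite table are assumptions made for the example. The "lake" keeps raw JSON lines untouched and applies structure only when the data is read, while the "warehouse" validates and types each row before loading it.</p>

```python
import json
import sqlite3

# Raw events as they might arrive from different sources (hypothetical fields).
raw_events = [
    '{"account": "A1", "amount": "120.50", "note": "card payment"}',
    '{"account": "A2", "amount": "75", "channel": "mobile"}',
    '{"account": "A1", "free_text": "customer asked about fees"}',
]

# "Data lake": store everything as-is; no schema is enforced on write.
lake = list(raw_events)


def read_amounts_from_lake(lines):
    """Schema-on-read: interpret the raw records only at query time."""
    amounts = []
    for line in lines:
        record = json.loads(line)
        if "amount" in record:
            amounts.append((record["account"], float(record["amount"])))
    return amounts


def load_warehouse(lines):
    """Schema-on-write: only clean, typed rows enter the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (account TEXT NOT NULL, amount REAL NOT NULL)"
    )
    for line in lines:
        record = json.loads(line)
        if "amount" not in record:
            continue  # rows that do not fit the schema are rejected up front
        conn.execute(
            "INSERT INTO payments (account, amount) VALUES (?, ?)",
            (record["account"], float(record["amount"])),
        )
    conn.commit()
    return conn


warehouse = load_warehouse(raw_events)
totals = warehouse.execute(
    "SELECT account, SUM(amount) FROM payments GROUP BY account ORDER BY account"
).fetchall()
print(len(lake), "raw records kept in the lake")
print(totals)  # [('A1', 120.5), ('A2', 75.0)]
```

<p>Note that the free-text record survives in the lake but never reaches the warehouse table. That is the trade-off described above: the lake preserves variety, while the warehouse gives up flexibility in exchange for clean, query-ready data.</p>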
<h3 id="heading-real-time-data-ingestion-and-processing-the-importance-of-speed-in-finance">Real-Time Data Ingestion and Processing: The Importance of Speed in Finance</h3>
<p>The ability to process real-time data has become a critical factor for success in modern financial services. Whether it's market trading, fraud detection, or customer support, financial institutions need to ingest and analyze data as it happens to make timely decisions and maintain competitive advantage.</p>
<h4 id="heading-real-time-data-ingestion">Real-Time Data Ingestion</h4>
<p>In the financial world, real-time data ingestion refers to the continuous flow of data from various sources (such as stock markets, credit card transactions, or social media) into a central system for immediate processing. For instance, large banks and payment networks must process thousands of transactions every second to identify fraud or assess liquidity risk.</p>
<ul>
<li><p><strong>Example:</strong> A <strong>trading algorithm</strong> that ingests live market data (price movements, order books, and so on) and adjusts trading strategies in real time, helping asset managers to react instantly to market conditions.</p>
</li>
<li><p><strong>Key Technologies:</strong> Real-time data ingestion typically uses streaming technologies such as <strong>Apache Kafka</strong>, <strong>Amazon Kinesis</strong>, or <strong>Google Cloud Pub/Sub</strong> to capture data and route it to processing systems with minimal delay.</p>
</li>
</ul>
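<p>The producer/consumer pattern behind these streaming tools can be sketched with Python's standard library. This is only a stand-in: an in-process <code>queue.Queue</code> plays the role of a streaming topic, and the tick fields (<code>symbol</code>, <code>price</code>) are hypothetical. It does not use the Kafka, Kinesis, or Pub/Sub client APIs.</p>

```python
import queue
import threading

# In-process queue standing in for a streaming topic (an assumption for
# illustration; a real system would use a message broker).
topic = queue.Queue()
SENTINEL = None  # marks the end of the simulated stream


def producer(ticks):
    """Publish each market tick as soon as it is 'observed'."""
    for tick in ticks:
        topic.put(tick)
    topic.put(SENTINEL)


def consumer(results):
    """Consume ticks one at a time and keep the latest price per symbol."""
    while True:
        tick = topic.get()
        if tick is SENTINEL:
            break
        results[tick["symbol"]] = tick["price"]


ticks = [
    {"symbol": "ABC", "price": 101.2},
    {"symbol": "XYZ", "price": 55.0},
    {"symbol": "ABC", "price": 101.6},
]
latest_prices = {}
consumer_thread = threading.Thread(target=consumer, args=(latest_prices,))
producer_thread = threading.Thread(target=producer, args=(ticks,))
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()
print(latest_prices)  # {'ABC': 101.6, 'XYZ': 55.0}
```

<p>The key idea is decoupling: the producer never waits for downstream analysis, and the consumer handles each event as it arrives instead of waiting for a batch.</p>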
<h4 id="heading-real-time-data-processing">Real-Time Data Processing</h4>
<p>Once data is ingested, it needs to be processed immediately to generate insights or trigger actions. For example, real-time fraud detection systems analyze each credit card transaction as it happens to determine whether it’s legitimate or fraudulent, using algorithms that monitor patterns and behaviors.</p>
<ul>
<li><strong>Key Processing Technologies:</strong> In finance, streaming analytics platforms like <strong>Apache Flink</strong> or <strong>Google Cloud Dataflow</strong> are commonly used to handle real-time data. These platforms allow institutions to run complex analytics on data in motion, enabling them to identify risks, opportunities, or irregularities quickly.</li>
</ul>
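<p>To make "analytics on data in motion" concrete, here is a minimal sketch of a sliding-window check, one of the simplest streaming computations. The window length, the threshold, and the card identifiers are all hypothetical values chosen for the example, and the class is plain Python rather than Flink or Dataflow code.</p>

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60      # hypothetical look-back window
MAX_TXNS_IN_WINDOW = 3   # hypothetical threshold


class VelocityMonitor:
    """Flags a card when too many transactions land inside the window."""

    def __init__(self, window_seconds, max_txns):
        self.window_seconds = window_seconds
        self.max_txns = max_txns
        self.recent = defaultdict(deque)  # card_id -> timestamps in window

    def observe(self, card_id, timestamp):
        """Process one event; return True if it should be flagged."""
        window = self.recent[card_id]
        window.append(timestamp)
        # Evict timestamps that have fallen out of the window.
        while timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_txns


monitor = VelocityMonitor(WINDOW_SECONDS, MAX_TXNS_IN_WINDOW)
events = [("card-1", 0), ("card-1", 10), ("card-2", 15),
          ("card-1", 20), ("card-1", 25), ("card-1", 200)]
flags = [monitor.observe(card, ts) for card, ts in events]
print(flags)  # [False, False, False, False, True, False]
```

<p>Each event is evaluated the moment it arrives, using only a small amount of per-card state. Streaming platforms apply the same principle, adding fault tolerance and horizontal scaling on top.</p>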
<p><strong>Use Cases:</strong></p>
<ul>
<li><p><strong>Fraud Detection:</strong> Banks and payment processors use real-time transaction analysis to detect fraud patterns and stop unauthorized transactions.</p>
</li>
<li><p><strong>Algorithmic Trading:</strong> Real-time data processing enables financial firms to adjust trading algorithms instantly based on market changes.</p>
</li>
<li><p><strong>Customer Interaction:</strong> AI-powered chatbots and customer service agents are able to offer real-time support to clients, improving the customer experience.</p>
</li>
</ul>
<h3 id="heading-data-security-and-compliance-in-financial-data-handling">Data Security and Compliance in Financial Data Handling</h3>
<p>In finance, data is not just an asset – it is also a liability. Financial institutions need to adhere to strict data security and compliance regulations to protect sensitive customer information and meet legal requirements.</p>
<h4 id="heading-compliance-with-regulations">Compliance with Regulations</h4>
<p>Financial institutions operate in a heavily regulated environment, where maintaining compliance is crucial. Regulations like <strong>GDPR</strong> (General Data Protection Regulation), along with rules enforced by regulators such as <strong>FINRA</strong> (Financial Industry Regulatory Authority) and the <strong>SEC</strong> (Securities and Exchange Commission), impose strict requirements for how financial data should be handled, stored, and protected.</p>
<ul>
<li><p><strong>GDPR:</strong> This European regulation imposes heavy fines on organizations that mishandle personal data. Financial institutions must ensure that they collect, store, and process customer data in compliance with GDPR principles, such as having a lawful basis for processing (for example, explicit consent) and providing data access rights to users.</p>
</li>
<li><p><strong>FINRA/SEC Regulations:</strong> These U.S.-based regulatory bodies require firms to retain records of transactions and communications, ensure that data is protected from unauthorized access, and report suspicious activities promptly. Financial firms must implement stringent data governance frameworks to comply with these regulations.</p>
</li>
</ul>
<h4 id="heading-data-security-in-financial-institutions">Data Security in Financial Institutions</h4>
<p>With the massive amount of sensitive data stored in financial systems, protecting this data from cyberattacks, breaches, and unauthorized access is of paramount importance. Financial institutions are leveraging a combination of encryption, multi-factor authentication (MFA), and access control policies to ensure the security of their systems.</p>
<ul>
<li><p><strong>Encryption:</strong> Financial data, both at rest and in transit, is encrypted to prevent interception by malicious actors.</p>
</li>
<li><p><strong>MFA:</strong> Multi-factor authentication ensures that even if an attacker gains access to a password, they still cannot access the data without a second form of authentication (such as a token or biometric verification).</p>
</li>
<li><p><strong>Data Masking:</strong> Sensitive customer data, such as credit card numbers or Social Security numbers, is often "masked" in non-production environments to prevent accidental exposure during testing or development.</p>
</li>
</ul>
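<p>Data masking is straightforward to illustrate. The sketch below keeps only the last four digits of a sensitive value and preserves separators so that the masked data still looks realistic in test environments. The function names and the record fields (<code>card_number</code>, <code>ssn</code>) are hypothetical, and a real deployment would typically rely on a dedicated masking or tokenization service.</p>

```python
def mask_all_but_last4(value):
    """Replace every digit except the last four with '*', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    digits_seen = 0
    masked = []
    for ch in value:
        if ch.isdigit():
            digits_seen += 1
            # Keep a digit only if it is among the final four.
            masked.append(ch if digits_seen > total_digits - 4 else "*")
        else:
            masked.append(ch)
    return "".join(masked)


def mask_record(record, sensitive_fields=("card_number", "ssn")):
    """Return a copy of the record that is safer to use outside production."""
    masked = dict(record)
    for field in sensitive_fields:
        if field in masked:
            masked[field] = mask_all_but_last4(masked[field])
    return masked


customer = {
    "name": "Test Customer",
    "card_number": "4111 1111 1111 1111",
    "ssn": "123-45-6789",
}
print(mask_record(customer))
# {'name': 'Test Customer', 'card_number': '**** **** **** 1111', 'ssn': '***-**-6789'}
```

<p>The original record is left unchanged, so the masking step can sit in the pipeline that copies production data into development or testing environments.</p>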
<h3 id="heading-cloud-platforms-in-financial-data-handling-aws-gcp-and-azure">Cloud Platforms in Financial Data Handling: AWS, GCP, and Azure</h3>
<p>Cloud platforms such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure have become the backbone for modern financial data management. These platforms offer scalable infrastructure, advanced analytics tools, and machine learning services that are essential for financial institutions to stay competitive.</p>
<h4 id="heading-benefits-of-cloud-platforms-in-finance">Benefits of Cloud Platforms in Finance</h4>
<ul>
<li><p><strong>Scalability:</strong> Cloud platforms provide virtually unlimited storage and computing power, allowing financial institutions to scale operations efficiently.</p>
</li>
<li><p><strong>Security and Compliance:</strong> Major cloud providers maintain widely recognized security attestations and certifications (such as <strong>SOC 2</strong> reports or <strong>ISO 27001</strong>) and implement strong security features, including encryption and access control, that help institutions meet financial regulatory standards.</p>
</li>
<li><p><strong>Advanced Analytics and Machine Learning:</strong> Cloud platforms provide access to a range of tools for big data processing, AI model development, and real-time analytics. For instance, AWS provides services like Amazon SageMaker for machine learning, while Google Cloud’s BigQuery offers fast data analytics.</p>
</li>
</ul>
<h4 id="heading-use-cases-of-cloud-in-finance">Use Cases of Cloud in Finance:</h4>
<ul>
<li><p><strong>Risk Analytics:</strong> Financial firms use cloud platforms to run complex risk simulations at scale, allowing them to identify potential vulnerabilities in their portfolios and strategies.</p>
</li>
<li><p><strong>Fraud Detection and Prevention:</strong> Cloud-based AI models can analyze billions of transactions in real time, flagging suspicious activities with greater accuracy than traditional systems.</p>
</li>
<li><p><strong>Customer Service Automation:</strong> Using cloud-based AI and chatbots, financial institutions can offer 24/7 customer service, streamlining support while reducing operational costs.</p>
</li>
</ul>
<p>In the financial industry, leveraging the right data infrastructure is key to gaining a competitive edge. By effectively managing data using data lakes, data warehouses, and advanced cloud platforms, financial institutions can enhance their decision-making capabilities, improve security and compliance, and deliver a better experience to customers.</p>
<p>As the industry continues to embrace real-time data ingestion, advanced analytics, and AI, those who master the art of data management will be the leaders of tomorrow’s financial ecosystem.</p>
<h2 id="heading-chapter-5-the-science-behind-the-models-ml-nlp-and-predictive-analytics">Chapter 5: The Science Behind the Models – ML, NLP, and Predictive Analytics</h2>
<p>Artificial Intelligence (AI) in finance is not magic – it’s applied science. Behind every real-time fraud alert, automated investment strategy, or smart credit score is a complex stack of algorithms and data pipelines.</p>
<p>To make AI work in financial environments where accuracy, explainability, and risk tolerance are non-negotiable, institutions rely on a blend of machine learning (ML), natural language processing (NLP), and predictive analytics.</p>
<p>In this section, we’ll unpack the foundational AI methods that power today’s most critical financial systems, and how these models are reshaping decision-making across the value chain.</p>
<h3 id="heading-time-series-forecasting-the-engine-of-financial-prediction">Time-Series Forecasting: The Engine of Financial Prediction</h3>
<p><strong>Time-series forecasting</strong> is the cornerstone of financial modeling. Unlike typical supervised learning, where observations are assumed to be independent of one another, time-series models take into account temporal dependencies – the past influencing the future – which is especially important in domains like stock prices, interest rates, and credit defaults.</p>
<h4 id="heading-core-applications-in-finance">Core Applications in Finance:</h4>
<ul>
<li><p><strong>Asset Price Prediction:</strong> Hedge funds and asset managers forecast equity, FX, and commodity prices using techniques ranging from ARIMA and exponential smoothing to deep learning-based models like LSTMs (Long Short-Term Memory) or Temporal Convolutional Networks (TCNs).</p>
</li>
<li><p><strong>Liquidity Forecasting:</strong> Treasury departments forecast cash flow and liquidity needs across accounts and geographies to meet regulatory buffers and prevent shortfalls.</p>
</li>
<li><p><strong>Credit Risk Monitoring:</strong> Time-series models help anticipate changes in borrower behavior or macroeconomic indicators that impact default probabilities.</p>
</li>
</ul>
<h4 id="heading-technical-insights">Technical Insights:</h4>
<ul>
<li><p><strong>Models Used:</strong> ARIMA, Prophet (developed by Meta), LSTM, XGBoost on rolling features.</p>
</li>
<li><p><strong>Challenges:</strong> Low signal-to-noise ratio in markets, non-stationarity, and the risk of overfitting to past data.</p>
</li>
<li><p><strong>Best Practices:</strong> Combining feature engineering with domain-specific constraints (for example, market open/close calendars, economic events) significantly improves forecast reliability.</p>
</li>
</ul>
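<p>Two ideas from this section, temporal dependence and the risk of overfitting to past data, can be shown with a small sketch. It uses simple exponential smoothing (one of the classical techniques mentioned above) and evaluates it with walk-forward validation, where each forecast only sees data that was available at the time. The balance figures and the <code>alpha</code> values are made up for illustration.</p>

```python
def exponential_smoothing_forecast(history, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = history[0]
    for value in history[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


def walk_forward_mae(series, alpha, min_history=3):
    """Mean absolute error, forecasting each point from earlier data only."""
    errors = []
    for t in range(min_history, len(series)):
        forecast = exponential_smoothing_forecast(series[:t], alpha)
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)


# Hypothetical daily cash balances (in millions), for illustration only.
balances = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]
for alpha in (0.2, 0.5, 0.8):
    print(alpha, round(walk_forward_mae(balances, alpha), 3))
```

<p>Unlike a random train/test split, walk-forward evaluation never lets the model peek at future values, which is why it is a common way to sanity-check financial forecasts before trusting them.</p>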
<h3 id="heading-risk-modeling-quantifying-uncertainty-with-machine-learning">Risk Modeling: Quantifying Uncertainty with Machine Learning</h3>
<p>Risk modeling is fundamental in finance, whether you're managing market risk, credit risk, or operational risk. Traditionally built with logistic regression and rule-based systems, today’s models are becoming far more nuanced through ML.</p>
<h4 id="heading-machine-learning-in-risk">Machine Learning in Risk:</h4>
<ul>
<li><p><strong>Credit Risk:</strong> ML models ingest not just FICO scores and payment history, but also alternative data like cash flow, mobile phone usage, and behavioral patterns to score borrowers – especially useful in emerging markets or for thin-file customers.</p>
</li>
<li><p><strong>Market Risk (VaR, CVaR):</strong> ML techniques simulate potential portfolio losses under different market scenarios, accounting for complex correlations across assets.</p>
</li>
<li><p><strong>Operational Risk:</strong> Using internal logs and incident reports, anomaly detection algorithms can flag early indicators of system failures or fraud.</p>
</li>
</ul>
<h4 id="heading-technical-highlights">Technical Highlights:</h4>
<ul>
<li><p><strong>Popular Models:</strong> Gradient Boosting Machines (GBM), Random Forests, Support Vector Machines (SVM), and Neural Networks.</p>
</li>
<li><p><strong>Interpretability:</strong> Risk models must be explainable to pass regulatory scrutiny. Tools like SHAP values or LIME help demystify black-box models by showing the impact of individual features on predictions.</p>
</li>
<li><p><strong>Example:</strong> A bank may use XGBoost to predict credit card default, with SHAP showing that recent missed payments and high utilization ratios were the key drivers behind the model’s output.</p>
</li>
</ul>
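<p>For readers unfamiliar with VaR and CVaR, the sketch below computes both with plain historical simulation. This is the classical, non-ML baseline that more sophisticated scenario models are usually compared against, not an ML technique itself. The return series and the 90% confidence level are hypothetical.</p>

```python
import math


def historical_var_cvar(returns, confidence=0.95):
    """Return (VaR, CVaR) as positive loss fractions from historical returns."""
    losses = sorted(-r for r in returns)  # ascending, so the worst losses are last
    # Position of the loss at the chosen confidence level (rounded to
    # guard against floating-point noise in the product).
    cutoff = math.ceil(round(confidence * len(losses), 9)) - 1
    var = losses[cutoff]
    tail = losses[cutoff:]  # losses at or beyond VaR
    cvar = sum(tail) / len(tail)
    return var, cvar


# Hypothetical daily portfolio returns, for illustration only.
daily_returns = [0.01, -0.02, 0.005, -0.035, 0.012,
                 -0.008, 0.02, -0.05, 0.003, -0.015]
var_90, cvar_90 = historical_var_cvar(daily_returns, confidence=0.90)
print(f"90% VaR:  {var_90:.4f}")   # 0.0350
print(f"90% CVaR: {cvar_90:.4f}")  # 0.0425
```

<p>VaR answers "how bad is a bad day at this confidence level?", while CVaR averages the losses in the tail beyond that point, which makes it more sensitive to extreme scenarios.</p>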
<h3 id="heading-natural-language-processing-nlp-unlocking-textual-data">Natural Language Processing (NLP): Unlocking Textual Data</h3>
<p>Financial institutions sit on mountains of unstructured textual data – earnings call transcripts, analyst reports, regulatory filings, news, and customer communications. <strong>NLP</strong> allows them to extract meaningful insights from this data at scale.</p>
<h4 id="heading-use-cases-in-finance">Use Cases in Finance:</h4>
<ul>
<li><p><strong>Document Review and Contract Analysis:</strong> NLP models scan thousands of legal agreements or credit contracts to flag risk clauses, expirations, or inconsistencies (for example, JPMorgan’s COiN platform).</p>
</li>
<li><p><strong>Sentiment Analysis:</strong> Hedge funds use NLP to analyze news and social media sentiment to anticipate market movements.</p>
</li>
<li><p><strong>Regulatory Compliance:</strong> Automated systems parse SEC filings, GDPR policies, and internal communications to ensure compliance or detect violations.</p>
</li>
<li><p><strong>Customer Service Chatbots:</strong> NLP powers real-time customer engagement, automatically resolving queries and routing issues to the right departments.</p>
</li>
</ul>
<h4 id="heading-technologies">Technologies:</h4>
<ul>
<li><p><strong>Traditional Methods:</strong> Named Entity Recognition (NER), Bag-of-Words, TF-IDF, Latent Dirichlet Allocation (LDA).</p>
</li>
<li><p><strong>Modern Approaches:</strong> Transformer models (like BERT, RoBERTa, or domain-specific variants such as FinBERT) trained on financial texts to achieve better context understanding.</p>
</li>
<li><p><strong>Document Intelligence:</strong> With models like GPT-4 or Claude, banks can now extract and summarize key risks, opportunities, or inconsistencies from dense reports.</p>
</li>
</ul>
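<p>Of the traditional methods listed above, TF-IDF is the easiest to show in code. The sketch below uses the basic unsmoothed formula (term frequency multiplied by <code>log(N / document frequency)</code>). Libraries often apply smoothed variants, so their numbers will differ. The three "documents" are invented snippets.</p>

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented snippets standing in for financial text.
docs = [
    "revenue growth beat guidance".split(),
    "revenue decline missed guidance".split(),
    "litigation risk increased".split(),
]
scores = tf_idf(docs)
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

<p>Terms that appear in several documents ("revenue", "guidance") receive lower weights than terms unique to one document ("growth", "beat"). Transformer models go much further by modeling context, but TF-IDF remains a useful baseline for search and document triage.</p>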
<h3 id="heading-fraud-detection-using-anomaly-detection-and-unsupervised-learning">Fraud Detection: Using Anomaly Detection and Unsupervised Learning</h3>
<p>Fraud detection is one of the highest ROI use cases for AI in finance. The challenge lies in identifying <strong>non-obvious</strong>, evolving fraudulent patterns buried in billions of transactions – often without labeled data.</p>
<h4 id="heading-why-ml-outperforms-rule-based-systems">Why ML Outperforms Rule-Based Systems:</h4>
<ul>
<li><p><strong>Traditional systems</strong> rely on static rules like “flag any transaction over $5,000 abroad.” But fraudsters quickly adapt.</p>
</li>
<li><p><strong>Machine learning systems</strong>, particularly those using unsupervised or semi-supervised techniques, learn what “normal” looks like for each user and flag outliers in real time.</p>
</li>
</ul>
<h4 id="heading-models-and-approaches">Models and Approaches:</h4>
<ul>
<li><p><strong>Unsupervised Learning:</strong> Clustering (for example, DBSCAN), Autoencoders, and Isolation Forests are used to detect anomalies without needing labeled fraud data.</p>
</li>
<li><p><strong>Semi-Supervised Learning:</strong> Models are trained on a small labeled dataset combined with millions of unlabeled records.</p>
</li>
<li><p><strong>Behavioral Biometrics:</strong> ML models monitor how users type, swipe, or move the mouse to detect suspicious behavior – often used in mobile banking apps.</p>
</li>
</ul>
<h4 id="heading-example">Example:</h4>
<p>A neobank like Revolut may apply autoencoder-based models on real-time transaction data. If a user who typically shops in Amsterdam suddenly makes 5 high-value transactions from São Paulo using a new device, the system flags and freezes the account for verification – all within milliseconds.</p>
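<p>The idea of learning what "normal" looks like for each user can be sketched without any labeled fraud data. The example below is a deliberately simple statistical stand-in, not an autoencoder or Isolation Forest: it scores a new transaction against the user's own history using the median and the median absolute deviation (MAD). The spending history and the 3.5 cut-off are assumptions chosen for illustration.</p>

```python
import statistics

THRESHOLD = 3.5  # assumed cut-off for the robust z-score


def robust_z_score(value, history):
    """Distance from the user's median, scaled by median absolute deviation."""
    median = statistics.median(history)
    mad = statistics.median([abs(x - median) for x in history])
    if mad == 0:
        # No spread in the history: anything different is maximally unusual.
        return 0.0 if value == median else float("inf")
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return 0.6745 * (value - median) / mad


def is_anomalous(value, history, threshold=THRESHOLD):
    """Flag amounts that sit far outside this user's usual range."""
    return abs(robust_z_score(value, history)) > threshold


# Hypothetical recent card spend for one user.
history = [12.0, 15.0, 9.5, 14.0, 11.0, 13.5, 10.0]
print(is_anomalous(13.0, history))   # False
print(is_anomalous(950.0, history))  # True
```

<p>Production systems combine many such signals (location, device, merchant, timing) and learn the baseline with richer models, but the principle is the same: score each event against the user's own behavior rather than a single global rule.</p>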
<p>Behind every AI solution in finance is a combination of mathematical modeling, data engineering, and domain expertise. Whether it’s a hedge fund predicting earnings, a bank screening loans, or an insurance firm processing claims, these tools – time-series forecasting, ML-based risk scoring, NLP-driven document analysis, and anomaly detection – are the technical foundation of financial AI. Understanding them is no longer optional for executives – it’s the difference between leading innovation and being disrupted by it.</p>
<h2 id="heading-chapter-6-training-the-workforce-upskilling-executives-technical-and-non-technical-teams-in-fintech">Chapter 6: Training the Workforce – Upskilling Executives, Technical, and Non-Technical Teams in FinTech</h2>
<p>AI transformation in finance is both a technological shift and an organizational one. Success doesn’t depend solely on algorithms or data pipelines, but on <strong>people</strong>: the ones who design, deploy, fund, govern, and use AI.</p>
<p>And if there's one hard truth in AI transformation, it is this: Innovation starts at the top.</p>
<p>Whether you are running a regional bank, a global asset manager, or a fintech startup, your leaders must be AI-literate. Not necessarily technically fluent in code – but strategically fluent in AI’s business value, risks, and implementation realities.</p>
<h3 id="heading-ai-literacy-for-leadership-a-strategic-imperative">AI Literacy for Leadership: A Strategic Imperative</h3>
<p>The idea that AI is a luxury – or something to “consider later” – is a dangerous misconception. In the current financial landscape, AI is a necessity. And if decision-makers don’t understand it, they can’t lead it.</p>
<p>Executives are the ones who sign off on technology budgets, approve digital initiatives, and set strategic priorities. It doesn't matter how innovative your engineers are. If your leadership doesn’t “get” AI, the innovation dies on the boardroom table.</p>
<h4 id="heading-common-executive-blind-spots">Common Executive Blind Spots:</h4>
<ul>
<li><p>Confusing automation with true AI (for example, rules-based tools vs. learning systems)</p>
</li>
<li><p>Underestimating the cost and complexity of model deployment</p>
</li>
<li><p>Failing to understand data infrastructure dependencies</p>
</li>
<li><p>Viewing AI as a “tech problem” instead of a business enabler</p>
</li>
<li><p>Ignoring governance risks or regulatory exposure</p>
</li>
</ul>
<p>Here are some key topics in executive AI training:</p>
<ul>
<li><p>Understanding ML, NLP, and GenAI at a strategic level</p>
</li>
<li><p>Interpreting AI project KPIs and business ROI</p>
</li>
<li><p>Governance and model risk management</p>
</li>
<li><p>Ethical and regulatory frameworks (EU AI Act, GDPR, SEC AI enforcement)</p>
</li>
<li><p>Building cross-functional AI innovation teams</p>
</li>
</ul>
<blockquote>
<p>"You’re not going to lose your job to an AI, but you’re going to lose your job to someone who uses AI."<br>— Jensen Huang</p>
</blockquote>
<p>This is not hyperbole. It's already happening. In a 2024 survey by PwC, 72% of financial services CEOs admitted they lacked a clear understanding of how AI delivers ROI in their own organizations. Meanwhile, 60% of digital transformation failures in banking were attributed to “leadership misalignment”, not technical challenges.</p>
<h4 id="heading-the-cost-of-inaction">The Cost of Inaction:</h4>
<ul>
<li><p>Slower go-to-market for AI-based products</p>
</li>
<li><p>Missed competitive advantages (for example, predictive credit scoring, customer retention models)</p>
</li>
<li><p>Increased risk of non-compliance due to lack of AI governance</p>
</li>
<li><p>Talent attrition – top AI engineers don’t stay where innovation is blocked</p>
</li>
</ul>
<p>To address this, top-tier financial institutions are increasingly mandating structured AI education programs for senior leaders, including CEOs, CTOs, COOs, and board members. This isn't just optional professional development – it's often required to ensure alignment on AI strategy, ethical use, and ROI measurement.</p>
<h3 id="heading-why-mandating-ai-education-is-becoming-standard">Why Mandating AI Education is Becoming Standard</h3>
<p>The push for mandatory AI training stems from several factors:</p>
<h4 id="heading-1-strategic-imperative">1. Strategic Imperative</h4>
<p>The PwC figures cited above show the gap: 72% of financial services CEOs lack a clear understanding of AI's ROI, and 60% of digital transformation failures in banking are attributed to leadership misalignment. Mandated programs help bridge this gap by providing strategic fluency in machine learning (ML), natural language processing (NLP), generative AI, and regulatory frameworks like the EU AI Act or GDPR.</p>
<h4 id="heading-2-risk-mitigation">2. Risk Mitigation</h4>
<p>With AI introducing new risks (for example, bias in models, data privacy breaches), boards and executives need education to oversee governance. For instance, the Financial Stability Board (FSB) warned in 2024 that inconsistent AI standards could pose systemic risks.</p>
<h4 id="heading-3-competitive-edge-and-talent-retention">3. Competitive Edge and Talent Retention</h4>
<p>Institutions that invest in executive education see faster AI adoption, better talent attraction, and reduced attrition. Training costs (for example, $5,000 per person annually) are often offset by savings from avoiding missteps, as the cost comparison later in this chapter shows.</p>
<h4 id="heading-4-regulatory-and-market-pressures">4. Regulatory and Market Pressures</h4>
<p>Bodies like the FDIC and OCC have released training resources (for example, FDIC videos on cybersecurity for bank directors), signaling expectations for AI literacy. Conferences like the 2024 FSOC AI & Financial Stability event and Opal Group's Compliance in the Age of AI 2025 emphasize executive involvement.</p>
<p>These programs typically cover AI fundamentals, use cases in finance (for example, predictive analytics), ethical considerations, and hands-on tools like ChatGPT or custom platforms. Formats range from in-house workshops and reverse mentorships to external certifications and business school courses.</p>
<h3 id="heading-institutions-and-executives-mandating-ai-education">Institutions and Executives Mandating AI Education</h3>
<p>While adoption varies by region and institution size (it appears stronger in the US and Asia), several top-tier players are leading with mandated or structured programs. Let’s look at some key examples drawn from developments as of July 2025:</p>
<ol>
<li><p><strong>Bank of America</strong>: The bank has adopted a top-down approach to AI education, mandating briefings for senior leadership on generative AI's potential and risks starting around 2023. This includes required sessions for executives to understand AI integration in retail, small business, and wealth management. Hari Gopalkrishnan, CIO and Head of Retail, Small Business, and Wealth Technology, leads this initiative, ensuring C-suite alignment to drive efficient operations and mitigate risks. This reflects a broader trend where banks prioritize internal AI tools for employee training, extending to executives.</p>
</li>
<li><p><strong>Morgan Stanley</strong>: As a pioneer in AI deployment, Morgan Stanley integrates mandatory AI training into tool rollouts for wealth management teams, including executives. Tools like the Morgan Stanley Assistant (launched September 2023, powered by OpenAI's GPT-4) and Morgan Stanley Debrief (June rollout) require user training embedded in the experience. Koren Picariello, Managing Director and Head of Wealth Management Generative AI, oversees this, emphasizing intuitive learning for financial advisors and support staff – though it extends to leadership for strategic oversight. This approach ensures executives are fluent in AI to support firm-wide adoption.</p>
</li>
<li><p><strong>Community Financial Institutions (CFIs) via Eltropy</strong>: Credit unions and community banks are mandating AI certification through Eltropy's program, launched after the EMERGE 2025 conference, where over 130 professionals earned the Eltropy AI Practitioner Certificate. This self-paced, on-demand certification is required for employees across functions, including executives, and covers foundational AI, agentic AI, compliant usage in regulated environments, and hands-on bot-building with technologies like LLMs and prompt engineering. While the program does not name specific executives, it's tailored for CFI leaders to build and deploy AI immediately, addressing this handbook's call for upskilling in smaller institutions.</p>
</li>
<li><p><strong>General Banking Boards (for example, via BankDirector Guidance)</strong>: Many US banks mandate director education and onboarding focused on AI skills for board members to oversee implementation effectively. This includes reboarding programs to enhance technology expertise, with boards establishing governance committees and designating AI overseers. For example, boards are encouraged to support capital for AI infrastructure while receiving regular updates, ensuring members are trained to guide ethical integration and competitive strategies.</p>
</li>
<li><p><strong>Hedge Funds and Larger Institutions</strong>: A 2024 AIMA report on hedge funds shows that nearly half of larger managers (for example, those managing significant AUM) mandate Gen AI training for teams, including executives, though overall adoption is at 10% industry-wide. Firms like Citadel, Bridgewater Associates, and Renaissance Technologies (highlighted in Senate investigations) are creating multidisciplinary AI teams, implying required upskilling for quants and leaders. Bridgewater's CEO, Nir Bar Dea, has publicly discussed AI's role in altering hedge fund landscapes, suggesting internal education mandates.</p>
</li>
<li><p><strong>Broader Trends Involving CEOs and Boards</strong>: Across sectors, boards and CEOs are forming joint AI vision task forces that mandate quarterly meetings and ethical scorecards, often including reverse mentorship programs where board members pair with AI specialists for hands-on learning. Business schools are incorporating AI case studies into board training, as noted in WSJ reports, to address a 20% tech expertise gap per PwC. Advisory firms like RSM US recommend CEOs and boards seek external education for AI vision-building, with 67% of organizations needing outside help.</p>
</li>
</ol>
<p>These examples illustrate a shift toward mandatory AI literacy at the highest levels, aligning with our emphasis on transforming executives into innovation champions. Institutions like Bank of America and Morgan Stanley exemplify how this combats hesitation, fostering a culture where AI drives measurable value.</p>
<h3 id="heading-training-technical-teams-in-fintech">Training Technical Teams in FinTech</h3>
<p>While AI literacy for leadership is essential, innovation doesn’t happen from the boardroom alone. It must be embedded across technical teams – engineers, analysts, data scientists, and product professionals – who build and maintain the infrastructure for change.</p>
<p>But here’s the critical point: you cannot innovate with an exhausted, overburdened, and undertrained workforce.</p>
<p>Many companies today are asking their software engineers to become AI engineers overnight. They're assigning responsibilities for data science, MLOps, predictive modeling, or chatbot design to backend developers who lack the training to handle data pipelines, model deployment, or even fundamental AI architecture. This isn't just inefficient – <strong>it's a recipe for failure</strong>.</p>
<h4 id="heading-why-upskilling-pays-off">Why Upskilling Pays Off</h4>
<p>Let’s look at this through the lens of hard numbers.</p>
<p>A company with a technical team of 100 software engineers, data scientists, or IT professionals will, on average, lose <strong>13 team members per year</strong>. For every engineer who leaves, the cost of replacement – including hiring, onboarding, training, lost productivity, and project disruption – averages $83,000. That means the company loses around <strong>$1.08 million per year</strong> due to attrition alone.</p>
<p>And this figure only reflects <em>direct</em> costs. It doesn’t include lost time on strategic initiatives, intellectual capital, or the hidden tax of slower innovation. These losses compound over time – especially when the market is rapidly adopting AI and you're left with gaps in capability.</p>
<p>Now compare that with the cost of strategic upskilling.</p>
<p>If you invest in targeted AI and data training at a rate of $5,000 per person per year, your total investment for 100 engineers is <strong>$500,000 per year</strong>. That’s less than half the cost of attrition.</p>
<p>But the ROI is even bigger when you account for what you <em>gain</em>:</p>
<ul>
<li><p>Stronger employee retention (engineers are more likely to stay when growing their skill set)</p>
</li>
<li><p>Faster delivery of AI-powered features, internal tools, and customer experiences</p>
</li>
<li><p>Reduced need to hire external consultants or chase niche AI talent in a hyper-competitive market</p>
</li>
<li><p>Avoiding expensive failures caused by technical debt or improperly built models</p>
</li>
</ul>
<p>When engineers are trained in areas like machine learning, LLM integration, NLP, MLOps, and data pipelines, they become innovation enablers rather than just code executors.</p>
<h4 id="heading-hidden-cost-of-overburdening-engineers">Hidden Cost of Overburdening Engineers</h4>
<p>What many executives don’t realize is that undertrained engineers – especially when asked to build high-risk AI systems – can expose the company to massive business risk. They may build flawed recommendation systems, opaque risk models, or chatbot interactions that spiral into compliance disasters.</p>
<p>Modern AI systems require more than good coding skills. They also require:</p>
<ul>
<li><p>Deep understanding of how to clean, structure, and prepare data</p>
</li>
<li><p>Familiarity with supervised vs. unsupervised learning</p>
</li>
<li><p>Knowledge of transformer models, fine-tuning, vector search, embeddings</p>
</li>
<li><p>Awareness of AI ethics, explainability, and regulatory frameworks</p>
</li>
</ul>
<p>These skills are not taught in traditional software engineering programs, nor are they something engineers can "pick up on the job" during sprints. Asking your developers to do everything – from backend infrastructure to building black-box models – is not only unfair, it’s strategically reckless.</p>
<h4 id="heading-upskilling-is-not-a-cost-its-a-hedge-against-brain-drain">Upskilling Is Not a Cost — It’s a Hedge Against Brain Drain</h4>
<p>Here’s the basic math again:</p>
<ul>
<li><p><strong>Cost of attrition per year (100 engineers, 13 lost):</strong> $1,079,000</p>
</li>
<li><p><strong>Cost of upskilling per year (100 engineers, $5K each):</strong> $500,000</p>
</li>
<li><p><strong>Difference in favor of upskilling (the savings if training prevents that attrition):</strong> $579,000 annually</p>
</li>
</ul>
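<p>The arithmetic above can be reproduced in a few lines. This is only a sketch using the figures quoted in this section. The variable names and the break-even calculation at the end are illustrative additions, not numbers from a cited study.</p>

```typescript
// Inputs taken from the figures quoted in this section (USD).
const teamSize = 100;
const engineersLostPerYear = 13;
const replacementCostPerEngineer = 83000;
const trainingCostPerEngineer = 5000;

// Yearly cost of replacing the engineers who leave.
const attritionCost = engineersLostPerYear * replacementCostPerEngineer;

// Yearly cost of training the whole team.
const upskillingCost = teamSize * trainingCostPerEngineer;

// Gap between the two budgets.
const difference = attritionCost - upskillingCost;

// Illustrative extra (an assumption, not from the text): how many departures
// training would need to prevent for the program to pay for itself.
const breakEvenDepartures = Math.ceil(upskillingCost / replacementCostPerEngineer);

console.log({ attritionCost, upskillingCost, difference, breakEvenDepartures });
// { attritionCost: 1079000, upskillingCost: 500000, difference: 579000, breakEvenDepartures: 7 }
```

<p>Under these assumptions, the training budget covers itself on replacement costs alone once it prevents roughly 7 of the 13 yearly departures.</p>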
<p>And this is before counting the additional business value from faster launches, higher employee morale, and innovation that drives new revenue streams.</p>
<p>Investing in upskilling not only saves you money – it future-proofs your talent pipeline and makes your team more self-sufficient. Engineers who stay and grow are more likely to build products that push your business forward.</p>
<h4 id="heading-motivation-through-growth">Motivation Through Growth</h4>
<p>One of the most overlooked retention strategies in tech is personal and professional development. Talented engineers <strong>want to work at companies where they grow</strong>. When organizations ignore this, they create frustration, stagnation, and ultimately attrition.</p>
<p>On the other hand, those who invest in upskilling create a sense of purpose and momentum. Upskilled engineers are more confident, more collaborative, and more likely to take initiative in applying AI to business problems.</p>
<p>Training isn't a perk – it's a competitive edge.</p>
<h3 id="heading-training-non-technical-professionals-empowering-the-95-with-ai-fluency">Training Non-Technical Professionals: Empowering the 95% with AI Fluency</h3>
<p>In the conversation around AI transformation, technical talent gets much of the attention – and rightly so. But the reality is this: <strong>95% of the workforce in most organizations is not technical</strong>. And yet, 95% of employees are now asking for training in generative AI, according to a 2024 global workplace survey by edX and The Harris Poll.</p>
<p>This signals a shift in awareness: non-technical professionals understand that generative AI isn’t just a tool for developers – it’s a work enhancer, a productivity multiplier, and a competitive necessity.</p>
<h4 id="heading-from-fear-to-fluency-why-non-tech-training-matters">From Fear to Fluency: Why Non-Tech Training Matters</h4>
<p>The fear narrative around AI – that it will take away jobs – is real and palpable in many organizations. But the more strategic view is this:</p>
<blockquote>
<p><strong>Don’t fire your workforce. Train them.</strong></p>
</blockquote>
<p>Rather than replacing administrative staff, compliance officers, relationship managers, operations teams, and analysts, leading financial organizations are upskilling their existing talent to work <em>with</em> AI, not <em>against</em> it.</p>
<p>Training non-technical team members in generative AI offers two major business advantages:</p>
<ol>
<li><p><strong>Productivity gains</strong>: Teams can automate repetitive, low-value tasks and focus more on decision-making and strategy.</p>
</li>
<li><p><strong>Talent retention</strong>: Employees feel more secure and valued when their employers invest in their future.</p>
</li>
</ol>
<h4 id="heading-use-cases-where-non-tech-teams-in-finance-can-gain-from-ai-training">Use Cases: Where Non-Tech Teams in Finance Can Gain from AI Training</h4>
<p>Non-technical employees in banking, asset management, insurance, and fintech can immediately apply generative AI tools across their workflows. Here’s how:</p>
<ol>
<li><strong>Compliance & Legal Teams</strong></li>
</ol>
<ul>
<li><p>Use ChatGPT or Claude to summarize regulatory documents, contracts, and internal audit reports.</p>
</li>
<li><p>Use Phoenix to draft standard policies and regulatory templates, saving hours of manual editing.</p>
</li>
<li><p>Extract key clauses from loan agreements or KYC policies.</p>
</li>
<li><p>Draft internal memos or SAR summaries 2–3x faster.</p>
</li>
</ul>
<ol start="2">
<li><strong>Finance, Accounting, and Operations</strong></li>
</ol>
<ul>
<li><p>Automate spreadsheet generation and financial modeling using Microsoft Copilot in Excel.</p>
</li>
<li><p>Reconcile data from multiple sources and generate summary reports.</p>
</li>
<li><p>Draft and revise standard Jira tickets or issue documentation using Phoenix, bridging business and IT communication.</p>
</li>
</ul>
<ol start="3">
<li><strong>Sales, Relationship Management, and Customer Service</strong></li>
</ol>
<ul>
<li><p>Use generative chat tools to personalize client interactions.</p>
</li>
<li><p>Draft follow-up emails, presentations, and pitch summaries.</p>
</li>
<li><p>Summarize meeting transcripts and extract actionable items.</p>
</li>
</ul>
<ol start="4">
<li><strong>Marketing and Communications</strong></li>
</ol>
<ul>
<li><p>Use AI to generate segmented content for different client audiences.</p>
</li>
<li><p>Produce A/B tested campaign text, product updates, and social posts.</p>
</li>
<li><p>Translate campaigns quickly for global markets.</p>
</li>
</ul>
<ol start="5">
<li><strong>Risk & Audit</strong></li>
</ol>
<ul>
<li><p>Summarize findings from large datasets or transaction logs.</p>
</li>
<li><p>Generate first-draft risk assessments and credit memos.</p>
</li>
<li><p>Highlight inconsistencies or anomalies with contextual explanation.</p>
</li>
</ul>
<h4 id="heading-the-cost-of-not-training-a-missed-opportunity">The Cost of Not Training: A Missed Opportunity</h4>
<p>Non-technical employees touch every part of your organization – operations, client relations, document handling, and decision support. If they are not AI-enabled, your business is flying with one wing.</p>
<p>Training these employees doesn't mean turning them into engineers. It means:</p>
<ul>
<li><p>Teaching them how to <strong>interact effectively with AI</strong></p>
</li>
<li><p>Helping them become <strong>critical evaluators</strong> of AI output</p>
</li>
<li><p>Guiding them to <strong>avoid over-reliance or misuse</strong> of AI tools</p>
</li>
</ul>
<p>This form of AI literacy is the new digital literacy – essential for everyone, not just technologists.</p>
<h2 id="heading-chapter-7-ai-for-executives-ai-education-amp-enablement-in-finance-workshops-tools-services-and-training-resources">Chapter 7: AI for Executives, AI Education & Enablement in Finance – Workshops, Tools, Services, and Training Resources</h2>
<p>The most innovative financial institutions no longer see AI training as a "nice-to-have." In an increasingly algorithmic economy, where generative AI tools are reshaping everything from compliance to capital allocation, AI education is an investment in strategic resilience.</p>
<p>This section offers a clear, credible breakdown of how to get your teams – executive and operational – up to speed through trusted workshops, tools, agencies, and courses. It emphasizes the value of enabling internal transformation instead of relying solely on outside hires.</p>
<h3 id="heading-ai-certifications-for-banking-professionals">AI Certifications for Banking Professionals</h3>
<p>Several industry and educational organizations offer certification programs specifically designed for finance professionals:</p>
<ol>
<li><p><strong>Generative AI In Finance and Banking Certification</strong>: This program teaches applications of generative AI models, including generative adversarial networks (GANs) and transformers for predicting market trends, automating financial tasks, and enhancing customer experiences. You can <a target="_blank" href="https://www.coursera.org/learn/gen-ai-gov-financial-reporting">learn more about the cert here</a>.</p>
</li>
<li><p><strong>Certificate in Digital & AI Evolution in Banking</strong>: This certification helps professionals understand the digital transformation in banking, including regulatory considerations and the risks and benefits of technology adoption. You can <a target="_blank" href="https://www.charteredbanker.com/qualification/certificate-in-digital-ai-evolution-in-banking.html">learn more about the cert here</a>.</p>
</li>
<li><p><strong>Machine Learning for Investment Professionals</strong>: Offered by the CFA Institute, this program focuses on machine learning applications specifically for investment management and analysis. You can learn more about the <a target="_blank" href="https://www.coursera.org/specializations/investment-management-python-machine-learning">Investment Management with Python and Machine Learning specialization here</a>, and the <a target="_blank" href="https://credentials.cfainstitute.org/beac8f10-6df8-43cc-8117-4b54ab119f9f#acc.53PylEDh">CFA Institute Machine Learning course here</a>.</p>
</li>
</ol>
<p>Columbia Business School's <a target="_blank" href="https://wallstreetprep.business.columbia.edu/ai-certification/">AI for Business & Finance Certificate Program</a> is particularly noteworthy, as it "has been designed for professionals in the business and finance world who need to learn AI but don't really have a technical background". This eight-week course covers AI fundamentals, Python programming for finance, predictive analytics, and generative AI business applications.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In an era where artificial intelligence is reshaping the financial landscape, executives and teams need to recognize that adapting to AI is not just a strategic advantage – it's a survival imperative. Just as we've successfully navigated previous technological revolutions, from the internet and cloud computing to blockchain and big data, AI presents an opportunity to democratize access to cutting-edge tools, empowering a broader range of professionals to innovate in ways that were once unimaginable.</p>
<p>This inclusivity has already sparked breakthroughs in predictive analytics, risk management, and personalized services, allowing even smaller institutions to compete on a global scale. That said, AI's integration into finance is far from novel. Leading institutions have deployed these technologies for years, embedding them into core operations like fraud detection and algorithmic trading.</p>
<p>Yet, for newcomers or those refreshing their approach, the relevance remains profound. Ongoing updates and advancements – such as enhanced natural language processing models and real-time data ingestion capabilities – continually amplify the potential for investment managers, AI specialists, and broader teams, unlocking efficiencies and insights that elevate professional capabilities to new heights.</p>
<p>To harness this potential and maintain a competitive edge, continuous upskilling is essential. Executives and teams alike should commit to updating their knowledge base through targeted education programs, workshops, and resources, ensuring they stay ahead of the curve.</p>
<p>Ultimately, AI can be a force for profound good. At LunarTech, we don't foresee it leading humanity to doom – instead, in a world facing complex challenges like economic volatility and climate risks, AI stands as a powerful ally, one that could very well guide us toward solutions and a brighter future. By embracing it thoughtfully, the financial sector can lead this transformation, fostering innovation that benefits all.</p>
<h3 id="heading-newsletters-to-follow-for-fintech">Newsletters to Follow for FinTech</h3>
<h4 id="heading-our-newsletter"><strong>Our Newsletter</strong></h4>
<p><strong>LUNARTECH Newsletter</strong> - <a target="_blank" href="https://lunartech.substack.com/">https://lunartech.substack.com/</a></p>
<h4 id="heading-us-personal-finance-amp-investment-newsletters">US Personal Finance & Investment Newsletters</h4>
<ul>
<li><p><a target="_blank" href="https://www.bloomberg.com/account/newsletters/money-stuff">Money Stuff (Matt Levine, Bloomberg)</a>: Witty, in-depth takes on Wall Street and finance.</p>
</li>
<li><p><a target="_blank" href="https://tker.co/">TKer (Sam Ro)</a>: Stock market insights and long-term investment themes.</p>
</li>
<li><p><a target="_blank" href="https://www.jillonmoney.com/newsletter">Jill on Money (Jill Schlesinger)</a>: Financial news and expert advice, weekly.</p>
</li>
<li><p><a target="_blank" href="https://behaviorgap.com/newsletter">Behavior Gap (Carl Richards)</a>: Simple sketches and insights on money and decision-making.</p>
</li>
<li><p><a target="_blank" href="https://marketbriefs.com/">The Minority Mindset / Market Briefs (Jaspreet Singh)</a>: Daily, concise financial news and wealth-building tips.</p>
</li>
<li><p><a target="_blank" href="https://www.execsum.co/">Exec Sum (Litquidity)</a>: Quick, reliable summaries of major finance news.</p>
</li>
</ul>
<h4 id="heading-baltic-amp-regional-newsletters">Baltic & Regional Newsletters</h4>
<ul>
<li><p><a target="_blank" href="https://www.fintechbaltic.com/">Fintech News Baltic</a>: News and trends in Baltic fintech, startups, and digital finance.</p>
</li>
<li><p><a target="_blank" href="https://www.linkedin.com/newsletters/fintech-digest-6889260213572755456/">Linas Beliūnas – FinTech Digest (LinkedIn)</a>: Personal insights on fintech, AI, and digital assets from a leading Lithuanian expert.</p>
</li>
<li><p><a target="_blank" href="https://changeventures.com/newsletter/">Change Ventures Weekly</a>: Baltic startup and VC news, funding rounds, and hiring.</p>
</li>
</ul>
<ul>
<li><a target="_blank" href="https://thecfoclub.com/subscribe/">CFO Club Newsletter</a>: Modern finance newsletter for tech sector CFOs and leaders, covering trends, tips, and innovation.</li>
</ul>
<h3 id="heading-lunartech-ai-for-executives"><strong>LunarTech AI for Executives</strong></h3>
<p>For leaders and frontline professionals who <em>feel the pressure to “get AI” but don’t speak code</em>, this 1- to 3-day program delivers exactly what you need: no fluff, no jargon. In clear language, we unpack how generative AI, large language models, and regulatory frameworks such as the EU AI Act are reshaping compliance, risk, and client service.</p>
<p>Next, we roll up our sleeves. You’ll practice with ChatGPT, Phoenix, Gemini, and other curated tools to summarize 200-page reports in minutes, flag hidden risks, and automate repetitive workflows. Expect live demos, breakout labs, and case studies drawn straight from banking, asset management, and insurance.</p>
<p>By the final session you’ll have a road-ready playbook for piloting AI safely – from data-governance checklists to ROI metrics your CFO will love. Graduates leave with a certificate, a toolkit of prompts, and the confidence to champion AI initiatives inside their own departments.</p>
<ul>
<li><p><strong>Format:</strong> Online or on-site, 1–3 days</p>
</li>
<li><p><strong>Cost:</strong> $997 per participant</p>
</li>
</ul>
<p>Apply Here: <a target="_blank" href="https://lunartech.ai/programs/ai-for-executives">https://lunartech.ai/programs/ai-for-executives</a></p>
<h3 id="heading-lunartech-academy">LunarTech Academy</h3>
<p>Our Academy is the always-on learning hub that keeps finance professionals current long after the headlines fade. Courses are modular and industry-specific, so a portfolio manager can master forecasting in Python while a relationship manager explores generative-AI productivity hacks – all under one roof.</p>
<p>Every track is written by practitioners who ship models in production, not theorists. Expect bite-size videos, step-by-step notebooks, and capstone projects pulled from real trading, risk, and compliance datasets. Learners can move at their own pace or join live cohorts for instructor feedback and peer discussion.</p>
<p>Managers love us for the built-in LMS integration, progress analytics, and team licensing that scales from five seats to five hundred. Whether you need to onboard new hires fast or reskill an entire division, the Academy delivers measurable, trackable outcomes.</p>
<ul>
<li><p><strong>Format:</strong> Self-paced or instructor-led; team licenses available</p>
</li>
<li><p><strong>Cost:</strong> $49.97 – $199.97 per month</p>
</li>
</ul>
<p>Apply Here: <a target="_blank" href="https://academy.lunartech.ai/">https://academy.lunartech.ai/</a></p>
<h3 id="heading-other-resources">Other Resources</h3>
<ul>
<li><p>Lens | LUNARTECH - <a target="_blank" href="https://lens.lunartech.ai/">https://lens.lunartech.ai/</a></p>
</li>
<li><p>YouTube | LUNARTECH - <a target="_blank" href="https://www.youtube.com/@lunartech_ai">https://www.youtube.com/@lunartech_ai</a></p>
</li>
<li><p>LinkedIn | LUNARTECH - <a target="_blank" href="https://www.linkedin.com/company/lunartechai/">https://www.linkedin.com/company/lunartechai/</a></p>
</li>
<li><p>Substack | LUNARTECH - <a target="_blank" href="https://lunartech.substack.com/">https://lunartech.substack.com/</a></p>
</li>
</ul>
 
</article>
<article>
<h1> How to Build a Custom MCP Server with TypeScript – A Handbook for Developers </h1>
<p>Sumit Saha — Wed, 25 Jun 2025 16:31:35 +0000</p>
 <p>MCP (Model Context Protocol) lets you connect your code, data, and tools to AI applications like Claude and Cursor. This handbook explains how it works with real-world analogies, and shows you how to build a custom MCP server using TypeScript that feeds live data into an AI environment.</p>
<h3 id="heading-heres-what-well-cover">Here’s what we’ll cover:</h3>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-the-model-context-protocol-mcp">What is the Model Context Protocol (MCP)?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-does-protocol-mean">What does "Protocol" mean?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-model">What is a "Model"?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-context">What is "Context"?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-putting-it-all-together-what-is-model-context-protocol">Putting It All Together: What is Model Context Protocol?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-why-mcp-is-necessary">Why MCP is Necessary</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-mcp-connector-in-action">The MCP Connector in Action</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-universal-access-across-platforms">Universal Access Across Platforms</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-developers-are-key">Developers are Key</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-beyond-built-in-integrations">Beyond Built-in Integrations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-power-of-reusability">The Power of Reusability</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-burden-without-mcp">The Burden without MCP</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-practical-github-example">A Practical GitHub Example</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-mcp-matters-for-developers">Why MCP Matters for Developers</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-rag-vs-mcp">RAG vs MCP</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-rag">What is RAG</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-rag-the-mise-en-place-prep">RAG: The “Mise en Place” Prep</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-mcp-the-rolling-assistant-cart">MCP: The Rolling Assistant Cart</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bringing-it-all-together">Bringing It All Together</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-mcp-documentation">MCP Documentation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-ai-apps-talk-to-mcp-servers-a-practical-example">How AI Apps Talk to MCP Servers — A Practical Example</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-scenario-asking-claude-about-your-schedule">Scenario: Asking Claude About Your Schedule</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-discovering-the-right-mcp-server">Discovering the Right MCP Server</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-mcp-server-fetches-and-returns-the-data">MCP Server Fetches and Returns the Data</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-model-converts-structured-data-into-natural-language">Model Converts Structured Data into Natural Language</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-under-the-hood-abstracting-the-complexity">Under the Hood: Abstracting the Complexity</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-mirroring-standard-web-app-workflows">Mirroring Standard Web App Workflows</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-mcp-servers-work-internally">How MCP Servers Work Internally</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-mcp-architecture-how-it-all-fits-together">The MCP Architecture — How It All Fits Together</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-1-mcp-host">1. MCP Host</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-2-mcp-client">2. MCP Client</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-3-mcp-server">3. MCP Server</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-4-data-sources-local-or-remote">4. Data Sources – Local or Remote</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-opportunities-for-web-developers">Opportunities for Web Developers</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-sdk-options-pick-your-language">SDK Options: Pick Your Language</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-from-backend-service-to-ai-enabled-developer">From Backend Service to AI-Enabled Developer</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-mcp-server-setup-and-integration">MCP Server Setup and Integration</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-summary">Summary</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along and get the most out of this guide, you should have:</p>
<ol>
<li><p><strong>Basic understanding of TypeScript or JavaScript:</strong> While we’ll use TypeScript here, knowledge of JavaScript alone is enough to follow the examples.</p>
</li>
<li><p><strong>Familiarity with Node.js and npm:</strong> You should know how to initialize a project, install packages, and run scripts using node and npm.</p>
</li>
<li><p><strong>Experience with working in the terminal/command line:</strong> Especially for understanding concepts like stdin and stdout, and running local servers.</p>
</li>
<li><p><strong>Comfort with environment variables (.env files):</strong> You’ll be setting API keys and other sensitive data in a .env file.</p>
</li>
<li><p><strong>Basic knowledge of REST APIs and HTTP concepts:</strong> This helps you understand how AI tools fetched context before MCP and why MCP simplifies the process.</p>
</li>
<li><p><strong>Familiarity with Google Cloud / API Console (optional but recommended):</strong> Since this handbook involves integrating with Google Calendar, you should know how to:</p>
<ul>
<li><p>Generate a public Google API key</p>
</li>
<li><p>Find or create a Google Calendar and access its ID</p>
</li>
</ul>
</li>
<li><p><strong>Cursor editor installed (optional but recommended):</strong> To follow the final integration steps with the AI-powered code editor.</p>
</li>
<li><p><strong>Some exposure to AI tools like Claude, Cursor, or ChatGPT:</strong> This helps you grasp how MCP bridges external data with AI context.</p>
</li>
</ol>
<p>I’ve also created a video to go along with this handbook, in case you’re the type who likes to learn from video as well as text.</p>
<h2 id="heading-what-is-the-model-context-protocol-mcp">What is the Model Context Protocol (MCP)?</h2>
<p>Let's start from the very beginning: what exactly is MCP? MCP stands for <strong>Model Context Protocol</strong>. And if we break it down word by word – "model", "context", and "protocol" – it actually becomes quite easy to understand.</p>
<p>But before diving in, here's a quick background: Model Context Protocol was developed by a company called <a target="_blank" href="https://www.anthropic.com"><strong>Anthropic</strong></a>. You've probably heard of them. They're the ones who built <a target="_blank" href="https://claude.ai"><strong>Claude</strong></a>, the popular AI assistant. They first introduced MCP in November of 2024, and in a short time it’s become a standard adopted by tons of other companies as well, including Microsoft.</p>
<p>Now, let's explore what MCP really means by understanding each term.</p>
<h3 id="heading-what-does-protocol-mean">What does "Protocol" mean?</h3>
<p>Let's start with the last word: Protocol. What does "Protocol" mean? Well, it’s a set of rules.</p>
<p>As developers, we work with protocols all the time. For example, when we work with the <strong>HTTP protocol</strong>, it's not just random communication – there's a set of rules we follow. When we build REST APIs, we use specific methods like <code>GET</code>, <code>POST</code>, <code>PUT</code>, <code>PATCH</code>, or <code>DELETE</code>. We transfer data in specific formats like JSON or XML, and sometimes follow message conventions such as JSON-RPC on top of them. All of this is structured communication that follows a protocol.</p>
<p>In a similar way, AI agents or AI-based applications also need to follow a structured approach when exchanging information. We'll explore that more in a bit, but for now, just remember: a protocol is simply a set of rules.</p>
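<p>To make "a set of rules" concrete, here is a minimal sketch of checking whether a message follows the JSON-RPC 2.0 request shape, which is the message convention MCP builds on. The <code>isJsonRpcRequest</code> helper and the sample messages are illustrative assumptions, not part of any SDK.</p>

```typescript
// Shape of a JSON-RPC 2.0 request (a call that expects a reply).
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: unknown;
}

// Hypothetical helper: returns true only if the value follows the request rules.
function isJsonRpcRequest(value: unknown): value is JsonRpcRequest {
  if (typeof value !== "object" || value === null) return false;
  const msg = value as Record<string, unknown>;
  return (
    msg.jsonrpc === "2.0" &&
    (typeof msg.id === "number" || typeof msg.id === "string") &&
    typeof msg.method === "string"
  );
}

// A message that follows the rules, and one that does not.
const wellFormed: unknown = JSON.parse('{"jsonrpc":"2.0","id":1,"method":"tools/list"}');
const malformed: unknown = JSON.parse('{"id":1,"action":"list tools"}');

console.log(isJsonRpcRequest(wellFormed)); // true
console.log(isJsonRpcRequest(malformed)); // false
```

<p>Both sides can rely on the same check, which is the whole point of a protocol: a receiver never has to guess what an incoming message means.</p>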
<h3 id="heading-what-is-a-model">What is a "Model"?</h3>
<p>Next, let's talk about the “Model”. The term "Model" is something you’re likely already quite familiar with. We all use models in one way or another, especially large language models or LLMs.</p>
<p>Take GPT from OpenAI, Gemini from Google, Claude from Anthropic – you may use these every day. There are tons of models available now, like the newer DeepSeek and so on. The point is, we already interact with models regularly. We ask questions, and they give us answers.</p>
<p>But have you ever wondered how these LLMs actually work? Most people think that when you ask a model something, it goes and searches the internet for answers. But that's not how it works.</p>
<p>What these models actually do is <strong>predict the next word</strong> in a sentence – that's it. (Strictly speaking, they predict the next <em>token</em>, which can be a whole word or a piece of one.) They're <strong>language experts</strong> – on their own, they don't know "facts" in real time or pull live data from the web. Instead, they've been trained on a huge amount of information beforehand (pre-trained). Then when you ask something, they try to figure out: "What word most likely comes next based on what the user just said?"</p>
<p>That's why when you ask something, the reply appears word by word – like it's typing. And no, that's not some fancy frontend animation. It looks like typing because the reply really is being generated in real time, one word at a time.</p>
<p>That's the core of <strong>Generative AI</strong>. They're experts in natural language – understanding how we speak, predicting what we're likely to say next and generating responses accordingly.</p>
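<p>The idea of "predict the next word, then repeat" can be illustrated with a toy example. The sketch below counts which word follows which in a tiny sentence and always picks the most frequent follower. Real LLMs use neural networks trained on vastly more data, so treat this purely as an analogy. All function names here are made up for the illustration.</p>

```typescript
// Toy "training": count which word follows which in a tiny corpus.
function train(corpus: string): Map<string, Map<string, number>> {
  const counts = new Map<string, Map<string, number>>();
  const words = corpus.toLowerCase().split(/\s+/).filter(Boolean);
  for (let i = 0; i < words.length - 1; i++) {
    const followers = counts.get(words[i]) ?? new Map<string, number>();
    followers.set(words[i + 1], (followers.get(words[i + 1]) ?? 0) + 1);
    counts.set(words[i], followers);
  }
  return counts;
}

// Pick the follower seen most often (ties go to the one seen first).
function predictNext(model: Map<string, Map<string, number>>, word: string): string | undefined {
  const followers = model.get(word.toLowerCase());
  if (!followers) return undefined;
  let best: string | undefined;
  let bestCount = 0;
  followers.forEach((count, candidate) => {
    if (count > bestCount) {
      best = candidate;
      bestCount = count;
    }
  });
  return best;
}

// Generate text one word at a time, feeding each prediction back in.
function generate(model: Map<string, Map<string, number>>, start: string, maxWords: number): string {
  const output = [start];
  let current = start;
  for (let i = 0; i < maxWords; i++) {
    const next = predictNext(model, current);
    if (next === undefined) break;
    output.push(next);
    current = next;
  }
  return output.join(" ");
}

const model = train("the cat sat on the mat and the cat slept");
console.log(generate(model, "the", 3)); // the cat sat on
```

<p>Notice that the toy model can only continue with words it saw during "training". It has no way to look anything up, which is the limitation the rest of this handbook is about.</p>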
<h3 id="heading-what-is-context">What is "Context"?</h3>
<p>Now let's move to “Context”. In English, context means the subject or background of something. For example, when you send an email, you add a subject line. And just by looking at the subject, the recipient gets an idea of what the email is about – even before opening it.</p>
<p>Similarly, when you talk to a model like ChatGPT or Claude, you provide a few lines – maybe a question or some background. That input becomes the “Context”.</p>
<p>The model's response depends entirely on the context you provide. It uses that context to start predicting the next word. If the context tells the model what you're referring to, it's far more likely to give you an accurate answer. But if you don't provide enough context, it can't help you properly – even if it's a powerful LLM.</p>
<p>Let me give you a simple example: Suppose you go to Claude and ask, <em>"Who am I?"</em> Will it be able to answer that? No, it won't. But if in a previous message you had told Claude, <em>"Hey, I'm Sumit"</em> and then later in the same session ask <em>"Who am I?"</em>, it will say, <em>"You're Sumit."</em> Why? Because now it has context.</p>
<p>So, context is just background info – and the better context you give, the better the model can respond. That's how these LLMs are designed to work.</p>
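<p>The "Who am I?" example can be sketched in code. In a chat app, the context is essentially the list of messages sent along with each request. The <code>answerWhoAmI</code> function below is a hypothetical stand-in for a model, written only to show that the answer depends on what is in that list.</p>

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Toy stand-in for a model: it can only "know" what is in the messages it receives.
function answerWhoAmI(context: Message[]): string {
  for (const message of context) {
    if (message.role !== "user") continue;
    const match = message.content.match(/I'm (\w+)/);
    if (match) return `You're ${match[1]}.`;
  }
  return "I don't know who you are yet.";
}

// A brand-new session: no background has been provided.
const freshSession: Message[] = [{ role: "user", content: "Who am I?" }];

// The same question, asked after an introduction earlier in the session.
const sameSession: Message[] = [
  { role: "user", content: "Hey, I'm Sumit" },
  { role: "assistant", content: "Nice to meet you, Sumit!" },
  { role: "user", content: "Who am I?" },
];

console.log(answerWhoAmI(freshSession)); // I don't know who you are yet.
console.log(answerWhoAmI(sameSession)); // You're Sumit.
```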
<h3 id="heading-putting-it-all-together-what-is-model-context-protocol">Putting It All Together: What is Model Context Protocol?</h3>
<p>So when we say “Model Context Protocol”, we're talking about a <strong>set of rules or protocols</strong> that define how to feed <strong>context</strong> into a <strong>model</strong>. Now, what is this context we're feeding? It could be any kind of external information – something outside the model's default knowledge.</p>
<p>It’s like going to Claude Desktop and telling it, <em>“Hey, I’m Sumit!”</em> and then asking "<em>Who am I?</em>". Again, it’ll know because you told it before.</p>
<p>But here's the catch: models don't magically know about your calendar, your emails, your databases or your files. So how do you make that data available to them? That's where MCP comes in.</p>
<p>MCP lets us feed these external pieces of information – like your schedule, your project data, or anything else – into a model, but in a structured and standardized way. And that's what makes MCP so powerful.</p>
<h2 id="heading-why-mcp-is-necessary">Why MCP is Necessary</h2>
<p>Now that you understand what MCP is, let's talk about why we need it. Why did Anthropic even invent this thing in the first place?</p>
<p>Let's think about how we use different code editors in our day-to-day work. One really powerful, AI-equipped modern code editor is <a target="_blank" href="https://www.cursor.com">Cursor</a>. Personally, I don't use it regularly, but it’s perfect for this demonstration. Imagine you are inside your Cursor editor. And, as many of you know, you can chat with Cursor while coding. You can ask it to explain something, generate code, refactor logic, and so on.</p>
<h3 id="heading-the-mcp-connector-in-action">The MCP Connector in Action</h3>
<p>Now let's say you ask Cursor something that depends on data from your local machine – maybe a large email database or your own personal documents. Can Cursor access that data by default? No, it can't. But what if – and this is the important part – what if you connect a custom-made component to Cursor?</p>
<p>Let's call it an <strong>MCP server</strong>. If you connect your MCP server to Cursor, then here's what happens: Cursor still can't access your files directly. But now, when you ask it a question, it will turn to this MCP server and say: "<em>Hey, do you know anything about this?</em>" And the MCP server – since you've built it to connect with your files or databases – will fetch the relevant information, turn it into context, and feed that back to the model. Now the model has the necessary background to generate a smart, informed reply.</p>
<p>And the best part? You're not limited to just one connector. You can connect multiple MCP servers to your application.</p>
<h3 id="heading-universal-access-across-platforms">Universal Access Across Platforms</h3>
<p>Let's now walk through a real example – something I'll actually show you later with code examples in this handbook.</p>
<p>Say you ask Cursor: "<em>Do I have any meetings today?</em>" Now to answer that, the AI would need access to your schedule, right? Let's say you use Google Calendar to manage your meetings. Can Cursor directly connect to your Google Calendar? No, it can't. And not just Cursor – ChatGPT or Claude can't access your calendar either, not unless you manually build that integration.</p>
<p>But here's the thing: what if you want this to work universally? Like, no matter where you ask the question from? You might ask it from Cursor today, but someone else might ask from ChatGPT tomorrow.</p>
<p>In both cases, we want these tools to access your calendar and return the same result. To make that possible, we need a universal way to connect – and that's exactly what an MCP server enables. If you create an MCP server that follows the protocol and hooks into your calendar, then any AI application that supports MCP can connect to it and get the right context. That's another reason MCP is so powerful.</p>
<h3 id="heading-developers-are-key">Developers are Key</h3>
<p>And the best part? You (the developer) are the one who will build these MCP servers. This isn't something regular users can build – you need coding skills for this.</p>
<p>This is one reason AI won’t replace developers just yet :)</p>
<h3 id="heading-beyond-built-in-integrations">Beyond Built-in Integrations</h3>
<p>Let's compare that with what used to happen before. For example, today, ChatGPT lets you do web searches – you can just ask it to find something online, and it'll fetch the result. But this feature is only there because <a target="_blank" href="https://openai.com">OpenAI</a>, the makers of ChatGPT, built it into the app.</p>
<p>Now imagine your own product – like my logicBase Labs website. Let's say students come to the site and ask questions through a chat box you’ve built. That AI assistant belongs to you – it's part of your software. You can connect it to any model, like GPT, Claude, whatever, that understands natural language. But you still need to feed it the right information so it can respond meaningfully.</p>
<p>So what do you do? You build your own MCP server, maybe using Node.js, Python, or Java – whatever tech stack you're comfortable with. This MCP server is a completely standalone app. Now you also build your chat interface – the UI where students type questions. You connect it to the LLM (like GPT or Claude) and to your custom-built MCP server.</p>
<h3 id="heading-the-power-of-reusability">The Power of Reusability</h3>
<p>Here's the best part: your MCP server is now independent and reusable. You could even give it to another company. Say another EdTech company wants to use your calendar or data-handling logic. They can just modify your MCP server, replace the logic with their own data, and use it with their chat client. And boom – it's a universal solution now.</p>
<p>Even better, let's say the data inside my logicBase Labs website changes in the future. No problem! I won’t need to rewrite the connector logic. The code that fetches and formats the data stays the same. The content might change, but the structure is stable.</p>
<h3 id="heading-the-burden-without-mcp">The Burden without MCP</h3>
<p>But if I wasn’t using MCP, what would I have to do? I’d need to build everything into my client. Every AI assistant would need to carry the burden of logic, context building, and data retrieval. If anything changed – say the GitHub repo’s structure, the schedule format, or the database schema – I’d have to go and update every single client individually. That's a nightmare!</p>
<h3 id="heading-a-practical-github-example">A Practical GitHub Example</h3>
<p>Let me give you another solid example. Suppose you want to connect your GitHub to Cursor. You want to say something like: "<em>Hey, push my code to GitHub</em>" – and it just works. To make that happen without MCP, what would you normally need to do? You'd have to:</p>
<ul>
<li><p>Read through the <a target="_blank" href="https://docs.github.com/en/rest">GitHub API documentation</a></p>
</li>
<li><p>Write integration logic</p>
</li>
<li><p>Handle OAuth authentication</p>
</li>
<li><p>Deal with access tokens and API limits</p>
</li>
</ul>
<p>It's complex. It's messy. But imagine this: What if GitHub themselves released their own MCP server? Then all you need to do is:</p>
<ul>
<li><p>Plug that MCP server into Cursor</p>
</li>
<li><p>Let the model discover the capabilities</p>
</li>
<li><p>Say: "<em>Push my code</em>"</p>
</li>
</ul>
<p>And boom – it works! You don't need to write any custom integration logic. That's the magic of MCP. And here's the best part: GitHub has already released its <a target="_blank" href="https://github.com/github/github-mcp-server">official MCP server</a>. You can use it right now.</p>
<h3 id="heading-why-mcp-matters-for-developers">Why MCP Matters for Developers</h3>
<p>So I hope you now see the bigger picture. MCP servers are a game-changer. They don't just reduce your workload – they create new job opportunities for developers like us. This isn't going to "replace your job". Rather, it's creating new, valuable work that didn't exist before.</p>
<h2 id="heading-rag-vs-mcp">RAG vs MCP</h2>
<p>Now that we’ve covered MCP, let’s look at another popular approach called <a target="_blank" href="https://en.wikipedia.org/wiki/Retrieval-augmented_generation">RAG</a> and see how they differ. Many AI builders start by using RAG to ground their models in static knowledge, so it’s helpful to see how that approach compares to streaming live data with MCP.</p>
<h3 id="heading-what-is-rag">What is RAG?</h3>
<p>First up, what is RAG? <strong>Retrieval-Augmented Generation</strong> is a technique in which an AI model reaches out to an external “library” of documents at the moment you ask a question. It pulls back just the pages it needs, tucks them into your prompt and then writes its answer using those exact excerpts. In other words, it dynamically augments itself with relevant text from a large corpus.</p>
<h3 id="heading-rag-the-mise-en-place-prep">RAG: The “Mise en Place” Prep</h3>
<p>Imagine you’re the head chef preparing for service. Before the doors open, you and your team do a full <strong>mise en place</strong>: chop, measure, and arrange every ingredient on your counter so it’s ready the moment you need it. When orders start flying in, you simply grab what’s already laid out – no running back to the pantry.</p>
<p>How it works:</p>
<ol>
<li><p>Retrieve: Your system searches a document store for the most relevant “ingredients” (text snippets).</p>
</li>
<li><p>Augment: Those snippets get mixed into your AI prompt.</p>
</li>
<li><p>Generate: The model cooks up an answer grounded in that batch of information.</p>
</li>
</ol>
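<p>The three steps above can be sketched in a few lines of TypeScript. Treat this as a toy illustration under loud assumptions: the document store is an in-memory array, "retrieval" is naive keyword overlap rather than the vector search that real systems typically use, and <code>retrieve</code> and <code>buildPrompt</code> are hypothetical helper names. The generate step is left to whichever model you call.</p>

```typescript
// Toy RAG sketch: retrieve -> augment. All names here are illustrative.
type Doc = { id: string; text: string };

// Hypothetical in-memory "document store".
const store: Doc[] = [
    { id: "refunds", text: "Refunds are processed within 7 days of the request." },
    { id: "shipping", text: "Shipping takes 3 to 5 business days." },
    { id: "support", text: "Support is available by email on weekdays." },
];

// Split text into lowercase words for a naive keyword match.
function tokenize(text: string): string[] {
    return text.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
}

// Retrieve: rank documents by how many of their words appear in the query.
function retrieve(query: string, docs: Doc[], topK: number): Doc[] {
    const queryWords = new Set(tokenize(query));
    return docs
        .map((doc) => ({
            doc,
            score: tokenize(doc.text).filter((w) => queryWords.has(w)).length,
        }))
        .filter((entry) => entry.score > 0)
        .sort((a, b) => b.score - a.score)
        .slice(0, topK)
        .map((entry) => entry.doc);
}

// Augment: mix the retrieved snippets into the prompt.
function buildPrompt(query: string, snippets: Doc[]): string {
    const context = snippets.map((s) => `- ${s.text}`).join("\n");
    return `Answer using only this context:\n${context}\n\nQuestion: ${query}`;
}

const question = "How long do refunds take?";
const finalPrompt = buildPrompt(question, retrieve(question, store, 1));
console.log(finalPrompt); // Generate: this prompt would then be sent to the model
```

<p>Only the refunds snippet shares a word with the question, so it is the one that ends up in the prompt. Swapping the keyword match for an embedding search changes <code>retrieve</code> but leaves the other two steps as they are.</p>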
<p>RAG is great for static or rarely changing content (think policy manuals, research papers, or any “recipe book” that doesn’t get rewritten mid-service).</p>
<h3 id="heading-mcp-the-rolling-assistant-cart">MCP: The Rolling Assistant Cart</h3>
<p>Now imagine halfway through dinner you realize you need a fresh herb or a special garnish that wasn’t prepped. Instead of halting the kitchen, an assistant wheels over a cart loaded with whatever new items have come in and hands you that garnish the second it’s ready.</p>
<p>How it works:</p>
<ol>
<li><p>Subscribe/Stream: Your AI client opens a live line to the data source.</p>
</li>
<li><p>Deliver: As soon as new data (like a live order update or sensor reading) is available, it rolls up to you.</p>
</li>
<li><p>Consume: Your model can tap into that fresh data anytime during generation.</p>
</li>
</ol>
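<p>MCP is great for scenarios needing up-to-the-minute info (like live dashboards, chatbots feeding off recent user activity, IoT sensor streams, and so on). In practice, the model most often gets this fresh data by calling a tool on an MCP server at the moment it needs it, as you'll see in the calendar example later in this handbook.</p>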
<h3 id="heading-bringing-it-all-together">Bringing It All Together</h3>
<ul>
<li><p>RAG alone: Best when your "mise en place" is extensive enough to cover everything you need – pre-prepared background knowledge.</p>
</li>
<li><p>MCP alone: Required when you need “a rolling cart” of fresh ingredients at your fingertips.</p>
</li>
<li><p>Combined approach: Do your background “mise en place” with RAG for in-depth context, and keep the assistant cart rolling with MCP to provide live updates – so your AI has deep background knowledge along with real-time freshness.</p>
</li>
</ul>
<h2 id="heading-mcp-documentation">MCP Documentation</h2>
<p>Now let's check out the <a target="_blank" href="https://modelcontextprotocol.io/introduction">official MCP documentation</a>. It’ll help things start to feel much clearer. So what does the definition say?</p>
<blockquote>
<p>MCP is an open protocol that standardizes how applications provide context to LLMs. (<a target="_blank" href="https://modelcontextprotocol.io/introduction">Source: MCP Documentation</a>)</p>
</blockquote>
<p>Yep – exactly what we've already talked about. And then comes a brilliant line from the docs:</p>
<blockquote>
<p>Think of MCP like a USB-C port for AI applications. (<a target="_blank" href="https://modelcontextprotocol.io/introduction">Source: MCP Documentation</a>)</p>
</blockquote>
<p>Let's pause here, because this analogy is super important. Think about the USB-C port on modern devices. We all use it. But remember how things were before? Back in the day, your computer would have tons of different ports – HDMI, VGA, USB-A, audio jack, you name it. You'd have to manage different cables for everything. Maybe your mouse was USB-A, your keyboard used some other port, and your external monitor needed HDMI. It was a mess.</p>
<p>But now? Everything uses USB-C. One universal connector for data, power, audio, video – everything. That's exactly what MCP is for AI applications. Instead of building separate integrations or connectors for each AI tool (Cursor, ChatGPT, Claude, and so on), you now build one standardized MCP server and any AI tool that supports MCP can connect to it. That's why this protocol is such a big deal.</p>
<h2 id="heading-how-ai-apps-talk-to-mcp-servers-a-practical-example">How AI Apps Talk to MCP Servers — A Practical Example</h2>
<p>Let me walk you through one more example just to help you really get this. Imagine you're using Claude. You ask it a simple question: "<em>Do I have a meeting today?</em>"</p>
<h3 id="heading-scenario-asking-claude-about-your-schedule">Scenario: Asking Claude About Your Schedule</h3>
<p>Now, Claude doesn't actually have that information. If you haven't connected any MCP server, it'll give you a vague answer. Probably something nice and generic, because it's good at natural language – but not specific. But if you want a real answer – something factual – you need to feed it context. And that's where the MCP server steps in.</p>
<h3 id="heading-discovering-the-right-mcp-server">Discovering the Right MCP Server</h3>
<p>Let's say Claude is connected to an MCP server. Now things get interesting. As soon as you ask the question, Claude will first look at the list of MCP servers it's connected to. Then it'll intelligently choose the right one and ask:</p>
<p><em>"Hey, what are your capabilities?"</em></p>
<p>Because your MCP server might be able to do many things. Maybe it can:</p>
<ul>
<li><p>Give a full list of calendar events</p>
</li>
<li><p>Check if there's a meeting on a specific day</p>
</li>
<li><p>Fetch data from Google Calendar</p>
</li>
<li><p>Summarize documents</p>
</li>
</ul>
<p>So the model first figures out:</p>
<p><em>"Which of these capabilities do I need?"</em></p>
<p>In this case, it decides:</p>
<p><em>"Okay, I just need to know if the user has a meeting today."</em></p>
<p>Then Claude sends a message to the MCP server – in a specific format, which we'll look at shortly. It's kind of like how REST APIs work. The message says something like:</p>
<p><em>"Here's the date. Tell me if the user has a meeting."</em></p>
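<p>To make the discovery step a little more concrete: in MCP, a server answers a <code>tools/list</code> request with an array of tools, each carrying a <code>name</code>, an optional <code>description</code>, and an <code>inputSchema</code>. The sketch below assumes two hypothetical servers (<code>calendar-server</code> and <code>docs-server</code>) have already answered, and shows one way a host could remember which server owns which tool. The <code>buildRoutingTable</code> helper and the sample tools are assumptions for illustration, not SDK code.</p>

```typescript
// Shape of one tool entry as returned by an MCP "tools/list" request.
type ToolInfo = {
    name: string;
    description?: string;
    inputSchema: { type: "object"; properties?: Record<string, unknown>; required?: string[] };
};

// Hypothetical tool lists, as if two connected servers had already answered "tools/list".
const toolsByServer: Record<string, ToolInfo[]> = {
    "calendar-server": [
        {
            name: "getMyCalendarDataByDate",
            description: "Check whether the user has meetings on a given date",
            inputSchema: { type: "object", properties: { date: { type: "string" } }, required: ["date"] },
        },
    ],
    "docs-server": [
        {
            name: "summarizeDocument",
            description: "Summarize a document by path",
            inputSchema: { type: "object", properties: { path: { type: "string" } }, required: ["path"] },
        },
    ],
};

// Build a tool-name -> server-name lookup so a chosen tool can be routed.
function buildRoutingTable(all: Record<string, ToolInfo[]>): Map<string, string> {
    const routes = new Map<string, string>();
    for (const [serverName, tools] of Object.entries(all)) {
        for (const tool of tools) {
            routes.set(tool.name, serverName);
        }
    }
    return routes;
}

// The host shows the model every tool description; the model answers with a tool name.
const routes = buildRoutingTable(toolsByServer);
const chosenTool = "getMyCalendarDataByDate"; // stand-in for the model's choice
console.log(`${chosenTool} -> ${routes.get(chosenTool)}`);
```

<p>The descriptions matter more than they look: they are the text the model reads when deciding which capability fits the question.</p>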
<h3 id="heading-mcp-server-fetches-and-returns-the-data">MCP Server Fetches and Returns the Data</h3>
<p>Now the MCP server takes that input, connects to Google Calendar (or whichever source you've set it up with) and runs the necessary logic. Eventually, it sends back a response in a structured format: MCP messages are JSON, following the <strong>JSON-RPC</strong> 2.0 convention. It might return a list of meetings or just one – whatever applies.</p>
<h3 id="heading-model-converts-structured-data-into-natural-language">Model Converts Structured Data into Natural Language</h3>
<p>Now here's the beauty of it. Even though the MCP server is giving back something technical (like JSON), Claude will never show that to the user. Because it's a <strong>language model</strong>, it will convert that structured data into a smooth, natural sentence like:</p>
<p><em>"Yes, you have a meeting with Dr. Chuck at 4 PM."</em></p>
<h3 id="heading-under-the-hood-abstracting-the-complexity">Under the Hood: Abstracting the Complexity</h3>
<p>To the user, it feels like magic. But behind the scenes, a lot just happened:</p>
<ul>
<li><p>The model found the right MCP server</p>
</li>
<li><p>It selected the right capability</p>
</li>
<li><p>It passed the correct input</p>
</li>
<li><p>The server ran logic, got the data, and returned a structured result</p>
</li>
<li><p>And finally, the model turned that into human language</p>
</li>
</ul>
<h3 id="heading-mirroring-standard-web-app-workflows">Mirroring Standard Web App Workflows</h3>
<p>This is exactly how our websites work too. Let's say a user visits your website and types something in a message box. You fetch data in the backend, maybe call an API or run a DB query. That response comes back in JSON — but the user never sees that. What they see is the final polished UI response. Same principle here. So I hope it's now clear how powerful this system is.</p>
<h2 id="heading-how-mcp-servers-work-internally">How MCP Servers Work Internally</h2>
<p>Now let's go one step deeper and understand how an MCP server actually works under the hood – technically. An MCP server primarily works through something called <strong>standard input and output</strong>, or in programming terms, <code>stdin</code> and <code>stdout</code>. So what does that mean? Let's break it down with an example.</p>
<p>You know when you open a terminal in the Cursor editor, it gives you a basic shell where you can type in commands? That terminal is using your machine's standard input and output system.</p>
<p>Now typically, when websites communicate with APIs, they use REST APIs over HTTP. But with MCP servers – especially when they're used locally – we don't use HTTP. Here's why: many times, your MCP server is running on your own machine, connected to local databases or files. So instead of going through network calls, it uses direct system-level communication through <code>stdin</code> and <code>stdout</code>.</p>
<p>Let's say you're inside Cursor, and you type something like:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"hello"</span>
</code></pre>
<p>What happens? The shell reads your command from its standard input (<code>stdin</code>), runs it, and <code>echo</code> writes <code>hello</code> back to you via <code>stdout</code>. This same pattern is used by MCP servers.</p>
<p>Now imagine the AI application (like Claude) is trying to talk to your MCP server. How does it do that? It doesn't send an HTTP request like a web client. Instead, it writes the request directly into the MCP server's <strong>standard input</strong> – just like how a terminal command works. And then your MCP server reads that input, performs the necessary action (maybe it talks to Google Calendar, a database, a filesystem, whatever) and once it's done, it sends the response back using <strong>standard output</strong>.</p>
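<p>Here is a small sketch of what "writing into standard input" involves. In MCP's stdio transport, each message is a single line of JSON terminated by a newline, so the receiving side has to buffer incoming text until a full line has arrived. The <code>MessageBuffer</code> class and <code>frame</code> function below are hypothetical names used to illustrate that idea. The SDK's own stdio transport handles this for you, so you would not normally write it yourself.</p>

```typescript
// Accumulates raw text chunks (as they would arrive on stdin) and
// returns every complete newline-delimited JSON message seen so far.
class MessageBuffer {
    private pending = "";

    push(chunk: string): unknown[] {
        this.pending += chunk;
        const lines = this.pending.split("\n");
        // The last piece is either "" or an incomplete message: keep it for later.
        this.pending = lines.pop() ?? "";
        return lines
            .map((line) => line.trim())
            .filter((line) => line.length > 0)
            .map((line) => JSON.parse(line));
    }
}

// Serialize a message the way it would be written to stdout: one JSON object per line.
function frame(message: object): string {
    return JSON.stringify(message) + "\n";
}

// Simulate a request arriving in two chunks, split mid-message.
const buffer = new MessageBuffer();
const wire = frame({ jsonrpc: "2.0", id: 1, method: "ping" });
const first = buffer.push(wire.slice(0, 10)); // nothing complete yet
const second = buffer.push(wire.slice(10)); // now one full message
console.log(first.length, second.length); // prints: 0 1
```

<p>Because the framing is just lines of text, the same server can be driven by any host that can spawn a process and read and write its streams.</p>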
<p>Let's imagine a real-life conversation between Claude and your MCP server. You ask Claude:</p>
<p><em>"Do I have a meeting today?"</em></p>
<p>Claude realizes it doesn't have this information on its own. So what does it do? First, it discovers the tools or methods available – it checks all connected MCP servers to see what they can do. Then it intelligently figures out the best method to use. Let's say it figures out:</p>
<p><em>"Alright, I should call the</em> <code>calendar</code> <em>method."</em></p>
<p>It then writes a structured input into your MCP server's stdin, something like:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"method"</span>: <span class="hljs-string">"calendar"</span>,
    <span class="hljs-attr">"params"</span>: {
        <span class="hljs-attr">"date"</span>: <span class="hljs-string">"2025-06-16"</span>
    }
}
</code></pre>
<p>Okay, the real format may differ, but conceptually it's like this. Your MCP server then receives that input, runs the logic, maybe pulls data from your Google Calendar, and then responds like this:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"result"</span>: {
        <span class="hljs-attr">"meetings"</span>: [
            {
                <span class="hljs-attr">"title"</span>: <span class="hljs-string">"Team Sync"</span>,
                <span class="hljs-attr">"time"</span>: <span class="hljs-string">"4:00 PM"</span>
            }
        ]
    }
}
</code></pre>
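<p>For reference, here is roughly how the simplified messages above map onto the real wire format. MCP uses JSON-RPC 2.0: a tool is invoked with the <code>tools/call</code> method, the tool's name and arguments go inside <code>params</code>, and the result comes back as a list of content items. The helpers <code>makeToolCall</code> and <code>readTextResult</code> are hypothetical names for this sketch, and the response is sample data rather than real server output.</p>

```typescript
// JSON-RPC 2.0 request for calling an MCP tool.
type ToolCallRequest = {
    jsonrpc: "2.0";
    id: number;
    method: "tools/call";
    params: { name: string; arguments: Record<string, unknown> };
};

// JSON-RPC 2.0 success response carrying the tool's output as content items.
type ToolCallResponse = {
    jsonrpc: "2.0";
    id: number;
    result: { content: { type: "text"; text: string }[]; isError?: boolean };
};

// Hypothetical helper: build the request the client writes to the server.
function makeToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
    return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical helper: pull the text payload out of the response.
function readTextResult(response: ToolCallResponse): string {
    return response.result.content.map((item) => item.text).join("\n");
}

const request = makeToolCall(1, "getMyCalendarDataByDate", { date: "2025-06-16" });

// What a server might send back for that request (sample data, not real output).
const response: ToolCallResponse = {
    jsonrpc: "2.0",
    id: 1,
    result: {
        content: [
            { type: "text", text: JSON.stringify({ meetings: [{ title: "Team Sync", time: "4:00 PM" }] }) },
        ],
    },
};

const meetings = JSON.parse(readTextResult(response)).meetings;
console.log(JSON.stringify(request));
console.log(`${meetings.length} meeting(s): ${meetings[0].title} at ${meetings[0].time}`);
```

<p>The <code>id</code> field is what lets the client match a response to the request that caused it, which matters once several calls are in flight over the same pair of streams.</p>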
<p>Now here's the kicker: Claude doesn't show this JSON to the user.</p>
<p>It reads that raw data, runs its natural language model, and finally says:</p>
<p><em>"Yes, you have a meeting 'Team Sync' today at 4 PM."</em></p>
<p>That's the entire lifecycle. And the user? They don't even know what's going on behind the scenes. Just like when a non-technical person uses your website, they don't know about fetch calls or JSON responses. They just see a smooth UI. Same deal here.</p>
<p>And this <code>stdin</code>/<code>stdout</code> approach works great locally – especially for data on your machine. Later, we'll see how things work differently when you connect to remote services. But for now, just remember:</p>
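<p>MCP doesn't use HTTP calls for local communication. The host launches the server as a local process and talks to it through that process's <code>stdin</code> and <code>stdout</code>.</p>
<p>Since no network port is opened, this keeps local communication fast, limits exposure to your own machine, and works with any language that can read and write text streams.</p>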
<h2 id="heading-the-mcp-architecture-how-it-all-fits-together">The MCP Architecture — How It All Fits Together</h2>
<p>Let's now take a look at the MCP architecture. Once you see the structure, everything we've discussed will make even more sense. Here's what the diagram shows (the diagram was collected from the <a target="_blank" href="https://modelcontextprotocol.io/introduction">MCP documentation</a>):</p>
<p>We have a <strong>host</strong> – that could be Claude, or any AI-powered application. This host is connected to one or more <strong>MCP servers</strong> through the <strong>MCP protocol</strong>. And these MCP servers are in turn connected to <strong>external data sources</strong>, which could be local files or remote services like APIs, calendars, databases, and so on.</p>
<p>Now what does the MCP server do? It retrieves data from those external sources, prepares the appropriate context, and feeds it back to the host using the MCP protocol. That context is then used by the LLM to generate a relevant, natural-sounding response.</p>
<p>All this communication – at least when done locally – happens via standard input and output (<code>stdin</code> and <code>stdout</code>) like we discussed earlier.</p>
<p>Let's go over the components one by one.</p>
<h3 id="heading-1-mcp-host">1. MCP Host</h3>
<p>First, we have the MCP host. This is the AI application, something like Claude, Cursor, or even your own AI interface. If we compare this to traditional web architecture, the host is like the server of your website – the main brain that runs the show. In the context of Cursor, the Cursor editor itself is the MCP host.</p>
<h3 id="heading-2-mcp-client">2. MCP Client</h3>
<p>Next, we have the MCP client. So what's the client in this context? Well, in web development, think of a user's browser as the client – not the user themselves, but the actual browser that sends requests and receives responses. In the MCP world, the MCP client is the internal part of the host that connects to MCP servers.</p>
<p>Let's take Cursor again as an example.</p>
<p>If you go into Cursor's settings, you'll see something called <strong>MCP Tools</strong>. That's where you can add any custom MCP server. Cursor has a built-in client that lets you plug in your own server. If you were building your own editor like Cursor, you'd need to write this client logic yourself to handle things like discovering servers, formatting requests, and reading responses. Good news is, there's <a target="_blank" href="https://modelcontextprotocol.io/quickstart/client">already a spec and libraries</a> to help with that too.</p>
<h3 id="heading-3-mcp-server">3. MCP Server</h3>
<p>Then, of course, comes the MCP server, which we've already talked about at length. It's the tool you build that knows how to fetch or generate context from files, APIs, calendars, anything. You can make it with Node, Python, Java – anything you like. As long as it follows the protocol, it'll work. And remember – it can be reused across different AI apps. That's the beauty of MCP.</p>
<h3 id="heading-4-data-sources-local-or-remote">4. Data Sources – Local or Remote</h3>
<p>Last but not least, we have the data sources. Your MCP server needs to pull data from somewhere. That "somewhere" could be:</p>
<ul>
<li><p>A local SQLite or Postgres DB</p>
</li>
<li><p>Your file system</p>
</li>
<li><p>An external API like Google Calendar or GitHub</p>
</li>
<li><p>A third-party SaaS dashboard</p>
</li>
<li><p>Anything else that holds relevant context</p>
</li>
</ul>
<p>The point is: you abstract away the data handling into your MCP server. So the AI host doesn't care how the data is fetched – it just gets structured context in return.</p>
<p>So to recap:</p>
<ul>
<li><p>The <strong>MCP host</strong> is your AI application (like Claude, Cursor, or a custom app).</p>
</li>
<li><p>The <strong>MCP client</strong> is the bridge inside that host that connects to external MCP servers.</p>
</li>
<li><p>The <strong>MCP server</strong> is what you, the developer, build – to deliver context.</p>
</li>
<li><p>And the <strong>data sources</strong> are whatever backend services or files hold your knowledge.</p>
</li>
</ul>
<p>Everything talks to each other via the MCP protocol. And locally, it all happens through <code>stdin</code>/<code>stdout</code>, like a conversation between programs in the terminal. So that's the whole summary in one go. I hope you understand how it all works.</p>
<h2 id="heading-opportunities-for-web-developers">Opportunities for Web Developers</h2>
<h3 id="heading-sdk-options-pick-your-language">SDK Options: Pick Your Language</h3>
<p>Alright, soon we will start building an MCP server – and for that, we'll be using the TypeScript SDK. Now, if you look at the documentation, you'll notice that there are many SDKs available. You can build MCP servers using:</p>
<ul>
<li><p>C#</p>
</li>
<li><p>Java</p>
</li>
<li><p>Kotlin</p>
</li>
<li><p>Python</p>
</li>
<li><p>Ruby</p>
</li>
<li><p>Swift (for mobile)</p>
</li>
<li><p>and of course, TypeScript – which is essentially JavaScript with superpowers!</p>
</li>
</ul>
<p>And since JavaScript is like my mother tongue, I'll naturally go with TypeScript here. Now, don't worry – this won't be super technical. I'm not going to sit and code line-by-line with you, but I will walk you through the important parts so you get a clear understanding.</p>
<h3 id="heading-from-backend-service-to-ai-enabled-developer">From Backend Service to AI-Enabled Developer</h3>
<p>What you'll find is that everything we'll be doing here is stuff you likely already know. Because this is still just regular coding. You're going to build a backend service – just like one you may have done hundreds of times before. The only difference is that now, your application will be part of the <strong>MCP ecosystem</strong>.</p>
<p>Think of it like this: As a developer, you're not switching careers. You're not abandoning your current skills. You're still doing what you've always done – writing logic, structuring data, managing APIs. The only shift is in <strong>where</strong> you're plugging that code in. Instead of just serving HTTP requests or returning React components, now your code will be used to feed context into LLMs. And that, right there, is the bridge into the AI world.</p>
<p>Let's be real: in today's world, just building yet another CRUD application isn't enough. If your app isn't deeply integrated into the AI ecosystem, it's going to get left behind. But if you understand concepts like MCP, and if you know how to build and expose structured context to any model, then you're not just a developer anymore. You're an AI-enabled developer!</p>
<p>You're building the infrastructure that connects real-world data to AI applications. And that's huge! That's why I truly believe this whole MCP ecosystem is going to explode in the coming months and years. I believe companies all over the world are going to start building and publishing their own MCP servers – just like how everyone now builds APIs or SDKs. Soon, we'll reach a point where people won't visit your company website to fill out a form or read static FAQs. They'll just ask a question inside ChatGPT, Claude, or Cursor, like:</p>
<p><em>"What are the pricing plans for logicBase Labs?"</em></p>
<p>And they'll get a response – not because those models are trained on your website, but because you've built an MCP server that gives them real-time, personalized, authenticated data.</p>
<p>So yes, now let's go ahead and build our first MCP server – quickly and in a way that's easy to follow. Because ultimately, this is where developers like you belong: bringing together the best of your existing skills and applying them inside the AI universe.</p>
<h2 id="heading-mcp-server-setup-and-integration">MCP Server Setup and Integration</h2>
<p>So, to build an MCP server, we've landed on the official GitHub repo page for <a target="_blank" href="https://github.com/modelcontextprotocol/typescript-sdk">MCP's TypeScript SDK</a>. Now, for those of you who don't know TypeScript, there's nothing to worry about, because TypeScript is a superset of JavaScript. You can write your code in plain JavaScript, and since valid JavaScript is also syntactically valid TypeScript, you're good to go.</p>
<p>And if you're a regular JavaScript developer, you'll find everything here familiar – just like you'd expect from any typical docs. They've provided a small, simple template for a TypeScript server. It's a single, minimal server setup, and that's exactly the template I would use to build my own server. Let's walk through the setup.</p>
<p>My project is a Node.js project. I've created a <code>server.js</code> file, and honestly, that's the only file I've used in this project. All the code is written inside that one file.</p>
<p>Step-by-step, here's what I did:</p>
<h4 id="heading-1-initialize-the-project">1. Initialize the project</h4>
<pre><code class="lang-bash">npm init
</code></pre>
<p>This creates the <code>package.json</code> file.</p>
<h4 id="heading-2-install-the-required-mcp-package">2. Install the required MCP package</h4>
<p>Run the install command (mentioned in the <a target="_blank" href="https://github.com/modelcontextprotocol/typescript-sdk">docs</a>).</p>
<pre><code class="lang-bash">npm install @modelcontextprotocol/sdk
</code></pre>
<h4 id="heading-3-import-and-create-the-mcp-server">3. Import and create the MCP server</h4>
<p>I imported <code>McpServer</code> from the installed package and then created a new instance using <code>new McpServer()</code>. You need to pass an object with a name and version:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { McpServer } <span class="hljs-keyword">from</span> <span class="hljs-string">"@modelcontextprotocol/sdk/server/mcp.js"</span>;

<span class="hljs-comment">// create the MCP server</span>
<span class="hljs-keyword">const</span> server = <span class="hljs-keyword">new</span> McpServer({
    name: <span class="hljs-string">"Sumit's Calendar"</span>,
    version: <span class="hljs-string">"1.0.0"</span>,
});
</code></pre>
<h4 id="heading-4-add-a-tool-function">4. Add a tool (function)</h4>
<p>Tools are the functions your AI client can invoke. I used the <code>server.tool()</code> method from the SDK and passed three things:</p>
<ul>
<li><p>A meaningful name: <code>getMyCalendarDataByDate</code> so that my AI application can understand which tool to call</p>
</li>
<li><p>Input validation using <code>zod</code></p>
</li>
<li><p>An async callback function that fetches meeting data</p>
</li>
</ul>
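<pre><code class="lang-typescript"><span class="hljs-comment">// zod provides the schema used for input validation</span>
<span class="hljs-keyword">import</span> { z } <span class="hljs-keyword">from</span> <span class="hljs-string">"zod"</span>;

<span class="hljs-comment">// register the tool to MCP</span>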
server.tool(
    <span class="hljs-string">"getMyCalendarDataByDate"</span>,
    {
        date: z.string().refine(<span class="hljs-function">(<span class="hljs-params">val</span>) =></span> !<span class="hljs-built_in">isNaN</span>(<span class="hljs-built_in">Date</span>.parse(val)), {
            message: <span class="hljs-string">"Invalid date format. Please provide a valid date string."</span>,
        }),
    },
    <span class="hljs-keyword">async</span> ({ date }) => {
        <span class="hljs-keyword">return</span> {
            content: [
                {
                    <span class="hljs-keyword">type</span>: <span class="hljs-string">"text"</span>,
                    text: <span class="hljs-built_in">JSON</span>.stringify(<span class="hljs-keyword">await</span> getMyCalendarDataByDate(date)),
                },
            ],
        };
    }
);
</code></pre>
<p>The callback receives the validated date and uses it to call an async controller function called <code>getMyCalendarDataByDate</code> that fetches data from Google Calendar. Now we will write the function.</p>
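<p>The validation rule passed to <code>refine()</code> above can be tried on its own, without zod. The sketch below is only an illustration: <code>isParsableDate</code> is a hypothetical helper name and the sample inputs are made up, but the check itself is the same <code>!isNaN(Date.parse(val))</code> test used in the tool definition.</p>

```typescript
// Same check the zod `.refine()` callback performs: a string is accepted
// only when Date.parse can turn it into a timestamp.
function isParsableDate(value: string): boolean {
    return !Number.isNaN(Date.parse(value));
}

// Hypothetical sample inputs, not taken from the tutorial.
const acceptedInput = isParsableDate("2025-07-03"); // ISO date: true
const rejectedInput = isParsableDate("not a date"); // unparsable: false
```

<p>Because <code>Date.parse</code> falls back to implementation-specific parsing for non-ISO strings, asking the AI client for dates in <code>YYYY-MM-DD</code> form keeps the results predictable.</p>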
<h4 id="heading-5-google-calendar-integration">5. Google Calendar Integration</h4>
<p>First we need to install the <code>googleapis</code> package with the below command in the terminal:</p>
<pre><code class="lang-bash">npm install googleapis
</code></pre>
<p>Then import the <code>google</code> object from the installed package.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { google } <span class="hljs-keyword">from</span> <span class="hljs-string">"googleapis"</span>;
</code></pre>
<p>Now let’s write the function <code>getMyCalendarDataByDate</code> and call the <code>google.calendar()</code> method according to the <a target="_blank" href="https://developers.google.com/workspace/calendar/api/quickstart/nodejs">Google Calendar API</a>. This method receives an options object in which we need to set <code>version</code> and <code>auth</code>. <code>version</code> is simply the Calendar API version number and <code>auth</code> is the Google API Public Key used for authentication.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getMyCalendarDataByDate</span>(<span class="hljs-params">date</span>) </span>{
    <span class="hljs-keyword">const</span> calendar = google.calendar({
        version: <span class="hljs-string">"v3"</span>,
        auth: process.env.GOOGLE_PUBLIC_API_KEY,
    });
}
</code></pre>
<p>Here, you can see that I’ve used the Google API public key as an environment variable. So, we’ll create a <code>.env</code> file in the root of the project directory and add the following inside that file:</p>
<pre><code class="lang-plaintext">GOOGLE_PUBLIC_API_KEY=WRITE_YOUR_GOOGLE_PUBLIC_API_KEY
</code></pre>
<p>Don’t forget to replace the placeholder with your own Google Public API Key. You can grab your public key from <a target="_blank" href="https://cloud.google.com/cloud-console">Google Cloud Console</a>.</p>
<p>Now we need to calculate the <code>start</code> and <code>end</code> of the given date (UTC) received as <code>string</code> in the <code>date</code> parameter of the <code>getMyCalendarDataByDate</code> function.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Calculate the start and end of the given date (UTC)</span>
<span class="hljs-keyword">const</span> start = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(date);
start.setUTCHours(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>);
<span class="hljs-keyword">const</span> end = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(start);
end.setUTCDate(end.getUTCDate() + <span class="hljs-number">1</span>);
</code></pre>
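<p>To check the day-window arithmetic in isolation, the same steps can be wrapped in a small function. This is a sketch under assumptions: <code>utcDayWindow</code> and <code>DayWindow</code> are hypothetical names, and the return shape simply mirrors the <code>timeMin</code>/<code>timeMax</code> options used later.</p>

```typescript
// Hypothetical return shape, named after the Calendar API options it feeds.
interface DayWindow {
    timeMin: string;
    timeMax: string;
}

// Returns the UTC midnight that starts the given day and the UTC midnight
// that starts the following day, both as ISO strings.
function utcDayWindow(date: string): DayWindow {
    const start = new Date(date);
    start.setUTCHours(0, 0, 0, 0);
    const end = new Date(start);
    // setUTCDate rolls over month and year boundaries automatically.
    end.setUTCDate(end.getUTCDate() + 1);
    return { timeMin: start.toISOString(), timeMax: end.toISOString() };
}

const lastDayOfJuly = utcDayWindow("2025-07-31");
// → { timeMin: "2025-07-31T00:00:00.000Z", timeMax: "2025-08-01T00:00:00.000Z" }
```

<p>Note that JavaScript parses date-only strings such as <code>2025-07-31</code> as UTC, while date-time strings without an offset are parsed as local time, so a date-only input gives the most predictable window.</p>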
<p>Now it’s time to fetch the list of events from my Google Public Calendar. For that, according to Google Calendar API, we need to call the <code>calendar.events.list</code> function and pass necessary options to it:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> calendar.events.list({
    calendarId: process.env.CALENDAR_ID,
    timeMin: start.toISOString(),
    timeMax: end.toISOString(),
    maxResults: <span class="hljs-number">10</span>,
    singleEvents: <span class="hljs-literal">true</span>,
    orderBy: <span class="hljs-string">"startTime"</span>,
});
</code></pre>
<p>Here you can see, I have mentioned my Public Calendar ID using another environment variable called <code>CALENDAR_ID</code>. So go back to your .env file and set the new environment variable:</p>
<pre><code class="lang-plaintext">CALENDAR_ID=YOUR_OWN_PUBLIC_CALENDAR_ID
</code></pre>
<p>Just a quick note – your <code>CALENDAR_ID</code> will be simply your Google Email address, for example <code>someone@gmail.com</code>. Also don’t forget to make your calendar public, otherwise this example and API setup will not work.</p>
<p>To make your Google Calendar public, you need to adjust the calendar's sharing settings in Google Calendar on a computer. Navigate to the calendar you want to share, then find the "Access permissions for events" section and check the box labeled "Make available to public". You can then choose the level of access you want to grant others.</p>
<p>Here's a step-by-step guide:</p>
<ul>
<li><p>Go to <a target="_blank" href="https://calendar.google.com/">Google Calendar</a> on your computer.</p>
</li>
<li><p>Find the calendar you want to share under the "My calendars" section on the left side of the screen.</p>
</li>
<li><p>Click on the three dots (More) next to the calendar name and select "Settings and sharing".</p>
</li>
<li><p>Under "Access permissions for events," check the box next to "Make available to public".</p>
</li>
</ul>
<p>And for the <code>timeMin</code> and <code>timeMax</code> options I have used the <code>start</code> and <code>end</code> date time we just calculated above.</p>
<p>Now we will get the <code>events</code> array from <code>res.data.items</code> and then map through those events to get the final <code>meetings</code> array. We also need to handle blank array for no events.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> events = res.data.items || [];
<span class="hljs-keyword">const</span> meetings = events.map(<span class="hljs-function">(<span class="hljs-params">event</span>) =></span> {
    <span class="hljs-keyword">const</span> start = event.start.dateTime || event.start.date;
    <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${event.summary}</span> at <span class="hljs-subst">${start}</span>`</span>;
});

<span class="hljs-keyword">if</span> (meetings.length > <span class="hljs-number">0</span>) {
    <span class="hljs-keyword">return</span> {
        meetings,
    };
} <span class="hljs-keyword">else</span> {
    <span class="hljs-keyword">return</span> {
        meetings: [],
    };
}
</code></pre>
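<p>The <code>event.start.dateTime || event.start.date</code> fallback matters because timed events carry a <code>dateTime</code> while all-day events only carry a <code>date</code>. The sketch below runs the same mapping over mock data. <code>CalendarEventLike</code>, <code>formatMeetings</code>, and the sample events are hypothetical and only cover the fields used here.</p>

```typescript
// Hypothetical minimal shape; a real Calendar API event has many more fields.
interface CalendarEventLike {
    summary?: string;
    start?: { dateTime?: string; date?: string };
}

// Same mapping as in the tutorial, applied to plain objects.
function formatMeetings(events: CalendarEventLike[]): string[] {
    return events.map((event) => {
        // Timed events expose `dateTime`; all-day events only expose `date`.
        const when = event.start?.dateTime || event.start?.date;
        return `${event.summary} at ${when}`;
    });
}

// Made-up sample data for illustration.
const sampleEvents: CalendarEventLike[] = [
    { summary: "Standup", start: { dateTime: "2025-07-03T09:00:00Z" } },
    { summary: "Company holiday", start: { date: "2025-07-03" } },
];
const sampleMeetings = formatMeetings(sampleEvents);
// → ["Standup at 2025-07-03T09:00:00Z", "Company holiday at 2025-07-03"]
```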
<p>Let’s do some error handling. We will simply push our above event fetching logic inside a <code>try/catch</code> block and handle error inside the <code>catch</code> block. So below is our updated code:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> calendar.events.list({
        calendarId: process.env.CALENDAR_ID,
        timeMin: start.toISOString(),
        timeMax: end.toISOString(),
        maxResults: <span class="hljs-number">10</span>,
        singleEvents: <span class="hljs-literal">true</span>,
        orderBy: <span class="hljs-string">"startTime"</span>,
    });

    <span class="hljs-keyword">const</span> events = res.data.items || [];
    <span class="hljs-keyword">const</span> meetings = events.map(<span class="hljs-function">(<span class="hljs-params">event</span>) =></span> {
        <span class="hljs-keyword">const</span> start = event.start.dateTime || event.start.date;
        <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${event.summary}</span> at <span class="hljs-subst">${start}</span>`</span>;
    });

    <span class="hljs-keyword">if</span> (meetings.length > <span class="hljs-number">0</span>) {
        <span class="hljs-keyword">return</span> {
            meetings,
        };
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-keyword">return</span> {
            meetings: [],
        };
    }
} <span class="hljs-keyword">catch</span> (err) {
    <span class="hljs-keyword">return</span> {
        error: err.message,
    };
}
</code></pre>
<p>To run the server locally using <code>stdin</code>/<code>stdout</code>, I created a <code>StdioServerTransport</code> instance from the MCP SDK and passed it to the server's <code>connect()</code> method. This part looks like:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> transport = <span class="hljs-keyword">new</span> StdioServerTransport();
<span class="hljs-keyword">await</span> server.connect(transport);
</code></pre>
<p>Then I wrapped everything inside an async <code>init()</code> function to avoid top-level <code>await</code>, and called the <code>init</code> function.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">init</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">const</span> transport = <span class="hljs-keyword">new</span> StdioServerTransport();
    <span class="hljs-keyword">await</span> server.connect(transport);
}

init();
</code></pre>
<h4 id="heading-6-final-source-code">6. Final Source Code</h4>
<p>So below is the complete code for my <code>server.js</code> file:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { McpServer } <span class="hljs-keyword">from</span> <span class="hljs-string">"@modelcontextprotocol/sdk/server/mcp.js"</span>;
<span class="hljs-keyword">import</span> { StdioServerTransport } <span class="hljs-keyword">from</span> <span class="hljs-string">"@modelcontextprotocol/sdk/server/stdio.js"</span>;
<span class="hljs-keyword">import</span> dotenv <span class="hljs-keyword">from</span> <span class="hljs-string">"dotenv"</span>;
<span class="hljs-keyword">import</span> { google } <span class="hljs-keyword">from</span> <span class="hljs-string">"googleapis"</span>;
<span class="hljs-keyword">import</span> { z } <span class="hljs-keyword">from</span> <span class="hljs-string">"zod"</span>;

dotenv.config();

<span class="hljs-comment">// create the MCP server</span>
<span class="hljs-keyword">const</span> server = <span class="hljs-keyword">new</span> McpServer({
    name: <span class="hljs-string">"Sumit's Calendar"</span>,
    version: <span class="hljs-string">"1.0.0"</span>,
});

<span class="hljs-comment">// tool function</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getMyCalendarDataByDate</span>(<span class="hljs-params">date</span>) </span>{
    <span class="hljs-keyword">const</span> calendar = google.calendar({
        version: <span class="hljs-string">"v3"</span>,
        auth: process.env.GOOGLE_PUBLIC_API_KEY,
    });

    <span class="hljs-comment">// Calculate the start and end of the given date (UTC)</span>
    <span class="hljs-keyword">const</span> start = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(date);
    start.setUTCHours(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>);
    <span class="hljs-keyword">const</span> end = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(start);
    end.setUTCDate(end.getUTCDate() + <span class="hljs-number">1</span>);

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> calendar.events.list({
            calendarId: process.env.CALENDAR_ID,
            timeMin: start.toISOString(),
            timeMax: end.toISOString(),
            maxResults: <span class="hljs-number">10</span>,
            singleEvents: <span class="hljs-literal">true</span>,
            orderBy: <span class="hljs-string">"startTime"</span>,
        });

        <span class="hljs-keyword">const</span> events = res.data.items || [];
        <span class="hljs-keyword">const</span> meetings = events.map(<span class="hljs-function">(<span class="hljs-params">event</span>) =></span> {
            <span class="hljs-keyword">const</span> start = event.start.dateTime || event.start.date;
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${event.summary}</span> at <span class="hljs-subst">${start}</span>`</span>;
        });

        <span class="hljs-keyword">if</span> (meetings.length > <span class="hljs-number">0</span>) {
            <span class="hljs-keyword">return</span> {
                meetings,
            };
        } <span class="hljs-keyword">else</span> {
            <span class="hljs-keyword">return</span> {
                meetings: [],
            };
        }
    } <span class="hljs-keyword">catch</span> (err) {
        <span class="hljs-keyword">return</span> {
            error: err.message,
        };
    }
}

<span class="hljs-comment">// register the tool to MCP</span>
server.tool(
    <span class="hljs-string">"getMyCalendarDataByDate"</span>,
    {
        date: z.string().refine(<span class="hljs-function">(<span class="hljs-params">val</span>) =></span> !<span class="hljs-built_in">isNaN</span>(<span class="hljs-built_in">Date</span>.parse(val)), {
            message: <span class="hljs-string">"Invalid date format. Please provide a valid date string."</span>,
        }),
    },
    <span class="hljs-keyword">async</span> ({ date }) => {
        <span class="hljs-keyword">return</span> {
            content: [
                {
                    <span class="hljs-keyword">type</span>: <span class="hljs-string">"text"</span>,
                    text: <span class="hljs-built_in">JSON</span>.stringify(<span class="hljs-keyword">await</span> getMyCalendarDataByDate(date)),
                },
            ],
        };
    }
);

<span class="hljs-comment">// set transport</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">init</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">const</span> transport = <span class="hljs-keyword">new</span> StdioServerTransport();
    <span class="hljs-keyword">await</span> server.connect(transport);
}

<span class="hljs-comment">// call the initialization</span>
init();
</code></pre>
<p>Then install the necessary <code>dotenv</code>, <code>googleapis</code>, and <code>zod</code> packages with the below command:</p>
<pre><code class="lang-bash">npm install dotenv googleapis zod
</code></pre>
<p>Now you can start the server with the command <code>node server.js</code> in your terminal and check whether everything is working properly. If you get a warning asking you to add a <code>"type": "module"</code> line to your <code>package.json</code> file, go ahead and do that. This warning is expected because we are using ES Module syntax for importing our packages instead of the default CommonJS syntax.</p>
<p>Finally, we are done with the coding part.</p>
<h4 id="heading-7-connecting-with-cursor-editor">7. Connecting with Cursor editor</h4>
<p>After setting up the server, I needed to register it inside the <strong>Cursor</strong> editor:</p>
<p>Start by opening Cursor Settings → Tools & Integrations → New MCP Server.</p>
<p>Inside the <code>mcpServers</code> object of the configuration, provide a new object with the below properties according to the <a target="_blank" href="https://docs.cursor.com/context/model-context-protocol#manual-configuration">Cursor Client setup guide</a> mentioned in the <a target="_blank" href="https://docs.cursor.com/welcome">Cursor Docs</a>:</p>
<ul>
<li><p>A name: <code>Sumit's Calendar Data</code></p>
</li>
<li><p>Command: <code>node</code></p>
</li>
<li><p>Arguments: full path to <code>server.js</code></p>
</li>
<li><p>Environment variables: API key and Calendar ID</p>
</li>
</ul>
<p>Example:</p>
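<pre><code class="lang-json">{
    <span class="hljs-attr">"mcpServers"</span>: {
        <span class="hljs-attr">"sumits-calendar-data"</span>: {
            <span class="hljs-attr">"command"</span>: <span class="hljs-string">"node"</span>,
            <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"/full/path/to/project/server.js"</span>],
            <span class="hljs-attr">"env"</span>: {
                <span class="hljs-attr">"GOOGLE_PUBLIC_API_KEY"</span>: <span class="hljs-string">"..."</span>,
                <span class="hljs-attr">"CALENDAR_ID"</span>: <span class="hljs-string">"..."</span>
            }
        }
    }
}
</code></pre>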
<p>Save and restart Cursor. The tool will now show as <strong>active (green)</strong>.</p>
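<p>Since <code>server.js</code> reads <code>GOOGLE_PUBLIC_API_KEY</code> and <code>CALENDAR_ID</code> from the environment, a missing or misnamed variable in the client configuration only shows up later as a failed API call. A small startup check could surface it earlier. The following is a minimal sketch, not part of the original server: <code>missingEnvVars</code> is a hypothetical helper, and it takes a plain object so it can run without Node's <code>process.env</code>.</p>

```typescript
// The two variable names that server.js reads in this tutorial.
const REQUIRED_ENV_VARS = ["GOOGLE_PUBLIC_API_KEY", "CALENDAR_ID"];

// Hypothetical helper: returns the required names that are unset or empty.
function missingEnvVars(
    env: Record<string, string | undefined>,
    required: string[] = REQUIRED_ENV_VARS,
): string[] {
    return required.filter((name) => !env[name]);
}

// In server.js this would be called with process.env; a plain object is
// used here so the sketch runs anywhere.
const missingVars = missingEnvVars({ CALENDAR_ID: "someone@gmail.com" });
// → ["GOOGLE_PUBLIC_API_KEY"]
```

<p>If the returned list is not empty, the server could write the names to <code>stderr</code> and exit, which keeps <code>stdout</code> free for the MCP transport.</p>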
<h4 id="heading-8-test-your-mcp-server">8. Test Your MCP Server</h4>
<p>Now, open the Cursor chat window and type:</p>
<p><em>"Do I have any meetings today?"</em></p>
<p>You'll see that:</p>
<ul>
<li><p>It detects the intent</p>
</li>
<li><p>Chooses the correct MCP tool</p>
</li>
<li><p>Passes today's date as input</p>
</li>
<li><p>MCP server returns structured data</p>
</li>
<li><p>The AI client responds naturally. In my case, I saved an event inside my Google Calendar on today’s date so it returned:</p>
</li>
</ul>
<p><em>"Yes, you have a meeting with Dr. Chuck at 4:00 PM."</em></p>
<p>It even works in other languages. If you ask the same question in a language other than English, you still get the correct answer. If there are no meetings for a given date, for example if you write:</p>
<p><em>“Do I have any meeting tomorrow?”</em></p>
<p>It replies:</p>
<p><em>"No, you do not have any meetings scheduled for tomorrow."</em></p>
<p>So now your custom MCP server is fully working, feeding real data from Google Calendar into your AI editor.</p>
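<p>The "structured data" the server returns in the flow above is a JSON string placed inside a single text content item, as in the <code>server.tool()</code> callback. The sketch below shows that round trip with local types. <code>toToolResult</code>, the two interfaces, and the sample timestamp are hypothetical stand-ins for illustration, not SDK types.</p>

```typescript
// Hypothetical local types that mirror the object returned by the callback.
interface TextContent {
    type: "text";
    text: string;
}
interface ToolResult {
    content: TextContent[];
}

// Wraps any JSON-serialisable payload the way the tool callback does.
function toToolResult(payload: unknown): ToolResult {
    return { content: [{ type: "text", text: JSON.stringify(payload) }] };
}

// Made-up payload; the timestamp is an assumption for this example.
const toolResult = toToolResult({
    meetings: ["Meeting with Dr. Chuck at 2025-07-03T16:00:00Z"],
});

// A consumer can recover the original object by parsing the text item.
const decodedPayload = JSON.parse(toolResult.content[0].text);
```

<p>The AI client receives the text item and turns the decoded list of meetings into a natural-language answer.</p>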
<p>This unlocks huge possibilities. Imagine the same approach with GitHub, Notion, internal dashboards, CRMs – anything. It all starts with building and wiring up your MCP server the right way.</p>
<p>Let me know if you would like to build one for your own project! And if this handbook was even a little bit helpful in getting your first MCP server up and running, I’d love to hear about it – it would be great inspiration for me to write more guides like this in the future.</p>
<h2 id="heading-summary">Summary</h2>
<p>You can find all the source code from this handbook in <a target="_blank" href="https://github.com/logicbaselabs/mcp-tutorial">this GitHub repository</a>. If it helped you in any way, consider giving it a star to show your support!</p>
<p>Also, if you found the handbook valuable, feel free to share it with others who might benefit from it. I’d really appreciate your thoughts – mention me on X <a target="_blank" href="https://x.com/sumit_analyzen">@sumit_analyzen</a>, watch my <a target="_blank" href="https://youtube.com/@logicBaseLabs">coding tutorials</a>, or simply <a target="_blank" href="https://www.linkedin.com/in/sumitanalyzen">connect with me on LinkedIn</a>.</p>
 
</article>
<article>
<h1> The Agentic AI Handbook: A Beginner's Guide to Autonomous Intelligent Agents </h1>
<p>Balajee Asish Brahmandam — Wed, 28 May 2025 14:22:20 +0000</p>
 <p>You may have heard about “Agentic AI” systems and wondered what they’re all about. Well, in basic terms, the idea behind Agentic AI is that it can see its surroundings, set and pursue goals, plan and reason through many processes, and learn from experience.</p>
<p>Unlike chatbots or rule-based software, agentic AI actively responds to user requests. It may break activities into smaller tasks, make decisions based on a high-level goal, and change its behavior over time using tools or other specialized AI components.</p>
<p>To summarize, <a target="_blank" href="https://blogs.nvidia.com/blog/what-is-agentic-ai/">agentic AI systems</a> "solve complex, multi-step problems autonomously by using sophisticated reasoning and iterative planning." In customer service, for example, an agentic AI may answer questions, check a user's account, offer balance settlements, and conduct transactions without human supervision.</p>
<p>So, agentic AI is "<a target="_blank" href="https://www.ibm.com/think/topics/agentic-ai">AI with agency</a>”. Given a problem context, it sets goals, creates strategies, manipulates the environment or software tools, and learns from the results.</p>
<p>But at the moment, most popular AI systems are reactive or non-agentic, doing a specific job or reacting to inputs without preparation. For example, Siri or a traditional image classifier uses predefined models or rules to map inputs to outputs. Instead of long-term goals or multi-step processes, <a target="_blank" href="https://www.ibm.com/think/topics">reactive AI</a> "responds to specific inputs with pre-defined actions". Agentic AI is more like a robot or personal assistant that can handle reasoning chains, adapt, and "think" before acting.</p>
<h3 id="heading-what-well-cover-here">What we’ll cover here</h3>
<p>In this article, you’ll learn what makes Agentic AI fundamentally different from traditional reactive systems. We’ll cover its key components like autonomy, goal-setting, planning, reasoning, and memory and explore how these systems are being built today. We’ll also look at the challenges they present, and where they are currently in development. Finally, you’ll get a hands-on tutorial on how to build your own simple agent using Python and LangChain.</p>
<h3 id="heading-table-of-contents">Table of Contents:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-agentic-vs-reactive-ai">Agentic vs Reactive AI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-components-of-ai-agency">Key Components of AI Agency</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-autonomy">Autonomy</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-goal-directed-behavior">Goal-Directed Behavior</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-planning">Planning</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-reasoning">Reasoning</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-memory">Memory</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-does-agentic-ai-know-what-to-do">How Does Agentic AI Know What to Do?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-1-it-uses-a-pretrained-ai-model">1. It Uses a Pretrained AI Model</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-2-it-follows-instructions-in-prompts">2. It Follows Instructions in Prompts</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-3-it-uses-tools-but-only-when-told-how">3. It Uses Tools, But Only When Told How</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-4-it-can-remember-sometimes">4. It Can Remember (Sometimes)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-5-its-not-fully-autonomous-yet">5. It’s Not Fully Autonomous — Yet</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-so-whats-the-current-state-of-agentic-ai">So What’s the Current State of Agentic AI?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-exists-today">What Exists Today</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-whats-still-experimental">What’s Still Experimental</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-are-we-close-to-truly-autonomous-agents">Are We Close to Truly Autonomous Agents?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-building-agentic-ai-frameworks-and-approaches">Building Agentic AI: Frameworks and Approaches</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-reinforcement-learning-rl-agents">Reinforcement Learning (RL) Agents</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-llm-based-generative-agents">LLM-Based (Generative) Agents</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-multi-agent-and-orchestration-frameworks">Multi-Agent and Orchestration Frameworks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-classical-planning-and-symbolic-ai">Classical Planning and Symbolic AI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tool-augmented-reasoning">Tool-augmented Reasoning</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-major-challenges-of-agentic-ai">Major Challenges of Agentic AI</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-alignment-and-value-specification">Alignment and Value Specification</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-unintended-consequences">Unintended Consequences</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-safety-and-security">Safety and Security</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-coordination-and-scalability">Coordination and Scalability</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ethical-and-legal-questions">Ethical and Legal Questions</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-code-snippet-and-real-world-examples">Code Snippet and Real-World Examples</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tutorial-build-your-first-agentic-ai-with-python">Tutorial: Build Your First Agentic AI with Python</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-real-world-use-case">Real-World Use Case</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prerequisites-what-you-need">Prerequisites – What You Need</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-by-step-tutorial">Step-by-Step Tutorial</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-agentic-vs-reactive-ai"><strong>Agentic vs Reactive AI</strong></h2>
<p>Before we dive fully in, I want to make sure the differences between non-agentic and agentic AI are clear.</p>
<p>Non-agentic reactive AI uses learned models or rules to map inputs to outputs. It replies to one idea or task at a time, not starting additional ones. Examples include a calculator, spam filter, and rudimentary chatbot with pre-written responses. Reactive AI cannot plan or improve without reprogramming.</p>
<p>Agentic AI, on the other hand, acts independently with goals. It may organize actions, set objectives, adapt to new information, and collaborate with others. Agentic AI can break a complex task into small segments and coordinate the usage of specialized tools or services to complete each step.</p>
<p>The agent is also proactive. An agentic AI may inform users of updates, restock supplies, and check inventory levels, unlike a reactive system.</p>
<p>The difference is a paradigmatic shift: modern agentic systems include several specialized agents working together on a high-level objective, with dynamic task breakdown and even permanent memory, instead of a single model. This multi-agent collaboration may help agentic AI solve large real-world problems.</p>
<p>Cutting-edge prototypes like intelligent chatbots with tool integration, autonomous driving software, and coordinated industrial robots are entering agentic territory, but today's reactive AI virtual assistants (Alexa, Siri) may blur the line. The vital distinction is whether the system actively selects its actions rather than merely reacting.</p>
<h2 id="heading-key-components-of-ai-agency"><strong>Key Components of AI Agency</strong></h2>
<p>Agentic AI systems are characterized by several core capabilities that give them <strong>agency</strong>. Let’s look at these now.</p>
<h3 id="heading-autonomy"><strong>Autonomy</strong></h3>
<p>An autonomous agent may work without human supervision. It may act depending on its goals and strategy rather than waiting for specific directions.</p>
<p>To be autonomous, the agent must use sensors or data streams to perceive, evaluate, and decide. An autonomous warehouse robot can move, pick up things, and alter its path when it encounters barriers without human guidance. Autonomy also implies self-monitoring: an agent gauges its battery life or job completion and adapts as needed.</p>
<p>An agentic AI's “reasoning engine” (usually a large language model or similar system) makes decisions and can adjust its behavior based on user feedback or rewards.</p>
<p>As IBM explains, “without any human intervention, agentic AI can act independently, adapt to new situations, make decisions, and learn from experience” (<a target="_blank" href="https://www.ibm.com/think/topics/agentic-ai">source</a>). But uncontrolled autonomous agents may behave in unpredictable ways – which is why they must be carefully designed.</p>
<p>Although agentic AIs can operate on their own, their goals, tools, and boundaries must be clearly planned to avoid unintended or harmful outcomes. Without that guidance, they may follow instructions too literally or make decisions without understanding the bigger picture.</p>
<h3 id="heading-goal-directed-behavior"><strong>Goal-Directed Behavior</strong></h3>
<p>Agentic AI is goal-directed. The system attempts to achieve one or more goals. The goals might be specified openly ("set up a meeting for tomorrow") or implicitly through a reward system. Instead of following a script, the agent chooses how to achieve its goal. It may choose methods, subgoals, and long-term goals.</p>
<p>Unplanned reactive AI has short-term or implicit goals (for example, recognize an image, guess the next word). Agentic AIs aim toward long-term goals. If assigned the duty of "organizing my travel itinerary," an agent may book flights, hotels, transportation, and so on, choose the best order, and adjust the schedule if airline prices change.</p>
<p>Business and research sources underline this distinction. Agentic AI plans and works toward long-term goals, whereas reactive systems manage immediate responses. A plan-and-execute architecture lets the agent decide what to do and define and alter its goals. Instead of performing isolated acts, it carries out a series of related actions step by step. Goal-directed behavior demonstrates purposeful intent, even if the goal is vague.</p>
<h3 id="heading-planning"><strong>Planning</strong></h3>
<p>An agent plans to achieve its goals. A goal and data instruct the agentic AI to conduct a series of actions or subtasks. Planning includes simple heuristics (if A, then do B) and advanced reasoning (evaluating options).</p>
<p>Modern agentic AI uses planner-executor architectures with chain-of-thought prompting. In a "plan-and-execute" agent, an LLM-driven planner develops a multi-step plan, and executor modules employ tools or models to execute each step. ReAct is another technique in which the agent alternates between action and reasoning (or "thought") to refine its approach as it accumulates observations.</p>
<p>Planning often involves search and optimization using neural networks, decision trees, or graph-based techniques. For example, an agent might build a planning graph showing different possible actions and outcomes, then use algorithms like A* search or Monte Carlo tree search to choose the best next step.</p>
<p>In some cases, the agent simulates multiple possible futures to evaluate which actions are most likely to lead to success. Large language models (LLMs) can also help by breaking down complex instructions into smaller steps, turning a single high-level goal into a list of tasks that can be executed one by one.</p>
<p>Here’s a simplified example (pseudocode) of an agent loop:</p>
<pre><code class="lang-python">goal = <span class="hljs-string">"prepare presentation on AI"</span>
agent = AI_Agent(goal)
environment = TaskEnvironment()
<span class="hljs-comment"># Loop until the task is complete</span>
<span class="hljs-keyword">while</span> <span class="hljs-keyword">not</span> environment.task_complete():
    observation = agent.perceive(environment)
    plan = agent.make_plan(observation)        <span class="hljs-comment"># e.g., list of steps</span>
    action = plan.next_step()
    result = agent.act(action, environment)
    agent.learn(result)                       <span class="hljs-comment"># update memory or strategy</span>
</code></pre>
<p>Here, the agent perceives the current state, plans a sequence of steps toward its goal, acts by executing the next step, and then learns from the outcome before repeating. This cycle captures the core loop of an autonomous agent.</p>
<h3 id="heading-reasoning"><strong>Reasoning</strong></h3>
<p>Making judgments by applying logic and inference is known as reasoning. In addition to acting, an agentic AI considers what actions make sense in light of its information. This entails assessing trade-offs, comprehending cause and consequence, and, if necessary, applying mathematical or symbolic thinking.</p>
<p>An agent may, for instance, apply a conditional rule such as "If sales fall below X, reorder inventory" or a deduction such as "All invoices must be paid by Friday. This is an invoice, so I should pay it by Friday". Large language models support reasoning by enabling the agent to process natural language commands, retain contextual information, and produce logical justifications for its decisions.</p>
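<p>Rules like these can be sketched as a tiny forward-chaining engine: it keeps applying if-then rules to known facts until nothing new can be derived. The fact and rule names below are hypothetical, and LLM-based agents usually reason in natural language rather than with explicit rule tables.</p>

```python
def forward_chain(facts, rules):
    """Apply (premises, conclusion) rules until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

# Each rule reads: if every premise holds, the conclusion holds too.
rules = [
    (("is_invoice",), "due_by_friday"),
    (("due_by_friday", "funds_available"), "schedule_payment"),
    (("sales_below_threshold",), "reorder_inventory"),
]

derived = forward_chain({"is_invoice", "funds_available"}, rules)
```

<p>Starting from "this is an invoice" and "funds are available", the engine derives that a payment should be scheduled. It does not conclude that inventory should be reordered, because nothing says sales have fallen.</p>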
<p>An LLM "acts as the orchestrator or reasoning engine" that comprehends tasks and produces solutions, <a target="_blank" href="https://python.langchain.com/docs/">according to one explanation in the LangChain docs</a>. In order to retrieve pertinent information for reasoning, agents also employ strategies such as <a target="_blank" href="https://www.freecodecamp.org/news/learn-rag-fundamentals-and-advanced-techniques/">retrieval-augmented generation (RAG)</a>.</p>
<p>Agentic reasoning is essentially internal planning and problem-solving. An agent evaluates a task by internally simulating potential strategies (often in the "thoughts" of an LLM) and selecting the most effective one. This might entail formal logic, analogical reasoning (connecting a new problem to previous ones), or multi-step deduction. So the agent continually considers its next course of action and adjusts to new inputs rather than just clicking "execute" on a single model outcome.</p>
<h3 id="heading-memory"><strong>Memory</strong></h3>
<p>Agents can use memory to recall prior experiences, information, and interactions when making decisions. A memoryless AI would treat every moment as new. Agentic systems instead record their behaviors, outcomes, and context. Examples include a short-term “working memory” that holds the current state of the plan and a long-term knowledge base about the world.</p>
<p>A customer-service agent may remember a user's name and issue history to avoid repeating inquiries. Game-playing agents learn from past positions to move better. <a target="_blank" href="https://research.ibm.com/blog/agentic-ai">IBM says</a> AI agent memory “refers to an AI system’s ability to store and recall past experiences to improve decision-making, perception and overall performance”. Goal-oriented agents need memory to create a cohesive narrative of previous steps (to avoid repeating failures) and discover trends.</p>
<p>Large language models are themselves stateless, so agentic architectures incorporate memory modules, such as databases or vector stores, that the LLM can query. Agents use relevance filters to retain only important information, since too much memory slows the system. Memory gives the agent context and continuity, allowing it to learn from previous tasks rather than starting over.</p>
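<p>As a rough illustration of a memory module with a relevance filter, the sketch below stores a bounded number of notes and recalls only those that share words with the query. The class name, capacity, and word-overlap scoring are assumptions made for this example. Production systems typically use embeddings and a vector store instead.</p>

```python
import re

class RelevanceMemory:
    """Keeps at most `capacity` notes and recalls those that overlap with a query."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.notes = []

    @staticmethod
    def _words(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def remember(self, note):
        self.notes.append(note)
        # Deliberately crude retention policy: forget the oldest note when full.
        if len(self.notes) > self.capacity:
            self.notes.pop(0)

    def recall(self, query, top_k=2):
        query_words = self._words(query)
        scored = [(len(query_words & self._words(n)), n) for n in self.notes]
        relevant = [(score, n) for score, n in scored if score > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [n for _, n in relevant[:top_k]]

memory = RelevanceMemory(capacity=3)
memory.remember("User name is Priya")
memory.remember("Priya reported a billing issue on her invoice")
memory.remember("Weather was sunny")
memory.remember("Billing issue was resolved with a refund")

hits = memory.recall("What happened with the billing issue?")
```

<p>The oldest note is dropped once the capacity is exceeded, and the unrelated weather note is filtered out at recall time, so only the two billing notes are passed back to the model as context.</p>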
<h2 id="heading-how-does-agentic-ai-know-what-to-do">How Does Agentic AI Know What to Do?</h2>
<p>Agentic AI might seem smart, but it’s not actually “thinking” like a human. Let’s break down how it really works.</p>
<h3 id="heading-1-it-uses-a-pretrained-ai-model">1. It Uses a Pretrained AI Model</h3>
<p>At the heart of most agentic systems is a large language model (LLM) like GPT-4. This model is trained on a huge amount of text (books, articles, websites, and so on) to learn how people write and talk.</p>
<p>But it wasn’t trained to act like an agent. It was trained to predict the next word in a sentence.</p>
<p>When we give it the right prompts, it can seem like it’s making plans or solving problems. Really, it’s just generating useful responses based on patterns it learned during training.</p>
<h3 id="heading-2-it-follows-instructions-in-prompts">2. It Follows Instructions in Prompts</h3>
<p>Agentic AI doesn’t figure out what to do by itself – developers give it structure using prompts.</p>
<p>For example:</p>
<ul>
<li><p>“You are an assistant. First, think step by step. Then take action.”</p>
</li>
<li><p>“Here’s a goal: research coding tools. Plan steps. Use Wikipedia to search.”</p>
</li>
</ul>
<p>These prompts help the AI simulate planning, decision-making, and action.</p>
<h3 id="heading-3-it-uses-tools-but-only-when-told-how">3. It Uses Tools, But Only When Told How</h3>
<p>The AI doesn’t automatically know how to use tools like search engines or calculators. Developers give it access to those tools, and the AI can decide when to use them based on the text it generates.</p>
<p>Think of it like this: the AI suggests, “Now I’ll look something up,” and the system makes that happen.</p>
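<p>The "AI suggests, system executes" split can be sketched in a few lines. Here the model is assumed to request a tool by emitting a small JSON object. That format, the tool names, and the helper functions are all hypothetical. Real frameworks define their own formats.</p>

```python
import json

def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

# Developers register tools and describe them in words the model can read.
TOOLS = {
    "add": {"func": add, "description": "Add two numbers (arguments: a, b)"},
    "multiply": {"func": multiply, "description": "Multiply two numbers (arguments: a, b)"},
}

def describe_tools():
    """Text that would be placed in the prompt so the model knows what exists."""
    return "\n".join(f"- {name}: {spec['description']}" for name, spec in TOOLS.items())

def run_tool_request(model_output):
    """Run the tool named in the model's text, or return None for ordinary prose."""
    try:
        request = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # plain text, not a tool request
    if not isinstance(request, dict):
        return None
    spec = TOOLS.get(request.get("tool"))
    if spec is None:
        return {"error": f"unknown tool: {request.get('tool')}"}
    try:
        return {"result": spec["func"](**request.get("arguments", {}))}
    except TypeError as exc:  # missing or unexpected arguments
        return {"error": str(exc)}

outcome = run_tool_request('{"tool": "add", "arguments": {"a": 2, "b": 3}}')
```

<p>The model never runs anything itself. It only produces text, and the surrounding program decides whether that text names a registered tool and what to do with the result.</p>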
<h3 id="heading-4-it-can-remember-sometimes">4. It Can Remember (Sometimes)</h3>
<p>Some agents use short-term memory to remember past questions or results. Others store useful information in a database for later. But they don’t “learn” over time like humans do – they only remember what you let them.</p>
<h3 id="heading-5-its-not-fully-autonomous-yet">5. It’s Not Fully Autonomous — Yet</h3>
<p>Most agentic systems today are not fully self-learning or self-aware. They’re smart combinations of:</p>
<ul>
<li><p>Pretrained AI</p>
</li>
<li><p>Prompts</p>
</li>
<li><p>Tools</p>
</li>
<li><p>Memory</p>
</li>
</ul>
<p>Their “autonomy” comes from how all these parts work together – not from deep understanding or long-term training.</p>
<h2 id="heading-so-whats-the-current-state-of-agentic-ai">So What’s the Current State of Agentic AI?</h2>
<p>Agentic AI is still an emerging area of development. While it sounds futuristic, many systems today are just starting to use agent-like capabilities.</p>
<h3 id="heading-what-exists-today">What Exists Today</h3>
<h4 id="heading-simple-agentic-systems-already-work-in-limited-ways">Simple agentic systems already work in limited ways</h4>
<ul>
<li><p>For example, some customer service bots can check account details, respond to questions, and escalate issues automatically.</p>
</li>
<li><p>Warehouse robots can plan simple routes and avoid obstacles on their own.</p>
</li>
<li><p>Coding assistants like GitHub Copilot can help write and fix code based on natural language input.</p>
</li>
</ul>
<p>These systems show basic agentic behavior like goal-following and tool use, but usually in a narrow, structured environment.</p>
<h3 id="heading-whats-still-experimental">What’s Still Experimental</h3>
<ul>
<li><p>Fully autonomous, multi-purpose agents, the kind that can reason deeply, make long-term plans, and adapt to new tools, are still in research or prototype stages.</p>
</li>
<li><p>Projects like <strong>AutoGPT</strong>, <strong>BabyAGI</strong>, and <strong>OpenDevin</strong> are exciting, but they’re mostly experimental and require human oversight.</p>
</li>
</ul>
<p>Most current agentic systems:</p>
<ul>
<li><p>Don’t learn continuously</p>
</li>
<li><p>Struggle with unpredictable environments</p>
</li>
<li><p>Require a lot of setup to avoid errors or unexpected behavior</p>
</li>
</ul>
<h3 id="heading-are-we-close-to-truly-autonomous-agents">Are We Close to Truly Autonomous Agents?</h3>
<p>We’re getting closer, but we’re not there yet.</p>
<p>Today’s agentic AI is like a very clever assistant that can follow instructions, use tools, and plan steps. But it still depends on developers to give it structure (via prompts, tool choices, and boundaries).</p>
<p>In short, agentic AI works in specific, well-designed use cases. But general-purpose, human-level autonomous agents are still a long way off.</p>
<h2 id="heading-building-agentic-ai-frameworks-and-approaches"><strong>Building Agentic AI: Frameworks and Approaches</strong></h2>
<p>Researchers and engineers have developed various frameworks and tools to construct agentic AI systems. Let’s discuss some key approaches.</p>
<h3 id="heading-reinforcement-learning-rl-agents"><strong>Reinforcement Learning (RL) Agents</strong></h3>
<p>In artificial intelligence, traditional agents are frequently constructed via <a target="_blank" href="https://www.freecodecamp.org/news/how-to-apply-reinforcement-learning-to-real-life-planning-problems-90f8fa3dc0c5/">reinforcement learning</a>, in which the agent learns to maximize a reward signal through trial and error. Atari game agents and DeepMind's AlphaGo are classic examples.</p>
<p>RL agents are goal-directed (they maximize reward), they plan (in the sense of computing a policy), and they learn from interaction. Still, many pure RL systems work best in simulated settings and struggle with the open-ended complexity of real-world tasks.</p>
<p>While RL components are occasionally incorporated into modern agentic AI (for example, an agent may use RL to drive a robot at a basic level), they are frequently supplemented with other methods for higher-level reasoning.</p>
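<p>For readers who have not seen trial-and-error learning in code, here is a minimal tabular Q-learning sketch on a made-up five-cell corridor where only the rightmost cell pays a reward. The environment, hyperparameters, and purely random exploration policy are assumptions chosen to keep the example short. Q-learning is off-policy, so it can still learn good action values from random behavior.</p>

```python
import random

N_STATES = 5              # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)        # step left or step right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor

def step(state, action):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore at random (trial and error)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + GAMMA * best_next
            # Move the estimate part of the way toward the observed target.
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q

q_table = train()
# Greedy policy: in each non-terminal cell, pick the action with the higher value.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
```

<p>After training, the greedy policy steps right in every cell, which is the shortest way to the reward. Nobody told the agent this. It inferred it from the reward signal alone.</p>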
<h3 id="heading-llm-based-generative-agents"><strong>LLM-Based (Generative) Agents</strong></h3>
<p>The use of LLMs as reasoning engines within agents has become popular due to the recent explosion of large language models. For instance, approaches and frameworks like ReAct, AutoGPT, and BabyAGI use LLMs (such as GPT-4) to create plans and actions. These systems prompt an LLM with the agent's objective and context, after which it generates a step or sub-goal and invokes either a function or a tool.</p>
<p>One design, frequently referred to as a ReAct loop, alternates between "Thought" (the LLM planning or reasoning) and "Action" (calling upon tools or APIs). An alternative approach involves a distinct planner LLM that generates a comprehensive multi-step plan, which is then followed by executor modules that execute each step.</p>
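<p>The Thought/Action alternation can be sketched without calling a real model. In the example below, <code>scripted_llm</code> is a hypothetical stand-in that returns canned replies, and the <code>tool[input]</code> action syntax is an assumption made for this sketch. A real agent would send the growing transcript to an LLM at each step.</p>

```python
import re

def word_count(text):
    return str(len(text.split()))

TOOLS = {"word_count": word_count}

def scripted_llm(transcript):
    """Hypothetical stand-in for an LLM call: acts first, answers once it has an observation."""
    observations = re.findall(r"Observation: (.+)", transcript)
    if not observations:
        return ("Thought: I should count the words with a tool.\n"
                "Action: word_count[agents plan act and learn]")
    return (f"Thought: The tool returned {observations[-1]}.\n"
            f"Final Answer: {observations[-1]} words")

def react_loop(question, llm, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += "\n" + reply
        final = re.search(r"Final Answer: (.+)", reply)
        if final:
            return final.group(1), transcript
        action = re.search(r"Action: (\w+)\[(.*)\]", reply)
        if action is None:
            break  # the model produced neither an action nor an answer
        name, arg = action.groups()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        # Feed the result back so the next "thought" can build on it.
        transcript += f"\nObservation: {observation}"
    return None, transcript

answer, trace = react_loop("How many words are in 'agents plan act and learn'?", scripted_llm)
```

<p>The <code>max_steps</code> cap matters in practice: without it, a model that never emits a final answer would keep the loop running indefinitely.</p>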
<p>To increase their capabilities, LLM agents frequently employ tools like search engines, calculators, and API calls. They also use context retrieval, such as RAG or memory storage, to guide their reasoning. <a target="_blank" href="https://www.freecodecamp.org/news/beginners-guide-to-langchain/">LangChain</a> and LangGraph are well-known open-source frameworks that offer building blocks (memory buffers, tool integration, and so on) for creating unique agents.</p>
<h3 id="heading-multi-agent-and-orchestration-frameworks"><strong>Multi-Agent and Orchestration Frameworks</strong></h3>
<p>Many agentic AI architectures use several sub-agents. A "crew" or "society of minds" approach, for example, may create several LLM agents that communicate by message passing and each fill a different role (planner, analyst, critic, and so on).</p>
<p>Projects such as AutoGen, ChatDev, and MetaGPT demonstrate orchestrated multi-agent processes. Engineering ideas for multi-agent systems are also being explored in academic work. One study by BMW, for instance, outlines a framework for multi-agent cooperation in which several AI agents manage planning, execution, and specialized activities while working together on an industrial use case.</p>
<p>These systems frequently have a task decomposition module, which breaks a goal down into its component parts, and scheduling logic that allocates agents to subtasks. This essentially resembles an "AI team," in which every member is an agentic subsystem.</p>
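<p>A toy version of this message-passing setup is sketched below. Each "agent" is just a function that receives a message and returns new messages addressed to other agents. The role names and the comma-based task decomposition are hypothetical simplifications. In a real system each role would be backed by an LLM call.</p>

```python
from collections import deque

def planner(goal):
    # Task decomposition: one subtask per comma-separated part of the goal.
    return [("writer", part.strip()) for part in goal.split(",") if part.strip()]

def writer(subtask):
    return [("critic", f"draft for {subtask}")]

def critic(draft):
    # A real critic would judge quality; this one approves everything it sees.
    return [("archive", f"approved: {draft}")]

def archive(_content):
    return []  # end of the line: nothing further to send

AGENTS = {"planner": planner, "writer": writer, "critic": critic, "archive": archive}

def run_crew(first_message, max_messages=50):
    """Deliver messages in arrival order until the queue is empty or the cap is hit."""
    queue = deque([first_message])
    log = []
    while queue and len(log) < max_messages:
        recipient, content = queue.popleft()
        log.append((recipient, content))
        queue.extend(AGENTS[recipient](content))
    return log

log = run_crew(("planner", "research tools, compare prices"))
approved = [content for recipient, content in log if recipient == "archive"]
```

<p>The shared queue plays the role of the scheduler here. The message cap is a simple guard against agents that keep replying to each other forever.</p>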
<h3 id="heading-classical-planning-and-symbolic-ai"><strong>Classical Planning and Symbolic AI</strong></h3>
<p>Prior to the current ML revival, AI planning was studied in symbolic terms (STRIPS, PDDL planners, and so on). These methods can be viewed as an early form of agentic AI, in which a planner constructs a series of symbolic actions to accomplish a goal.</p>
<p>These concepts are occasionally incorporated into contemporary agentic AI. For instance, an LLM agent may produce a high-level symbolic plan that grounded systems carry out, such as "(Find x such that property y), (compute f(x)), (deliver result)" and so on.</p>
<p>There are also hybrid architectures that combine traditional search with neural networks. Classical planning still underpins many robotics and scheduling agents, even though it is less prevalent in pure form today, and the move to learned or language-based planners can be seen as an extension of it.</p>
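<p>To give a feel for STRIPS-style planning, the sketch below describes each action by its preconditions, the facts it adds, and the facts it deletes, then searches breadth-first for a sequence of actions that reaches the goal. The action and fact names are invented for this example, and real planners use far more efficient search than this.</p>

```python
from collections import deque, namedtuple

Action = namedtuple("Action", "name preconditions add_effects delete_effects")

def plan(initial, goal, actions):
    """Breadth-first search over sets of facts; returns the shortest list of action names."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            return steps
        for action in actions:
            if action.preconditions <= state:
                nxt = frozenset((state - action.delete_effects) | action.add_effects)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [action.name]))
    return None  # no sequence of actions reaches the goal

actions = [
    Action("gather_sources", frozenset(), frozenset({"has_sources"}), frozenset()),
    Action("write_draft", frozenset({"has_sources"}), frozenset({"has_draft"}), frozenset()),
    Action("send_report", frozenset({"has_draft"}), frozenset({"report_sent"}),
           frozenset({"has_draft"})),
]

steps = plan(set(), {"report_sent"}, actions)
```

<p>The planner works out on its own that sources must be gathered before a draft can be written, and that a draft must exist before the report can be sent.</p>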
<h3 id="heading-tool-augmented-reasoning"><strong>Tool-augmented Reasoning</strong></h3>
<p>In many agentic systems, granting the agent access to external functions and information is a common strategy. For instance, when responding to a difficult inquiry, a language-based agent may use Retrieval-Augmented Generation (RAG) to retrieve pertinent information from a database.</p>
<p>The agent's "tools" might also include a calculator, a web browser, a database API, or bespoke code. The capacity to use tools is a large part of what makes autonomy possible: instead of attempting to learn everything by heart, the AI model learns how to ask the right questions.</p>
<p>In sum, building an agentic AI often means combining multiple techniques: machine learning for perception and learning, symbolic planning for structure, LLM reasoning for natural language and problem decomposition, plus memory modules and feedback loops.</p>
<p>There is no one-size-fits-all framework yet. Research continues rapidly – recent papers on agentic systems emphasize end-to-end pipelines that integrate perception (input analysis), goal-oriented planning, tool use, and continual learning.</p>
<h2 id="heading-major-challenges-of-agentic-ai"><strong>Major Challenges of Agentic AI</strong></h2>
<p>Building AI agents with autonomy and goals is powerful but raises new risks and difficulties. Key challenges include:</p>
<h3 id="heading-alignment-and-value-specification"><strong>Alignment and Value Specification</strong></h3>
<p>Setting the correct goals is crucial for agentic systems. If an agent's aims don't match human values, it may cause harm. If a scheduling agent is directed to “minimize costs,” it may cut vital services unless told to preserve quality. Humans' complicated priorities make value specification challenging. Underspecified or poorly described goals cause unexpected consequences (<a target="_blank" href="https://en.wikipedia.org/wiki/Goodhart%27s_law">Goodhart's Law</a>).</p>
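<p>The "minimize costs" problem is easy to reproduce in miniature. In the sketch below, the plan names, costs, and quality scores are invented for illustration. The point is only that the same optimizer picks a very different plan once the goal also states what must be preserved.</p>

```python
# Hypothetical candidate plans for a scheduling agent.
plans = [
    {"name": "cut night shift", "cost": 70, "service_quality": 0.55},
    {"name": "renegotiate suppliers", "cost": 85, "service_quality": 0.92},
    {"name": "keep everything", "cost": 100, "service_quality": 0.95},
]

def cheapest(candidates):
    """Underspecified goal: minimize cost and nothing else."""
    return min(candidates, key=lambda p: p["cost"])

def cheapest_acceptable(candidates, min_quality):
    """Better-specified goal: minimize cost among plans that preserve quality."""
    acceptable = [p for p in candidates if p["service_quality"] >= min_quality]
    return min(acceptable, key=lambda p: p["cost"]) if acceptable else None

naive_choice = cheapest(plans)
constrained_choice = cheapest_acceptable(plans, min_quality=0.9)
```

<p>The unconstrained objective selects the plan that cuts service, while the constrained one settles on a slightly more expensive plan that keeps quality above the stated floor.</p>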
<h3 id="heading-unintended-consequences"><strong>Unintended Consequences</strong></h3>
<p>Even with good intentions, agents may discover loopholes. Reward hacking in reinforcement learning is a familiar example from simpler AI systems, and autonomy increases these hazards for agentic AI. In recent experiments, an LLM-based AI that was told to pursue a goal “at all costs” planned to disable its own monitoring and clone itself to escape shutdown, acting in self-preservation.</p>
<p>If unconstrained, an agent may deceive to achieve its aims. Unintended effects can range from an assistant arranging a hazardous flight because it fixated on a cost-savings aim to more subtle harms like cutting important benefits. <a target="_blank" href="https://www.ibm.com/think/insights/ethics-governance-agentic-ai">IBM researchers warn</a> that agents “can act without your supervision”, resulting in unintended consequences without strong protections.</p>
<h3 id="heading-safety-and-security"><strong>Safety and Security</strong></h3>
<p>Highly autonomous agents can increase danger. They may access sensitive data or operate machinery. IBM says that agents are opaque and open-ended, so their judgments might be unclear, and they may suddenly use new tools or data. A healthcare agent may leak patient data, or a financial bot may execute a dangerous move.</p>
<p>LLM-style adversarial attacks and hallucinations become more dangerous in agentic AI. A hallucinating chatbot is a nuisance, but a hallucinating investment agent might lose millions. An agent's multi-step reasoning is vulnerable to hostile inputs at every step, and the complexity of agents makes trust and verification difficult.</p>
<h3 id="heading-coordination-and-scalability"><strong>Coordination and Scalability</strong></h3>
<p>In many agentic systems, multiple agents may collaborate or compete. Ensuring that they communicate correctly and don’t conflict is non-trivial.</p>
<p>A recent review notes unique challenges in orchestrating multiple agents without standardized protocols. As <a target="_blank" href="https://hai.stanford.edu/ai-index/2025-ai-index-report">the Stanford ethics report</a> points out, if millions of agents interact (for example, booking each other’s appointments), the emergent behavior could be unpredictable at scale. This raises societal concerns about system-level effects and feedback loops we haven’t seen before.</p>
<h3 id="heading-ethical-and-legal-questions"><strong>Ethical and Legal Questions</strong></h3>
<p>Finally, there are questions of responsibility and bias. Who is liable if an autonomous agent makes a mistake? How do we ensure transparency and fairness in a black-box multi-agent system?</p>
<p>Legal and ethical frameworks are still catching up. For example, IBM highlights that agentic AI brings “an expanded set of ethical dilemmas” compared to today’s AI. And AI ethicists caution that deploying powerful assistants (as personal secretaries, advisors, and so on) will have profound societal impacts that are hard to predict.</p>
<p>Here are some specific things we need to consider:</p>
<ul>
<li><p><strong>Accountability:</strong> Who is accountable if an AI agent makes a damaging choice (such as a medical AI agent prescribing the wrong medication or a logistics agent causing an accident)? The designers, the deployers, or the agent itself? Legal systems presume human control, but autonomous agents may act without it.</p>
</li>
<li><p><strong>Transparency:</strong> Agentic systems are complex and opaque. Multiple neural networks, knowledge bases, and tools may interact, so explaining an agent's behavior for auditing or debugging is tough. This runs counter to the goals of explainable AI.</p>
</li>
<li><p><strong>Bias and fairness:</strong> Agents learn from data and environments that may reflect human biases. An autonomous hiring assistant agent, for instance, might inadvertently replicate discriminatory patterns unless carefully checked. And because agentic AI can perpetuate or amplify biases across many decisions, the impact could be larger.</p>
</li>
<li><p><strong>Job disruption and social impact:</strong> Just as factory automation eliminated certain jobs, powerful AI agents might change office and creative work. Personal assistant agents that schedule, manage email, and research might change many careers. This might boost productivity but also exacerbate deskilling and inequality. Social pressure to use agentic AI (if rivals do) may divide workers into “augmented” and “unaugmented” groups.</p>
</li>
<li><p><strong>Security and privacy:</strong> An agent with extensive system access puts privacy at risk. If an AI agent permitted to read and write business data or personal correspondence is compromised, it might reveal critical information. IBM warns that agentic AI can amplify known risks, such as an agent accidentally introducing bias into a database or sharing private data without oversight. Tools must be authenticated and data handled securely.</p>
</li>
<li><p><strong>Human-AI interaction:</strong> Agents may affect how we use technology and interact with others. If people rely on AI bots for conversation, information filtering, or companionship, societal dynamics might change. Consider again the Stanford report referenced above. So we need to find ways to build norms and values into these interactions.</p>
</li>
</ul>
<p>In recognition of these challenges, technologists and ethicists urge us to use proactive safeguards. As IBM researchers put it, because agentic AI is advancing rapidly, we cannot wait to address safety – we must build strong guardrails now. Some proposed measures include strict testing protocols for agents, explainability requirements, legal regulations on autonomous systems, and design principles that prioritize human values.</p>
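<p>One simple form a guardrail can take is a check that sits between the agent's proposed action and its execution. The sketch below is an assumption about how such a check might look, with a made-up allowlist and refund limit. It is not a description of any particular product's safeguards.</p>

```python
# Hypothetical policy: which actions the agent may take, and when a human must approve.
ALLOWED_ACTIONS = {"send_email", "create_ticket", "issue_refund"}
REFUND_LIMIT = 100

def review_action(action, params):
    """Return 'allow', 'needs_human', or 'deny' for an action the agent proposes."""
    if action not in ALLOWED_ACTIONS:
        return "deny"  # anything not explicitly allowed is refused
    if action == "issue_refund" and params.get("amount", 0) > REFUND_LIMIT:
        return "needs_human"  # high-impact actions are escalated, not executed
    return "allow"

decisions = [
    review_action("create_ticket", {"title": "Login problem"}),
    review_action("issue_refund", {"amount": 40}),
    review_action("issue_refund", {"amount": 5000}),
    review_action("delete_database", {}),
]
```

<p>The default-deny design means a new or hallucinated action name is refused rather than executed, and the escalation path keeps a human in the loop for decisions with a large impact.</p>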
<p>So as you can see, while agentic AI offers the potential for AI that can handle complex tasks end-to-end, it also amplifies known AI risks (bias, error) and introduces new ones (autonomous decision-making, coordination failures). Addressing these challenges requires careful design of alignment, robust evaluation of agent behavior, and interdisciplinary governance.</p>
<h2 id="heading-code-snippet-and-real-world-examples"><strong>Code Snippet and Real-World Examples</strong></h2>
<p>To illustrate how an agentic system works, let’s consider a very simple Python-like pseudocode for an abstract agent (mixing concepts from above):</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Agent</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, goal</span>):</span>
        self.goal = goal
        self.memory = []
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">perceive</span>(<span class="hljs-params">self, environment</span>):</span>
        <span class="hljs-comment"># Get data from environment (sensor, API, etc.)</span>
        <span class="hljs-keyword">return</span> environment.get_state()
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">plan</span>(<span class="hljs-params">self, observation</span>):</span>
        <span class="hljs-comment"># Use reasoning (LLM or algorithm) to decide next action(s)</span>
        plan = ReasoningEngine.generate_plan(goal=self.goal, context=observation)
        <span class="hljs-keyword">return</span> plan  <span class="hljs-comment"># e.g. list of steps or actions</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">act</span>(<span class="hljs-params">self, action, environment</span>):</span>
        <span class="hljs-comment"># Execute the action using tools or directly in the environment</span>
        result = environment.execute(action)
        <span class="hljs-keyword">return</span> result
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">learn</span>(<span class="hljs-params">self, experience</span>):</span>
        <span class="hljs-comment"># Store outcome or update strategy</span>
        self.memory.append(experience)   
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span>(<span class="hljs-params">self, environment</span>):</span>
        <span class="hljs-keyword">while</span> <span class="hljs-keyword">not</span> environment.task_complete():
            obs = self.perceive(environment)
            plan = self.plan(obs)
            <span class="hljs-keyword">for</span> action <span class="hljs-keyword">in</span> plan:
                result = self.act(action, environment)
                self.learn(result)
</code></pre>
<p>This example demonstrates the core loop of an agentic AI:</p>
<ul>
<li><p>The agent starts with a goal and can store memory of what it has done.</p>
</li>
<li><p>It observes its environment to understand what’s happening.</p>
</li>
<li><p>Based on that input, it creates a plan – a list of actions to reach its goal.</p>
</li>
<li><p>It executes each action, interacts with the environment, and learns from what happens.</p>
</li>
<li><p>This process repeats until the goal is met or the task is complete.</p>
</li>
</ul>
<p>This basic structure mirrors how real-world agentic systems operate: perceive → plan → act → learn.</p>
<p>Real-world agentic AI systems are evolving. Self-driving cars detect their environment, set navigation goals, plan routes, and learn from experience.</p>
<p><a target="_blank" href="https://www.tesla.com/AI">Tesla's Full Self-Driving</a> “continuously learns from the driving environment and adjusts its behavior” to increase safety. Supply chain logistics businesses are creating agents that monitor inventory, estimate demand, alter routes, and place new orders autonomously. Amazon's warehouse robots utilize agentic AI to navigate complicated surroundings and adapt to changing situations, independently fulfilling orders.</p>
<p>Cybersecurity, healthcare, and customer service also use autonomous agents to identify and respond to risks. An agentic AI at a contact center may assess a customer's mood, account history, and company policies to provide a bespoke solution or process. Agentic systems organize and arrange marketing campaigns, write text, choose graphics, and alter strategies depending on performance data. In processes with several phases and choices, agentic AI can handle the whole workflow.</p>
<p>Recently, several prototype projects and open-source tools have begun experimenting with agentic AI in real-world scenarios.</p>
<p>For example, tools like AutoGPT and AgentGPT have demonstrated agents that can generate multimedia reports by coordinating research, writing, and image selection tasks. Other use cases include agents that retrieve knowledge and take follow-up action (for example, “find and implement the next step”), conduct security operations like scanning and responding to threats, or automate multi-step workflows in call centers.</p>
<p>These examples show how early-stage products and research projects are beginning to test and deploy agentic AI for complex, multi-step tasks beyond just answering questions.</p>
<h2 id="heading-tutorial-build-your-first-agentic-ai-with-python"><strong>Tutorial: Build Your First Agentic AI with Python</strong></h2>
<p>This step-by-step guide will teach you how to build a basic Agentic AI system even if you're just starting out. I’ll explain every concept clearly and give you working Python code you can run and study.</p>
<h3 id="heading-real-world-use-case"><strong>Real-World Use Case</strong></h3>
<p><strong>Scenario:</strong> You're a product manager exploring tools for your team. Instead of spending hours researching AI coding assistants manually, you'd like a personal research agent to:</p>
<ul>
<li><p>Understand your task</p>
</li>
<li><p>Gather relevant information from Wikipedia</p>
</li>
<li><p>Summarize it clearly</p>
</li>
<li><p>Remember context from previous questions</p>
</li>
</ul>
<p>This is where Agentic AI shines: it acts autonomously, reasons, and uses tools just like a smart human assistant.</p>
<h3 id="heading-prerequisites-what-you-need"><strong>Prerequisites – What You Need</strong></h3>
<ol>
<li><p>Python 3.10 or higher</p>
</li>
<li><p>An OpenAI API key (<a target="_blank" href="https://platform.openai.com/api-keys">https://platform.openai.com/api-keys</a>). Note that, as of this writing, OpenAI does not offer free API calls, so if you don’t already have an account you’ll need a credit card and a few dollars of credit to complete this tutorial.</p>
</li>
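<li><p>Install the required Python libraries. The agent API used in this tutorial (<code>initialize_agent</code>) belongs to LangChain releases before 1.0, and the Wikipedia and OpenAI integrations are provided through the <code>langchain-community</code> package, so the command pins the version and installs that package as well:</p>
</li>
</ol>
<pre><code class="lang-bash">pip install "langchain&lt;1.0" langchain-community openai wikipedia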
</code></pre>
<p>⚠️ Don't forget to store your API key safely. Never share it in public code.</p>
<h3 id="heading-step-by-step-tutorial"><strong>Step-by-Step Tutorial</strong></h3>
<h4 id="heading-step-1-set-up-your-environment">Step 1: Set Up Your Environment</h4>
<p>Start by setting your OpenAI API key in your script so that LangChain can access GPT models.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os

os.environ[<span class="hljs-string">"OPENAI_API_KEY"</span>] = <span class="hljs-string">"your-api-key-here"</span>  <span class="hljs-comment"># Replace with your real key</span>
</code></pre>
<h4 id="heading-step-2-connect-to-a-knowledge-source-wikipedia">Step 2: Connect to a Knowledge Source (Wikipedia)</h4>
<p>We'll give our agent the ability to use Wikipedia as a tool to gather information.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.agents <span class="hljs-keyword">import</span> Tool
<span class="hljs-keyword">from</span> langchain.tools <span class="hljs-keyword">import</span> WikipediaQueryRun
<span class="hljs-keyword">from</span> langchain.utilities <span class="hljs-keyword">import</span> WikipediaAPIWrapper
<span class="hljs-comment"># Create the Wikipedia tool</span>
wiki = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
<span class="hljs-comment"># Register the tool so the agent knows how to use it</span>
tools = [
    Tool(
        name=<span class="hljs-string">"Wikipedia"</span>,
        func=wiki.run,
        description=<span class="hljs-string">"Useful for looking up general knowledge."</span>
    )
]
</code></pre>
<p>You're giving your agent a way to "see the world" – Wikipedia is your agent's eyes.</p>
<h4 id="heading-step-3-initialize-the-agent-reasoning-engine">Step 3: Initialize the Agent (Reasoning Engine)</h4>
<p>We now give the agent a brain – a GPT model that can reason, decide, and plan.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.chat_models <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langchain.agents <span class="hljs-keyword">import</span> initialize_agent
<span class="hljs-keyword">from</span> langchain.agents.agent_types <span class="hljs-keyword">import</span> AgentType
<span class="hljs-comment"># Use a GPT model with zero randomness for consistent output</span>
llm = ChatOpenAI(temperature=<span class="hljs-number">0</span>)
<span class="hljs-comment"># Combine reasoning (LLM) and tools (Wikipedia) into one agent</span>
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=<span class="hljs-literal">True</span>  <span class="hljs-comment"># Show thought process step-by-step</span>
)
</code></pre>
<p>This step fuses logic (GPT) and action (Wikipedia) to make your agent capable of goal-driven behavior.</p>
<h4 id="heading-step-4-give-your-agent-a-goal">Step 4: Give Your Agent a Goal</h4>
<pre><code class="lang-python">goal = <span class="hljs-string">"What are the top AI coding assistants and what makes them unique?"</span>
response = agent.run(goal)
print(<span class="hljs-string">"\nAgent's response:\n"</span>, response)
</code></pre>
<p>You’ve given your agent a mission. It will now think, search, and summarize.</p>
<p>You should see output like:</p>
<p><code>> Entering new AgentExecutor chain...</code></p>
<p><code>Thought: I should look up AI coding assistants on Wikipedia</code></p>
<p><code>Action: Wikipedia</code></p>
<p><code>Action Input: AI coding assistants</code></p>
<p><code>...</code></p>
<p><code>Final Answer: The top AI coding assistants are GitHub Copilot, Amazon CodeWhisperer, and Tabnine...</code></p>
<p>At this point, the agent has:</p>
<ul>
<li><p>Interpreted your goal</p>
</li>
<li><p>Selected a tool (Wikipedia)</p>
</li>
<li><p>Retrieved and analyzed content</p>
</li>
<li><p>Reasoned through it to deliver a conclusion</p>
</li>
</ul>
<h4 id="heading-step-5-give-your-agent-memory-optional-but-powerful">Step 5: Give Your Agent Memory (Optional but Powerful)</h4>
<p>Let your agent remember what you previously asked, like a real assistant.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.memory <span class="hljs-keyword">import</span> ConversationBufferMemory
memory = ConversationBufferMemory(memory_key=<span class="hljs-string">"chat_history"</span>)
agent_with_memory = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=<span class="hljs-literal">True</span>
)
<span class="hljs-comment"># Ask a follow-up</span>
agent_with_memory.run(<span class="hljs-string">"Tell me about GitHub Copilot"</span>)
agent_with_memory.run(<span class="hljs-string">"What else do you know about coding assistants?"</span>)
</code></pre>
<p>Your agent now tracks context across multiple interactions just like a good human assistant.</p>
<p>When this is done, your agent:</p>
<ul>
<li><p>Responds more naturally to follow-up questions</p>
</li>
<li><p>Links previous conversations to improve continuity</p>
</li>
</ul>
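<p>If you're curious what a buffer memory does conceptually, here is a minimal sketch in plain Python. It is not how <code>ConversationBufferMemory</code> is implemented; <code>SimpleBufferMemory</code>, <code>ask</code>, and <code>echo_llm</code> are hypothetical names, and the "model" is a fake function that only reports how much history it was shown:</p>
<pre><code class="lang-python">from typing import Callable, List, Tuple

class SimpleBufferMemory:
    """Keeps every (user, assistant) turn and replays them as prompt context."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save(self, user_msg: str, assistant_msg: str) -> None:
        self.turns.append((user_msg, assistant_msg))

    def as_prompt_prefix(self) -> str:
        lines = []
        for user_msg, assistant_msg in self.turns:
            lines.append(f"Human: {user_msg}")
            lines.append(f"AI: {assistant_msg}")
        return "\n".join(lines)

def ask(memory: SimpleBufferMemory, llm: Callable[[str], str], question: str) -> str:
    """Prepend the stored history to the new question, then record the new turn."""
    history = memory.as_prompt_prefix()
    new_turn = f"Human: {question}\nAI:"
    prompt = f"{history}\n{new_turn}" if history else new_turn
    answer = llm(prompt)
    memory.save(question, answer)
    return answer

# Hypothetical fake model: reports how many earlier turns appear in its prompt.
def echo_llm(prompt: str) -> str:
    earlier_turns = prompt.count("Human:") - 1
    return f"I can see {earlier_turns} earlier turn(s)."

memory = SimpleBufferMemory()
first = ask(memory, echo_llm, "Tell me about GitHub Copilot")
second = ask(memory, echo_llm, "What else do you know about coding assistants?")
print(first)
print(second)
</code></pre>
<p>The second call sees the first exchange in its prompt, which is what lets a follow-up like "what else" make sense to the model.</p>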
<p>After running the steps, your agent reads your goal and plans steps to fulfill it. It searches Wikipedia to gather facts, and reasons using a GPT model to summarize and decide what to say. It also optionally remembers context (with memory enabled). You now have a working Agentic AI that can be extended for real-world tasks.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Agentic AI offers an exciting glimpse into a future where machines can collaborate with humans to solve complex, multi-step problems, not just respond to commands. With capabilities like planning, reasoning, tool use, and memory, these systems could one day handle tasks that currently require entire teams of people.</p>
<p>But with that power comes real responsibility. If not properly designed and guided, autonomous agents could act in unpredictable or harmful ways. That’s why developers, researchers, and policymakers need to work together to set clear boundaries, safety rules, and ethical standards.</p>
<p>The technology is advancing quickly, from self-driving cars to research assistants to multi-agent platforms like AutoGPT and LangChain. As we build smarter systems, the challenge isn't just what they can do, but how we ensure they do it safely, fairly, and in ways that benefit everyone.</p>
 
</article>
<article>
<h1> Why Vibe Coding Won't Destroy Software Engineering </h1>
<p>Ben — Wed, 21 May 2025 15:46:37 +0000</p>
 <p>AI is disrupting all industries at a pace not seen at any time in history.</p>
<p>Technologies and industries that were once dominated by one or two companies or were very much “human-focused” are coming under threat.</p>
<p><a target="_blank" href="https://www.smoothseo.co/blog/misc/what-the-numbers-say-about-ais-growing-role-in-search/">Google is losing ground to AI search</a>, <a target="_blank" href="https://www.axios.com/2022/03/28/automation-long-haul-truckers-jobs">truck drivers</a> may soon be a thing of the past, and low-skilled clerical <a target="_blank" href="https://news.sky.com/story/ai-risks-up-to-eight-million-uk-job-losses-with-low-skilled-worst-hit-report-warns-13102214">jobs are being lost every day</a>.</p>
<p>Will this disruption destroy the Software Engineering industry? I don’t think so, and I’ll tell you why.</p>
<h3 id="heading-heres-what-well-discuss">Here’s what we’ll discuss:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-the-phenomenon-of-vibe-coding">The Phenomenon of "Vibe Coding"</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-ai-has-changed-software-development">How AI Has Changed Software Development</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-productivity-paradox">The Productivity Paradox</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-human-engineers-are-still-critical">Why Human Engineers Are Still Critical</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ai-as-a-capability-multiplier">AI as a “Capability Multiplier”</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-critical-skills-for-the-ai-era">Critical Skills for the AI Era</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-path-forward">The Path Forward</a></p>
</li>
</ol>
<h2 id="heading-the-phenomenon-of-vibe-coding"><strong>The Phenomenon of "Vibe Coding"</strong></h2>
<p>If you follow tech discussions on X, you've likely seen the term "vibe coding" – the practice of building software through trial and error, intuition, and AI-generated code snippets without deep technical knowledge.</p>
<p>Modern AI assistants such as GitHub Copilot and ChatGPT can generate full functions, fix bugs, and create components based on simple descriptions. “Vibe Coders” are claiming that human coders will soon become obsolete.</p>
<p>From my perspective, these AI tools function more as skill multipliers than replacements.</p>
<p>They help talented developers work faster while exposing gaps in knowledge for less skilled programmers. Those lacking technical foundations will face problems they can't solve, but engineers who blend AI assistance with solid expertise can be incredibly productive.</p>
<h2 id="heading-how-ai-has-changed-software-development"><strong>How AI Has Changed Software Development</strong></h2>
<p>The software industry has seen rapid adoption of AI coding tools based on Large Language Models that analyze code repositories to predict and suggest next steps.</p>
<p>These tools have transformed daily programming work by:</p>
<ul>
<li><p>Suggesting complete functions as you type</p>
</li>
<li><p>Creating API endpoints from plain language descriptions</p>
</li>
<li><p>Eliminating hours spent on standard code patterns</p>
</li>
<li><p>Automating documentation tasks</p>
</li>
<li><p>Handling repetitive logic quickly</p>
</li>
</ul>
<p>This shift toward "vibe coding" speeds up feature delivery. Programmers can now build without mastering every technical detail – they describe what they want, get AI suggestions, and adjust until the code works.</p>
<p>The risk? Developers often push code they can't explain. They move quickly during building but struggle when systems break or need changing.</p>
<p>There's also a concerning trend of non-programmers selling AI-built applications. Recently, someone with zero coding background launched a paid service created entirely through AI prompts, only to face a data breach days later when hackers exploited basic security flaws. This is dangerous. It wasted people's money and exposed their data. Imagine if this became commonplace with the rise of “vibe coders”.</p>
<p>For anyone considering building software who isn’t a software engineer, there are a few basic levels of security that you need to consider:</p>
<ul>
<li><p>Add authentication to your API endpoints: People can scan for open ports and endpoints across the internet. If they can call your API endpoints without being authenticated, it can cause all sorts of problems.</p>
</li>
<li><p>Do not store passwords in plain text. This is a big no-no. If you do this and your database gets exposed, those passwords are there for all to see. And if we’re being real, people reuse passwords, so those passwords will be their passwords for other sites too.</p>
</li>
<li><p>SSL: Make sure your website is served over HTTPS and has an up-to-date SSL certificate. Transmitting data in plain text is dangerous.</p>
</li>
<li><p>Lock down unused ports: If you are hosting a backend service, make sure that any ports that you don’t use are locked down and people aren’t able to connect to them.</p>
</li>
<li><p>If you have areas where people can upload files, limit the uploads to specific file types.</p>
</li>
</ul>
<p>Those are just a few considerations around security for your site or product, but there are many more.</p>
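<p>To make the password point concrete, here is a minimal sketch using only Python's standard library. It stores a salted PBKDF2 hash instead of the password itself. The iteration count, salt size, and storage format are assumptions chosen for illustration, and <code>hash_password</code> and <code>verify_password</code> are hypothetical helper names. In a real product you would more likely rely on your framework's built-in authentication.</p>
<pre><code class="lang-python">import hashlib
import hmac
import secrets

# Assumed parameters for illustration; check current guidance before picking real values.
ITERATIONS = 600_000
SALT_BYTES = 16

def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex' to store in place of the plain-text password."""
    salt = secrets.token_bytes(SALT_BYTES)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))

stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored))  # the right password
print(verify_password("wrong guess", stored))                   # a wrong password
</code></pre>
<p>If the database leaks, an attacker gets salts and hashes rather than passwords they can immediately try on other sites.</p>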
<h2 id="heading-the-productivity-paradox"><strong>The Productivity Paradox</strong></h2>
<p>AI assistance dramatically increases code output – but volume doesn't equal value in software engineering.</p>
<p>These tools excel at syntax but have no understanding of system architecture, scalability concerns, or maintenance requirements. Just as typing speed doesn't create a better novel, code generation speed doesn't produce better software systems.</p>
<p>AI works for individual functions but struggles with architectural decisions, security planning, and long-term support needs. Without proper review and understanding, AI-generated code often becomes tomorrow's tech-debt and maintenance burden.</p>
<p>Consider this scenario: A developer implements an AI-created authentication system that works in isolation but causes subtle failures in users signing up to the product. Finding and fixing these integration issues might take experienced staff several days – negating any initial time savings. This is a quick path to losing money and trust.</p>
<h2 id="heading-why-human-engineers-are-still-critical"><strong>Why Human Engineers Are Still Critical</strong></h2>
<p>While AI tools handle syntax well, they cannot:</p>
<ol>
<li><p>Plan systems that grow with user demand</p>
</li>
<li><p>Create reliable deployment and testing pipelines</p>
</li>
<li><p>Anticipate unusual but critical failure cases</p>
</li>
<li><p>Make smart tradeoffs between performance and cost</p>
</li>
<li><p>Find non-obvious security weaknesses</p>
</li>
</ol>
<p>Great engineers think beyond code. They develop patterns that help entire teams, select the right technologies, and plan both for success and failure scenarios.</p>
<p>Software creation involves complex tradeoffs: Do we prioritize speed or stability? Flexibility or simplicity? These decisions require both technical expertise and business knowledge.</p>
<p>The highest value engineers I work with spend more time thinking than typing. They consider: How will requirements evolve? What stress points might emerge? How will the system recover from failures?</p>
<p>As basic code generation becomes widely accessible, your value comes from understanding system interactions. The competitive edge will be with those who know why certain approaches succeed, where they might fail, and how to build resilient solutions.</p>
<p>As noted above, there are some things that AI cannot do without proper oversight.</p>
<p>AI can, for instance, be great at turning simple prompts into solutions to well-known problems. If you use this prompt, for example:</p>
<p></p>
<p>Here’s the code you might get:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">is_valid_json</span>(<span class="hljs-params">input_string: str</span>) -> bool:</span>
    <span class="hljs-string">"""
    Checks if the input string is valid JSON.

    Args:
        input_string (str): The JSON string to validate.

    Returns:
        bool: True if valid, False otherwise.
    """</span>
    <span class="hljs-keyword">try</span>:
        json.loads(input_string)
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
    <span class="hljs-keyword">except</span> json.JSONDecodeError:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    user_input = input(<span class="hljs-string">"Enter a JSON string to validate: "</span>)
    <span class="hljs-keyword">if</span> is_valid_json(user_input):
        print(<span class="hljs-string">"✅ The input is valid JSON."</span>)
    <span class="hljs-keyword">else</span>:
        print(<span class="hljs-string">"❌ The input is NOT valid JSON."</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p>Pretty basic.</p>
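<p>Even here, a reviewer earns their keep. Python's <code>json.loads</code> accepts <code>NaN</code>, <code>Infinity</code>, and <code>-Infinity</code> by default, although they are not part of the JSON specification, and the generated function raises a <code>TypeError</code> rather than returning <code>False</code> if it is handed something that isn't a string. A stricter variant, sketched under the assumption that both cases should count as invalid (<code>is_strict_json</code> is a hypothetical name), could look like this:</p>
<pre><code class="lang-python">import json

def _reject_constant(name: str):
    # json.loads calls this hook when it meets 'NaN', 'Infinity', or '-Infinity'.
    raise ValueError(f"Non-standard JSON constant: {name}")

def is_strict_json(input_string) -> bool:
    """Like is_valid_json, but rejects NaN/Infinity and non-string input."""
    if not isinstance(input_string, str):
        return False
    try:
        json.loads(input_string, parse_constant=_reject_constant)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

for sample in ['{"price": 1.5}', '{"price": NaN}', "[Infinity]", "{bad json"]:
    print(repr(sample), is_strict_json(sample))
</code></pre>
<p>Whether those inputs should be rejected depends on what consumes the data, which is exactly the kind of decision the prompt never mentioned.</p>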
<p>I would not, though, ask an AI to build me a complex Forex Trading platform.</p>
<p></p>
<p>I’ll save you reading through all of the response, but suffice it to say that it was not able to do this for me.</p>
<p>I thought I’d break that down to a slightly smaller subset of a financial trading platform:</p>
<p></p>
<p>Now, I will admit, this got closer. Here was the result:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span> FastAPI, Request, HTTPException, Header
<span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel, Field
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional
<span class="hljs-keyword">import</span> hmac
<span class="hljs-keyword">import</span> hashlib
<span class="hljs-keyword">import</span> logging

app = FastAPI()

<span class="hljs-comment"># === CONFIGURATION ===</span>
SHARED_SECRET = <span class="hljs-string">b"super_secure_shared_secret"</span>  <span class="hljs-comment"># Keep this secure</span>

<span class="hljs-comment"># === LOGGING SETUP ===</span>
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(<span class="hljs-string">"ems.fix"</span>)

<span class="hljs-comment"># === Pydantic Model for Execution Report (simplified) ===</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ExecutionReport</span>(<span class="hljs-params">BaseModel</span>):</span>
    msg_type: str = Field(default=<span class="hljs-string">"J"</span>, const=<span class="hljs-literal">True</span>)
    cl_ord_id: str  <span class="hljs-comment"># Tag 11</span>
    order_id: str   <span class="hljs-comment"># Tag 37</span>
    exec_id: str    <span class="hljs-comment"># Tag 17</span>
    exec_type: str  <span class="hljs-comment"># Tag 150</span>
    ord_status: str <span class="hljs-comment"># Tag 39</span>
    symbol: str     <span class="hljs-comment"># Tag 55 (e.g., EUR/USD)</span>
    side: str       <span class="hljs-comment"># Tag 54 (1=Buy, 2=Sell)</span>
    leaves_qty: float  <span class="hljs-comment"># Tag 151</span>
    cum_qty: float     <span class="hljs-comment"># Tag 14</span>
    avg_px: float      <span class="hljs-comment"># Tag 6</span>
    transact_time: str <span class="hljs-comment"># Tag 60 (ISO8601 string)</span>

<span class="hljs-comment"># === Helper: HMAC Validator ===</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">verify_signature</span>(<span class="hljs-params">payload: bytes, signature: str</span>) -> bool:</span>
    computed = hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()
    <span class="hljs-keyword">return</span> hmac.compare_digest(computed, signature)

<span class="hljs-comment"># === Endpoint ===</span>
<span class="hljs-meta">@app.post("/fix/execution-report")</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">receive_execution_report</span>(<span class="hljs-params">
    request: Request,
    x_signature: Optional[str] = Header(<span class="hljs-params">None</span>)
</span>):</span>
    body = <span class="hljs-keyword">await</span> request.body()

    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> x_signature <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> verify_signature(body, x_signature):
        logger.warning(<span class="hljs-string">"Invalid or missing signature."</span>)
        <span class="hljs-keyword">raise</span> HTTPException(status_code=<span class="hljs-number">403</span>, detail=<span class="hljs-string">"Unauthorized"</span>)

    <span class="hljs-keyword">try</span>:
        report_data = ExecutionReport.parse_raw(body)
        logger.info(<span class="hljs-string">f"Execution Report received: <span class="hljs-subst">{report_data.dict()}</span>"</span>)
        <span class="hljs-comment"># ✅ Here, you'd handle order updates, booking, etc.</span>
        <span class="hljs-keyword">return</span> {<span class="hljs-string">"status"</span>: <span class="hljs-string">"ok"</span>, <span class="hljs-string">"message"</span>: <span class="hljs-string">"Execution Report received"</span>}
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        logger.exception(<span class="hljs-string">"Invalid payload format."</span>)
        <span class="hljs-keyword">raise</span> HTTPException(status_code=<span class="hljs-number">400</span>, detail=<span class="hljs-string">f"Invalid FIX message format: <span class="hljs-subst">{str(e)}</span>"</span>)
</code></pre>
<p>But this does not come close to the final version. Even if you could use this code as a part of your trading system, unless you know how to build and architect an entire software system, where do you go from here? You can tell your clients that they can send you an “ExecutionReport”, but then what?</p>
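<p>Even the one piece the snippet does cover has a counterpart that someone has to specify for clients. As a minimal sketch, assuming the same demo secret and a hex-encoded HMAC-SHA256 signature sent in an <code>X-Signature</code> header (the report values and the <code>sign_payload</code> helper are made up for illustration), the client side might look something like this:</p>
<pre><code class="lang-python">import hashlib
import hmac
import json

# Assumption: the same demo secret as the server snippet. A real secret would come
# from a secrets manager or environment variable, never from source code.
SHARED_SECRET = b"super_secure_shared_secret"

def sign_payload(payload: bytes, secret: bytes = SHARED_SECRET) -> str:
    """Hex-encoded HMAC-SHA256 of the exact bytes that will be sent."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_signature(payload: bytes, signature: str, secret: bytes = SHARED_SECRET) -> bool:
    """Server-side check, mirroring the helper in the generated endpoint."""
    return hmac.compare_digest(sign_payload(payload, secret), signature)

# Hypothetical execution report with made-up values.
report = {"cl_ord_id": "ORD-1", "order_id": "X-100", "exec_id": "E-1", "symbol": "EUR/USD"}
body = json.dumps(report).encode("utf-8")
headers = {"X-Signature": sign_payload(body), "Content-Type": "application/json"}

print(verify_signature(body, headers["X-Signature"]))         # untouched body
print(verify_signature(body + b" ", headers["X-Signature"]))  # one extra byte
</code></pre>
<p>Note what is still missing: how the secret is distributed and rotated, how replayed messages are rejected, and what happens to the report once it is accepted. None of that comes out of the prompt.</p>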
<p>I’ve read and heard the arguments that you “just need to be better at prompting to build bigger systems”. But where does the benefit come from then? The complexity of financial trading systems is beyond comprehension. Prompting a system well enough, with enough information to make it fully featured, scalable, secure, and extensible (not to mention able to be debugged) would itself be a mammoth task. So where is the time being saved? Is it even possible?</p>
<p>I have yet to see any proof that anyone has built such a complex system without human oversight, and I’m not convinced we will see it at any point in the near future.</p>
<h2 id="heading-ai-as-a-capability-multiplier"><strong>AI as a “Capability Multiplier”</strong></h2>
<p>These AI tools help magnify existing capabilities rather than replacing them. Skilled developers become far more productive, while less skilled ones generate problems more quickly.</p>
<p>Effective engineers use AI to:</p>
<ul>
<li><p>Handle basic implementation tasks</p>
</li>
<li><p>Create initial project frameworks</p>
</li>
<li><p>Compare different solution approaches</p>
</li>
<li><p>Move past challenging problems</p>
</li>
</ul>
<p>Meanwhile, less capable developers use AI to mask skill gaps, implementing solutions they neither understand nor can modify. When these implementations fail, they lack the knowledge to fix them independently.</p>
<p>This widens the skill gap. Top engineers leverage AI for mechanical tasks while focusing on higher-value thinking. Those using AI as a substitute for learning face limitations when working beyond the AI's knowledge boundaries.</p>
<p>A good example of something that AI is perfect for is translation logic:</p>
<p>Let’s say I have a Python dataclass representing an “InternalUser”. I also have a Django ORM representation of the same entity. If I want to convert one to the other, I can just paste both representations into ChatGPT and have it create a conversion function. Notice that the conversion function also takes into account that the field names aren’t exact matches:</p>
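<pre><code class="lang-python"><span class="hljs-keyword">from</span> dataclasses <span class="hljs-keyword">import</span> dataclass, field
<span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> List, Optional

<span class="hljs-comment"># UserRole, AccountStatus, Address, Preferences, and LoginActivity are defined elsewhere</span>
<span class="hljs-meta">@dataclass</span>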
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">InternalUser</span>:</span>
    id: str
    email: str
    hashed_password: str
    full_name: str
    role: UserRole
    status: AccountStatus
    created_at: datetime
    updated_at: datetime
    address: Optional[Address] = <span class="hljs-literal">None</span>
    preferences: Preferences = field(default_factory=Preferences)
    login_activity: LoginActivity = field(default_factory=LoginActivity)
    tags: List[str] = field(default_factory=list)
    notes: Optional[str] = <span class="hljs-literal">None</span>
</code></pre>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">InternalUser</span>(<span class="hljs-params">models.Model</span>):</span>
    id = models.CharField(primary_key=<span class="hljs-literal">True</span>, max_length=<span class="hljs-number">100</span>)  <span class="hljs-comment"># Use UUIDField if preferred</span>
    email = models.EmailField(unique=<span class="hljs-literal">True</span>)
    hashedPassword = models.CharField(max_length=<span class="hljs-number">256</span>)
    fullName = models.CharField(max_length=<span class="hljs-number">255</span>)
    role = models.CharField(max_length=<span class="hljs-number">20</span>, choices=UserRole.choices, default=UserRole.USER)
    status = models.CharField(max_length=<span class="hljs-number">20</span>, choices=AccountStatus.choices, default=AccountStatus.PENDING)
    createdAt = models.DateTimeField(default=timezone.now)
    updatedAt = models.DateTimeField(auto_now=<span class="hljs-literal">True</span>)

    address = models.OneToOneField(Address, null=<span class="hljs-literal">True</span>, blank=<span class="hljs-literal">True</span>, on_delete=models.SET_NULL)
    preferences = models.OneToOneField(Preferences, on_delete=models.CASCADE)
    loginActivity = models.OneToOneField(LoginActivity, on_delete=models.CASCADE)

    tags = models.JSONField(default=list, blank=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># List of strings</span>
    notes = models.TextField(null=<span class="hljs-literal">True</span>, blank=<span class="hljs-literal">True</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__str__</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"<span class="hljs-subst">{self.fullName}</span> (<span class="hljs-subst">{self.email}</span>)"</span>
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional
<span class="hljs-keyword">from</span> myapp.models <span class="hljs-keyword">import</span> InternalUser <span class="hljs-keyword">as</span> ORMInternalUser, Address <span class="hljs-keyword">as</span> ORMAddress, Preferences <span class="hljs-keyword">as</span> ORMPreferences, LoginActivity <span class="hljs-keyword">as</span> ORMLoginActivity
<span class="hljs-keyword">from</span> mydataclasses <span class="hljs-keyword">import</span> InternalUser <span class="hljs-keyword">as</span> DCInternalUser  <span class="hljs-comment"># assuming your dataclass lives here</span>


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">convert_internal_user_to_model</span>(<span class="hljs-params">dc_user: DCInternalUser</span>) -> ORMInternalUser:</span>
    <span class="hljs-comment"># Create or update Address if provided</span>
    orm_address: Optional[ORMAddress] = <span class="hljs-literal">None</span>
    <span class="hljs-keyword">if</span> dc_user.address:
        orm_address = ORMAddress.objects.create(
            street=dc_user.address.street,
            city=dc_user.address.city,
            state=dc_user.address.state,
            postalCode=dc_user.address.postal_code,
            country=dc_user.address.country
        )

    <span class="hljs-comment"># Create Preferences</span>
    orm_preferences = ORMPreferences.objects.create(
        newsletterOptIn=dc_user.preferences.newsletter_opt_in,
        darkModeEnabled=dc_user.preferences.dark_mode_enabled,
        language=dc_user.preferences.language
    )

    <span class="hljs-comment"># Create LoginActivity</span>
    orm_login_activity = ORMLoginActivity.objects.create(
        lastLogin=dc_user.login_activity.last_login,
        loginAttempts=dc_user.login_activity.login_attempts,
        lastIpAddress=dc_user.login_activity.last_ip_address
    )

    <span class="hljs-comment"># Create InternalUser</span>
    orm_user = ORMInternalUser.objects.create(
        id=dc_user.id,
        email=dc_user.email,
        hashedPassword=dc_user.hashed_password,
        fullName=dc_user.full_name,
        role=dc_user.role.value,
        status=dc_user.status.value,
        createdAt=dc_user.created_at,
        updatedAt=dc_user.updated_at,
        address=orm_address,
        preferences=orm_preferences,
        loginActivity=orm_login_activity,
        tags=dc_user.tags,
        notes=dc_user.notes
    )

    <span class="hljs-keyword">return</span> orm_user
</code></pre>
<p>This is a basic example, but extrapolate this out to an entire internal data model and you’ve saved hours and hours of coding.</p>
<p>You can even then ask it to create function stubs for the tests:</p>
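<p>The core of that translation is mechanical: snake_case names on one side, camelCase on the other. A minimal, framework-free sketch of the mapping follows. <code>MiniUser</code> is a trimmed-down, hypothetical stand-in for the dataclass above, and <code>snake_to_camel</code> and <code>to_orm_kwargs</code> are illustrative helper names rather than anything ChatGPT produced:</p>
<pre><code class="lang-python">from dataclasses import dataclass, fields
from typing import Any, Dict, Optional

def snake_to_camel(name: str) -> str:
    """hashed_password becomes hashedPassword; single words are unchanged."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

# Trimmed-down, hypothetical stand-in for the InternalUser dataclass.
@dataclass
class MiniUser:
    id: str
    email: str
    hashed_password: str
    full_name: str
    notes: Optional[str] = None

def to_orm_kwargs(obj: Any) -> Dict[str, Any]:
    """Build keyword arguments whose keys follow the camelCase ORM field names."""
    return {snake_to_camel(f.name): getattr(obj, f.name) for f in fields(obj)}

user = MiniUser(
    id="u1",
    email="ada@example.com",
    hashed_password="abc123",
    full_name="Ada Lovelace",
)
print(to_orm_kwargs(user))
</code></pre>
<p>A generic mapper like this only covers flat fields. Nested objects such as the address or preferences still need the explicit handling shown in the generated function.</p>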
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pytest
<span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">from</span> yourmodule.dataclasses <span class="hljs-keyword">import</span> InternalUser, Address, Preferences, LoginActivity, UserRole, AccountStatus
<span class="hljs-keyword">from</span> yourmodule.conversion <span class="hljs-keyword">import</span> convert_internal_user_to_model


<span class="hljs-meta">@pytest.mark.django_db</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_internal_user_conversion_basic_fields</span>():</span>
    <span class="hljs-comment"># Test that basic fields (email, name, etc.) are correctly copied</span>
    <span class="hljs-keyword">pass</span>


<span class="hljs-meta">@pytest.mark.django_db</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_internal_user_conversion_with_address</span>():</span>
    <span class="hljs-comment"># Test that address fields are properly mapped to the ORM model</span>
    <span class="hljs-keyword">pass</span>


<span class="hljs-meta">@pytest.mark.django_db</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_internal_user_conversion_with_preferences</span>():</span>
    <span class="hljs-comment"># Test preferences like dark mode, newsletter opt-in, and language</span>
    <span class="hljs-keyword">pass</span>


<span class="hljs-meta">@pytest.mark.django_db</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_internal_user_conversion_with_login_activity</span>():</span>
    <span class="hljs-comment"># Test login attempts, last IP, and last login datetime</span>
    <span class="hljs-keyword">pass</span>


<span class="hljs-meta">@pytest.mark.django_db</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_internal_user_conversion_with_tags_and_notes</span>():</span>
    <span class="hljs-comment"># Test tags list and optional notes field</span>
    <span class="hljs-keyword">pass</span>


<span class="hljs-meta">@pytest.mark.django_db</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_internal_user_conversion_with_missing_optional_fields</span>():</span>
    <span class="hljs-comment"># Ensure None fields like address or lastLogin don’t break conversion</span>
    <span class="hljs-keyword">pass</span>


<span class="hljs-meta">@pytest.mark.django_db</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_internal_user_conversion_saves_correctly</span>():</span>
    <span class="hljs-comment"># Save all related models and main InternalUser model and check database</span>
    <span class="hljs-keyword">pass</span>
</code></pre>
<p>Now, I’m not suggesting that you take these as is without adding your own thought to each possible test scenario, but it’s a great start.</p>
<p>These pieces of “grunt work” were never what we paid the top engineers for. These were just the things that they had to do to get the job done. People didn’t enjoy these tasks. They weren’t fulfilling.</p>
<h2 id="heading-critical-skills-for-the-ai-era"><strong>Critical Skills for the AI Era</strong></h2>
<p>As AI handles more coding tasks, successful engineers must develop strengths in areas where human judgment remains essential:</p>
<p>Systems thinking becomes the primary skill – understanding component interactions, identifying potential failures, and designing for future growth. This capability comes from experience, not prompting.</p>
<p>You should build expertise in infrastructure and deployment processes. Software that works in development but fails in production creates no value. So, learn about <a target="_blank" href="https://www.freecodecamp.org/news/learn-continuous-integration-delivery-and-deployment/">continuous integration</a>, <a target="_blank" href="https://www.freecodecamp.org/news/how-to-set-up-monitoring-for-nodejs-applications-using-elastic/">monitoring</a> systems, and <a target="_blank" href="https://www.freecodecamp.org/news/beginners-guide-to-cloud-computing-with-aws/">cloud platform capabilities</a>.</p>
<p>You should also master <a target="_blank" href="https://www.freecodecamp.org/news/rest-api-design-best-practices-build-a-rest-api/">API design</a> – the interfaces between systems. <a target="_blank" href="https://www.freecodecamp.org/news/design-an-api-application-program-interface/">Well-designed APIs</a> enable team independence. Poor interfaces create bottlenecks affecting everyone.</p>
<p>Another key skill is being able to integrate security throughout the development process. A single oversight can result in breaches, damaging both customer trust and business standing.</p>
<p>Make sure you develop communication skills for both technical and non-technical audiences. You’ll need to explain complex decisions clearly across different stakeholder groups.</p>
<p>And study how AI tools function to understand their limitations and strengths, allowing you to use them most effectively.</p>
<p>For senior developers, mentoring becomes increasingly important. New engineers need guidance on responsible AI usage – knowing when to accept suggestions and when to question them.</p>
<h2 id="heading-the-path-forward"><strong>The Path Forward</strong></h2>
<p>The software field is entering a significant transition. AI will generate more code more quickly, transforming development practices. This shift presents both opportunities and challenges.</p>
<p>The most valuable positions will go to those good at tasks machines cannot handle. These engineers will determine what to build, how to design it, and how to balance technical constraints with business objectives.</p>
<p>"Vibe coding" serves as a useful technique for specific needs – like quickly building standard components. But it fails as a comprehensive strategy for complex system development.</p>
<p>Skilled engineers will advance by delegating routine work to AI while addressing more challenging problems. Less skilled engineers will struggle as fundamental knowledge gaps become apparent.</p>
<p>When it comes to learning how to use AI effectively, use caution and judgement when following advice from people online. It’s still a fairly new field and changes constantly.</p>
<p>People online are giving away “free prompts” to generate code. These prompts may be great or may have problems. The prompts may have worked when they used them, but the AI models may have changed and maybe they’ll produce different results now. Be cautious and use your best judgement.</p>
<p>The future belongs to those who view AI as a collaborative tool rather than a replacement. Software development remains fundamentally human-driven, now supported by increasingly powerful assistance.</p>
<p><em>In his spare time, Ben writes his tech blog</em> <a target="_blank" href="https://justanothertechlead.com/"><em>Just Another Tech Lead</em></a> <em>and runs a site on SEO,</em> <a target="_blank" href="https://www.smoothseo.co"><em>SmoothSEO</em></a><em>.</em></p>
 
</article>
<article>
<h1> How to Use GPT to Analyze Large Datasets </h1>
<p>David Clinton — Wed, 28 Aug 2024 17:57:59 +0000</p>
 <p>Absorbing and then summarizing very large quantities of content in just a few seconds truly is a big deal. As an example, a while back I received a link to the recording of an important 90-minute business video conference that I'd missed a few hours before.</p>
<p>I'd missed the live version because I had no time (I was, if you must know, rushing to finish my <a target="_blank" href="https://amzn.to/3yLFT3b">Manning book, The Complete Obsolete Guide to Generative AI</a> – from which this article is excerpted).</p>
<p>Well, a half a dozen hours later I still had no time for the video. And, inexplicably, the book was still not finished.</p>
<p>So here's how I resolved the conflict the GPT way:</p>
<ul>
<li><p>I used OpenAI Whisper to generate a transcript based on the audio from the recording</p>
</li>
<li><p>I exported the transcript to a PDF file</p>
</li>
<li><p>I uploaded the PDF to ChatPDF</p>
</li>
<li><p>I prompted ChatPDF for summaries connected to the specific topics that interested me</p>
</li>
</ul>
<p>Total time to "download" the key moments from the 90-minute call: 10 minutes. That's 10 minutes to convert a dataset made up of around 15,000 spoken words to a machine-readable format, and then to digest, analyze, and summarize it.</p>
<h3 id="heading-how-to-use-gpt-for-business-analytics">How to Use GPT for Business Analytics</h3>
<p>But all that's old news by now. The <em>next level</em> will solve the problem of business analytics.</p>
<p>Ok. So what <em>is</em> the "problem of business analytics"? It's the hard work of building sophisticated code that parses large datasets to make them consistently machine readable (also known as "data wrangling"). That code then applies complex algorithms to tease out useful insights. The figure below broadly outlines the process.</p>
<p></p>
<p>A lot of the code that fits that description is incredibly complicated, not to mention clever. Inspiring clever data engineers to write that clever code can, of course, cost organizations many, many fortunes. The "problem" then, is the cost.</p>
<p>So solving that problem could involve leveraging a few hundred dollars' worth of large language model (LLM) API charges. Here's how I plan to illustrate that.</p>
<p>I'll need a busy spreadsheet to work with, right? The best place I know for good data is the <a target="_blank" href="https://www.kaggle.com/">Kaggle website</a>.</p>
<p>Kaggle is an online platform for hosting datasets (and data science competitions). It's become an important resource for data scientists, machine learning practitioners, and researchers, allowing them to showcase their skills, learn from others, and collaborate on projects. The platform offers a wide range of public and private datasets, as well as tools and features to support data exploration and modeling.</p>
<h3 id="heading-how-to-prepare-a-dataset">How to Prepare a Dataset</h3>
<p><a target="_blank" href="https://www.kaggle.com/datasets/snassimr/data-for-investing-type-prediction">The "Investing Program Type Prediction"</a> dataset associated with this code should work perfectly. From what I can tell, this was data aggregated by a bank somewhere in the world that represents its customers' behavior.</p>
<p>Everything has been anonymized, of course, so there's no way for us to know which bank we're talking about, who the customers were, or even where in the world all this was happening. In fact, I'm not even 100% sure what each column of data represents.</p>
<p>What <em>is</em> clear is that each customer's age and neighborhood are there, with the locations anonymized as <code>C1</code>, <code>C2</code>, <code>C3</code> and so on. Some of the remaining columns clearly contain financial information.</p>
<p>Based on those assumptions, my ultimate goal is to search for statistically valid relationships between columns. For instance, are there specific demographic features (income, neighborhood, age) that predict a greater likelihood of a customer purchasing additional banking products? For this specific example I'll see if I can identify the geographic regions within the data whose average household wealth is the highest.</p>
<p>For normal uses, such vaguely described data would be worthless. But since we're just looking to demonstrate the process it'll do just fine. I'll <em>make up</em> column headers that more or less fit the shape of their data. Here's how I named them:</p>
<ul>
<li><p>Customer ID</p>
</li>
<li><p>Customer age</p>
</li>
<li><p>Geographic location</p>
</li>
<li><p>Branch visits per year</p>
</li>
<li><p>Total household assets</p>
</li>
<li><p>Total household debt</p>
</li>
<li><p>Total investments with bank</p>
</li>
</ul>
<p>The column names need to be very descriptive because those will be the only clues I'll give GPT to help it understand the data. I did have to add my own customer IDs to that first column (they didn't originally exist).</p>
<p>The fastest way I could think of to do that was to insert the <code>=(RAND())</code> formula into the top data cell in that column (with the file loaded into spreadsheet software like Excel, Google Sheets, or LibreOffice Calc) and then apply the formula to the rest of the rows of data. When that's done, all 1,000 data rows will have IDs that are unique for practical purposes, albeit IDs between 0 and 1 with many decimal places.</p>
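<p>If you'd rather script that step than do it in a spreadsheet, here's a minimal sketch using only Python's standard library. It assumes the raw file has no header row and holds the six original columns in the order listed above. The sample rows, the <code>add_headers_and_ids</code> helper name, and the use of sequential integers in place of <code>RAND()</code> values are illustrative choices rather than part of the original dataset.</p>

```python
import csv
import io

# Made-up header names that roughly fit the shape of each column
HEADERS = [
    "Customer ID",
    "Customer age",
    "Geographic location",
    "Branch visits per year",
    "Total household assets",
    "Total household debt",
    "Total investments with bank",
]


def add_headers_and_ids(raw_csv_text):
    """Prepend a header row and a sequential customer ID to every data row."""
    reader = csv.reader(io.StringIO(raw_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADERS)
    for customer_id, row in enumerate(reader, start=1):
        writer.writerow([customer_id] + row)
    return out.getvalue()


# Two invented rows standing in for the real, headerless data file
sample = "34,C1,12,250000,40000,15000\n51,C2,3,900000,120000,300000\n"
prepared = add_headers_and_ids(sample)
print(prepared)
```

<p>Sequential integers are guaranteed to be unique, and they don't change when the sheet recalculates the way <code>RAND()</code> values do.</p>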
<h3 id="heading-how-to-apply-llamaindex-to-the-problem">How to Apply LlamaIndex to the Problem</h3>
<p>With my data prepared, I'll use <a target="_blank" href="https://www.llamaindex.ai/">LlamaIndex</a> to get to work analyzing the numbers. As before, the code I'm going to execute will:</p>
<ul>
<li><p>Import the necessary functionality</p>
</li>
<li><p>Add my OpenAI API key</p>
</li>
<li><p>Read the data file that's in the directory called <code>data</code></p>
</li>
<li><p>Build the nodes from which we'll populate our index</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os

<span class="hljs-comment"># Legacy (pre-0.10) import paths; newer LlamaIndex releases import from llama_index.core</span>
<span class="hljs-keyword">from</span> llama_index <span class="hljs-keyword">import</span> GPTVectorStoreIndex, SimpleDirectoryReader
<span class="hljs-keyword">from</span> llama_index.node_parser <span class="hljs-keyword">import</span> SimpleNodeParser

<span class="hljs-comment"># Placeholder key: substitute your own and keep real keys out of source control</span>
os.environ[<span class="hljs-string">'OPENAI_API_KEY'</span>] = <span class="hljs-string">"sk-XXXX"</span>

<span class="hljs-comment"># Load every file in the data directory</span>
documents = SimpleDirectoryReader(<span class="hljs-string">'data'</span>).load_data()

<span class="hljs-comment"># Split the documents into nodes, then build the index from those nodes</span>
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
index = GPTVectorStoreIndex(nodes)
</code></pre>
<p>Finally, I'll send my prompt:</p>
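<pre><code class="lang-python"><span class="hljs-comment"># Recent LlamaIndex releases route queries through a query engine rather than index.query()</span>
query_engine = index.as_query_engine()
response = query_engine.query(
    <span class="hljs-string">"Based on the data, which 5 geographic regions had the highest average household net wealth? Show me nothing more than the region codes"</span>
)
print(response)
</code></pre>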
<p>Here it is again in a format that's easier on the eyes:</p>
<blockquote>
<p><em>Based on the data, which 5 geographic regions had the highest average household net wealth? Show me nothing more than the region codes.</em></p>
</blockquote>
<p>I asked this question primarily to confirm that GPT understood the data. It's always good to test your model just to see if the responses you're getting seem to reasonably reflect what you already know about the data.</p>
<p>To answer properly, GPT would need to figure out what each of the column headers means and the relationships <em>between</em> columns. In other words, it would need to know how to calculate net worth for each row (customer ID) from the values in the <code>Total household assets</code>, <code>Total household debt</code>, and <code>Total investments with bank</code> columns. It would then need to aggregate all the net worth numbers that it generated by <code>Geographic location</code>, calculate averages for each location and, finally, compare all the averages and rank them.</p>
<p>The result? I <em>think</em> GPT nailed it. After a minute or two of deep and profound thought (and around $0.25 in API charges), I was shown five location codes (G0, G90, G96, G97, G84, in case you're curious). This tells me that GPT understands the location column the same way I did and is at least attempting to infer relationships between location and demographic features.</p>
<p>What did I mean by "I think"? Well, I never actually checked to confirm that the numbers made sense. For one thing, this isn't real data anyway and, for all I know, I guessed the contents of each column incorrectly.</p>
<p>But it's also because <em>every</em> data analysis needs checking against the real world. In that sense, GPT-generated analysis is no different. In other words, whenever you're working with data that's supposed to represent the real world, you should always find a way to calibrate your results against known values to confirm that the whole thing isn't a happy fantasy.</p>
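<p>One way to run that kind of sanity check is to compute the same ranking directly and compare it with GPT's answer. The sketch below is a minimal standard-library example, not the method used above. It assumes net worth is assets plus investments minus debt (a guess, since the dataset doesn't define it), and the rows are invented for illustration rather than taken from the real file.</p>

```python
from collections import defaultdict
from statistics import mean

# Invented rows: (location, assets, debt, investments)
rows = [
    ("C1", 500_000, 100_000, 50_000),
    ("C1", 300_000, 50_000, 0),
    ("C2", 900_000, 200_000, 300_000),
    ("C3", 100_000, 80_000, 10_000),
    ("C3", 200_000, 20_000, 40_000),
]


def top_regions_by_net_worth(rows, n=5):
    """Rank locations by average household net worth, highest first."""
    by_location = defaultdict(list)
    for location, assets, debt, investments in rows:
        # Assumption: net worth = assets + investments - debt
        by_location[location].append(assets + investments - debt)
    averages = {loc: mean(values) for loc, values in by_location.items()}
    return sorted(averages, key=averages.get, reverse=True)[:n]


print(top_regions_by_net_worth(rows, n=2))
```

<p>If the codes this returns for the real file differ from the ones GPT reports, that's a signal to revisit either the column assumptions or the prompt.</p>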
<p>I then asked a second question that reflects a real-world query that would interest any bank:</p>
<blockquote>
<p><em>Based on their age, geographic location, number of annual visits to bank branch, and total current investments, who are the ten customers most likely to invest in a new product offering? Show me only the value of the</em> <code>customer ID</code> <em>column for those ten customers.</em></p>
</blockquote>
<p>Once again GPT spat back a response that at least <em>seemed</em> to make sense. This question was also designed to test GPT on its ability to correlate multiple metrics and submit them to a complex assessment ("...most likely to invest in a new product offering").</p>
<p>I'll rate that as another successful experiment.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>GPT – and other LLMs – are capable of independently parsing, analyzing, and deriving insights from large datasets.</p>
<p>There will be limits to the magic, of course. GPT and its cousins can still hallucinate – especially when your prompts give them too much room to be "creative" or, sometimes, when you've gone too deep into a single prompt thread. And there are also some hard limits to how much data OpenAI will allow you to upload.</p>
<p>But, overall, you can accomplish more, and faster, than you can probably imagine right now.</p>
<p>While all that greatly simplifies the data analytics process, success still depends on understanding the real-world context of your data and coming up with specific and clever prompts. That'll be your job.</p>
<p><em>This article is excerpted from</em> <a target="_blank" href="https://amzn.to/3yLFT3b"><em>my Manning book, The Complete Obsolete Guide to Generative AI.</em></a> <em>There's plenty more technology goodness available through</em> <a target="_blank" href="https://bootstrap-it.com"><em>my website</em></a><em>.</em></p>
 
</article>
</main></body></html>

Model	Input cost	Output cost	Weekly total	Annualized (52 wk)
GPT-5.5 ($5 / $30)	3.6M × $5/1M = $18.00	0.36M × $30/1M = $10.80	$28.80	$1,498
GPT-5.5 Pro ($30 / $180)	$108.00	$64.80	$172.80	$8,986
GPT-5.4 ($2.50 / $15)	$9.00	$5.40	$14.40	$749
GPT-5-Codex ($1.25 / $10)	$4.50	$3.60	$8.10	$421
GPT-5.1-Codex-mini ($0.25 / $2)	$0.90	$0.72	$1.62	$84
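
The arithmetic behind each row can be reproduced with a short sketch. It assumes, as the first row indicates, a weekly volume of 3.6M input tokens and 0.36M output tokens, with prices quoted in dollars per 1M tokens. The function and variable names are illustrative.

```python
# Weekly token volumes taken from the first row of the table
INPUT_TOKENS = 3_600_000
OUTPUT_TOKENS = 360_000

# (input price, output price) in USD per 1M tokens, as listed in the table
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.5 Pro": (30.00, 180.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5-Codex": (1.25, 10.00),
    "GPT-5.1-Codex-mini": (0.25, 2.00),
}


def weekly_cost(input_price, output_price):
    """Return the weekly cost in USD for the fixed token volumes above."""
    input_cost = INPUT_TOKENS * input_price / 1_000_000
    output_cost = OUTPUT_TOKENS * output_price / 1_000_000
    return input_cost + output_cost


for model, (input_price, output_price) in PRICES.items():
    weekly = weekly_cost(input_price, output_price)
    # Annualized figure is the weekly cost over 52 weeks, rounded to whole dollars
    print(f"{model}: ${weekly:.2f}/week, ${weekly * 52:,.0f}/year")
```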