<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ General Programming - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ General Programming - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 28 May 2026 04:41:51 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/programming/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Build an AI-Powered Research Automation System with n8n, Groq, and Academic APIs ]]>
                </title>
                <description>
                    <![CDATA[ As a researcher and developer, I found myself spending hours manually searching academic databases, reading abstracts, and trying to synthesize findings across multiple sources. For my work on circula ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-an-ai-powered-research-automation-system-with-n8n-groq-and-academic-apis/</link>
                <guid isPermaLink="false">69b849372ad6ae5184dbb6b8</guid>
                
                    <category>
                        <![CDATA[ n8n ]]>
                    </category>
                
                    <category>
                        <![CDATA[ freeCodeCamp.org ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ automation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Chidozie Managwu ]]>
                </dc:creator>
                <pubDate>Mon, 16 Mar 2026 18:17:27 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/d4660bc7-3f3c-4325-bee7-57770e821204.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>As a researcher and developer, I found myself spending hours manually searching academic databases, reading abstracts, and trying to synthesize findings across multiple sources.</p>
<p>For my work on circular economy and battery recycling, I needed a way to query multiple databases at once without the manual fatigue.</p>
<p>In this tutorial, you'll build an automated research pipeline using n8n that reduces roughly six hours of manual literature review to a five-minute automated process.</p>
<p>This isn’t a “cool demo workflow.” It’s a production-minded pipeline with parallel collection, normalisation, deduplication, structured AI extraction, scoring, and practical error handling.</p>
<h3 id="heading-table-of-contents">Table of Contents</h3>
<ol>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-the-problem-research-takes-too-long">The Problem: Research Takes Too Long</a></p>
</li>
<li><p><a href="#heading-the-tech-stack">The Tech Stack</a></p>
</li>
<li><p><a href="#heading-the-project-structure-how-to-think-about-an-n8n-workflow-like-software">The Project Structure: How to Think About an n8n Workflow Like Software</a></p>
</li>
<li><p><a href="#heading-stage-1-centralised-configuration">Stage 1: Centralised Configuration</a></p>
</li>
<li><p><a href="#heading-stage-2-parallel-api-collection-with-failure-isolation">Stage 2: Parallel API Collection (With Failure Isolation)</a></p>
</li>
<li><p><a href="#heading-stage-3-normalisation-and-deduplication-doi-first-title-fallback">Stage 3: Normalisation and Deduplication (DOI-first, Title fallback)</a></p>
</li>
<li><p><a href="#heading-stage-4-ai-powered-content-extraction-strict-json">Stage 4: AI-Powered Content Extraction (Strict JSON)</a></p>
</li>
<li><p><a href="#heading-stage-5-scoring-and-synthesis">Stage 5: Scoring and Synthesis</a></p>
</li>
<li><p><a href="#heading-beginner-friendly-evals-retrieval-and-extraction-qa">Beginner-Friendly Evals: Retrieval and Extraction QA</a></p>
</li>
<li><p><a href="#heading-key-learnings-and-error-handling">Key Learnings and Error Handling</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You don’t need to be a DevOps engineer to follow this, but you should have:</p>
<ul>
<li><p>Basic comfort with APIs and JSON (request/response payloads)</p>
</li>
<li><p>Familiarity with spreadsheets (Google Sheets basics)</p>
</li>
<li><p>Willingness to use a small amount of JavaScript inside n8n Function/Code nodes</p>
</li>
</ul>
<p>You’ll also need access to:</p>
<ul>
<li><p>An n8n instance (self-hosted or cloud)</p>
</li>
<li><p>A Groq API key (or a compatible LLM provider)</p>
</li>
<li><p>Optional API keys, depending on the databases you use</p>
</li>
</ul>
<p>What you’ll build assumes:</p>
<ul>
<li><p>You’re extracting from metadata + abstracts (not downloading full PDFs).</p>
</li>
<li><p>You can accept that some sources will occasionally rate-limit or return partial results (and your workflow will be designed to survive this).</p>
</li>
</ul>
<h2 id="heading-the-problem-research-takes-too-long">The Problem: Research Takes Too Long</h2>
<p>Manual research is often a bottleneck for innovation. Before building this automation, my workflow involved searching multiple academic databases, scanning abstracts, and manually extracting key findings. This process was not only slow but also prone to human error and inconsistent note-taking.</p>
<p>The goal of this automation is to provide a “full-stack research assistant” that handles the heavy lifting of collecting candidate papers, removing duplicates, extracting consistent fields, scoring relevance and quality, and delivering a curated daily or weekly report, so you can spend your time on high-level synthesis rather than repetitive collection.</p>
<h2 id="heading-the-tech-stack">The Tech Stack</h2>
<p>This workflow leverages a combination of automation tooling, high-speed LLM inference, and academic metadata providers.</p>
<table>
<thead>
<tr>
<th>Tool</th>
<th>Purpose</th>
</tr>
</thead>
<tbody><tr>
<td>n8n</td>
<td>The workflow engine that orchestrates all steps</td>
</tr>
<tr>
<td>Groq</td>
<td>Runs a fast LLM (for example, Llama 3.3 70B) for structured extraction/synthesis</td>
</tr>
<tr>
<td>Semantic Scholar / OpenAlex</td>
<td>Broad academic coverage for metadata, abstracts, citations</td>
</tr>
<tr>
<td>arXiv / PubMed</td>
<td>Strong specialised coverage (preprints, life sciences)</td>
</tr>
<tr>
<td>Google Sheets</td>
<td>A lightweight “research database” for storage + history</td>
</tr>
</tbody></table>
<p>Note: coverage varies by provider. Some APIs return abstracts reliably, while others may omit them. Your pipeline should treat missing abstracts as a normal case, not a failure.</p>
<h2 id="heading-the-project-structure-how-to-think-about-an-n8n-workflow-like-software">The Project Structure: How to Think About an n8n Workflow Like Software</h2>
<p>While n8n is a visual tool, it helps to design your workflow as modular stages to avoid the “spaghetti workflow” problem.</p>
<pre><code class="language-text">.
├── configuration/         # Keywords, thresholds, limits, date filters
├── collectors/            # Parallel HTTP request nodes (multiple sources)
├── processing/            # Normalization + deduplication code nodes
├── extraction/            # LLM extraction nodes (strict JSON)
├── scoring/               # Relevance + quality scoring + filtering
└── delivery/              # Google Sheets + email/HTML report
</code></pre>
<p>Design principle: each stage should produce a clean, predictable output shape that the next stage can rely on.</p>
<h2 id="heading-stage-1-centralised-configuration">Stage 1: Centralised Configuration</h2>
<p>Instead of hardcoding search parameters (keywords, min year, citation thresholds) across multiple nodes, use one configuration node to define workflow variables.</p>
<p>This matters for maintainability (change a value once, not in ten nodes), reusability (repurpose the entire pipeline by swapping one config object), and debuggability (log the config at the start of each run so you can reproduce results).</p>
<p>Use a Set node, or a Code node returning JSON like this:</p>
<pre><code class="language-json">{
  "keywords": "circular economy battery recycling remanufacturing",
  "min_year": 2020,
  "max_results_per_source": 10,
  "min_citations": 2,
  "relevance_threshold": 15,
  "batch_size": 10
}
</code></pre>
<p>Tip: keep numeric fields as numbers (not strings) to avoid scoring bugs later.</p>
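<p>As a rough sketch of that tip, a Code node placed right after the configuration node could coerce and check the numeric fields before anything else runs. The helper name <code>validateConfig</code> and its error messages are my own illustrative assumptions, not something n8n provides:</p>
<pre><code class="language-javascript">// Hypothetical helper: coerce numeric config fields and fail fast on bad values.
const NUMERIC_FIELDS = [
  "min_year",
  "max_results_per_source",
  "min_citations",
  "relevance_threshold",
  "batch_size",
];

function validateConfig(config) {
  const clean = { ...config };

  if (typeof clean.keywords !== "string" || clean.keywords.trim() === "") {
    throw new Error("Config error: keywords must be a non-empty string");
  }

  for (const field of NUMERIC_FIELDS) {
    // Number("10") gives 10, while Number("ten") gives NaN, which is rejected.
    // null and "" are checked explicitly because Number() turns both into 0.
    const value = Number(clean[field]);
    if (clean[field] === null || clean[field] === "" || !Number.isFinite(value)) {
      throw new Error(`Config error: ${field} must be a number`);
    }
    clean[field] = value;
  }

  return clean;
}

// In an n8n Code node you might end with something like:
// return [{ json: validateConfig($input.first().json) }];
</code></pre>
<p>Failing at the very start of the run is deliberate: a string like <code>"15"</code> in <code>relevance_threshold</code> is much easier to spot here than after it has quietly skewed a comparison in the scoring stage.</p>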
<h2 id="heading-stage-2-parallel-api-collection-with-failure-isolation">Stage 2: Parallel API Collection (With Failure Isolation)</h2>
<p>Your workflow should query multiple sources simultaneously. In n8n, you can branch from your configuration node into multiple HTTP Request nodes, and then merge results later.</p>
<p>The production mindset here is simple: APIs fail. Rate limits happen. Providers return partial data. The key is to prevent one failing collector from crashing the whole run.</p>
<p>To implement this, on each HTTP Request node, enable <strong>Continue On Fail</strong> (or the equivalent “don’t stop workflow” behaviour). Then, in the normalisation stage, treat missing or failed outputs as empty arrays so downstream stages still run.</p>
<p>In practice, it also helps to set explicit timeouts and add a small retry policy (one to two retries) for transient failures. “Good” looks like this: if two out of five sources fail, you still produce a useful report from the remaining three, and you log which sources failed so you can investigate later.</p>
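<p>Here is a minimal sketch of that "two out of five fail, the run still succeeds" behaviour. It assumes you have already tagged each collector's output with a <code>source</code> name, and that a failed request surfaces as an object carrying an <code>error</code> field. The exact shape depends on your n8n version and node settings, so treat <code>isolateFailures</code> and its input format as hypothetical:</p>
<pre><code class="language-javascript">// Hypothetical helper: separate healthy collector outputs from failed ones.
// Assumed input shape: { source: "OpenAlex", error?: any, papers?: [...] }
function isolateFailures(collectorOutputs) {
  const papers = [];
  const failedSources = [];

  for (const output of collectorOutputs) {
    if (output.error || !Array.isArray(output.papers)) {
      // Treat a failed or malformed response as an empty result set.
      failedSources.push(output.source);
      continue;
    }
    for (const paper of output.papers) {
      papers.push({ ...paper, source: output.source });
    }
  }

  return { papers, failedSources };
}

const run = isolateFailures([
  { source: "Semantic Scholar", papers: [{ title: "Paper A" }] },
  { source: "OpenAlex", error: "429 Too Many Requests" },
  { source: "arXiv", papers: [] },
]);

console.log(run.papers.length); // 1
console.log(run.failedSources); // only OpenAlex is reported as failed
</code></pre>
<p>Note that an empty result set (arXiv above) is not counted as a failure. Only errors and malformed payloads end up in <code>failedSources</code>, which you can carry through to the final report.</p>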
<h2 id="heading-stage-3-normalisation-and-deduplication-doi-first-title-fallback">Stage 3: Normalisation and Deduplication (DOI-first, Title fallback)</h2>
<p>Each academic API returns different field names and shapes. One might use <code>title</code>, another <code>display_name</code>, another <code>paper_title</code>. Your next stage should normalise all inputs into one schema.</p>
<h3 id="heading-target-normalised-schema">Target normalised schema</h3>
<p>Here’s a simple baseline schema (expand later as needed):</p>
<pre><code class="language-json">{
  "title": "string",
  "abstract": "string|null",
  "doi": "string|null",
  "year": 2024,
  "citations": 12,
  "url": "string|null",
  "source": "Semantic Scholar|OpenAlex|arXiv|PubMed"
}
</code></pre>
<h3 id="heading-what-deduping-by-doi-means-and-what-a-doi-is">What deduping by DOI means (and what a DOI is)</h3>
<p>A <strong>DOI</strong> (Digital Object Identifier) is a unique, persistent identifier assigned to many scholarly publications. If a paper has a DOI, that DOI functions like a stable ID: the same paper may appear in multiple databases with slightly different metadata, but the DOI should remain consistent.</p>
<p>So, <strong>deduping by DOI</strong> means: if two records share the same DOI, treat them as the same paper and keep only one.</p>
<p>When a DOI is missing (which is common for some preprints and some API responses), the fallback is to dedupe using a normalised title key: the title lowercased, trimmed, stripped of punctuation, and with whitespace collapsed. It’s not as reliable as DOI-based matching, but it’s a strong pragmatic backup.</p>
<h3 id="heading-what-normalise-into-a-unified-object-means-whats-happening-in-the-code">What “normalise into a unified object” means (what’s happening in the code)</h3>
<p>“Normalise into a unified object” simply means converting every provider’s raw response into the same predictable shape (the schema above). Once everything looks the same, downstream steps, such as deduplication, scoring, AI extraction, and storage, become straightforward because they don’t need provider-specific logic.</p>
<p>In the code below, that’s what the <code>normalized</code> object is: it maps Semantic Scholar’s fields (<code>paper.title</code>, <code>paper.externalIds.DOI</code>, <code>paper.citationCount</code>) into your standard fields (<code>title</code>, <code>doi</code>, <code>citations</code>, etc.). After that, the workflow generates a dedupe key (<code>doi:...</code> if DOI exists, otherwise <code>title:...</code>) and uses a <code>Set</code> to keep only the first occurrence.</p>
<h4 id="heading-example-n8n-code-node-normalisation-dedupe-pattern">Example n8n Code Node (Normalisation + Dedupe Pattern)</h4>
<pre><code class="language-javascript">const itemsIn = $input.all();

const seen = new Set();
const results = [];

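function titleKey(t) {
  return (t || "")
    .toLowerCase()
    // Keep letters and digits from any script. \W would also strip accented
    // and non-Latin characters, collapsing such titles into the same key.
    .replace(/[^\p{L}\p{N}]+/gu, " ")
    .trim();
}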

for (const item of itemsIn) {
  // Example: Semantic Scholar response shape
  const papers = item.json?.data || [];

  for (const paper of papers) {
    // "Normalize into a unified object":
    // take the provider-specific fields and map them into our standard schema.
    const normalized = {
      title: paper.title || null,
      abstract: paper.abstract || null,
      doi: paper.externalIds?.DOI || null,
      year: paper.year || null,
      citations: paper.citationCount || 0,
      url: paper.url || null,
      source: "Semantic Scholar",
    };

    if (!normalized.title) continue;

    // Dedupe key: DOI is strongest; title is fallback
    const key = normalized.doi
      ? `doi:${normalized.doi.toLowerCase()}`
      : `title:${titleKey(normalized.title)}`;

    if (seen.has(key)) continue;
    seen.add(key);

    results.push(normalized);
  }
}

return results.map(r =&gt; ({ json: r }));
</code></pre>
<p>Production-minded note: keep a field like <code>source</code> so you can debug where bad metadata is coming from later.</p>
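<p>Every other provider needs its own small mapper into the same schema. As an illustration, here is a sketch for OpenAlex, based on my reading of its work object: the DOI arrives as a full <code>https://doi.org/...</code> URL and the abstract as an inverted index of word positions. Check the field names against the current API documentation before relying on them. The helper names and sample values are made up for the example:</p>
<pre><code class="language-javascript">// Rebuild plain text from an inverted index of the form { word: [positions] }.
function abstractFromInvertedIndex(index) {
  if (!index) return null;
  const words = [];
  for (const [word, positions] of Object.entries(index)) {
    for (const position of positions) {
      words[position] = word;
    }
  }
  const text = words.filter(Boolean).join(" ");
  return text || null;
}

// Strip a resolver prefix so "https://doi.org/10.1/x" and "10.1/x" dedupe together.
function bareDoi(doi) {
  if (!doi) return null;
  return doi.replace(/^https?:\/\/(dx\.)?doi\.org\//i, "").toLowerCase();
}

// Hypothetical mapper from one OpenAlex work into the schema above.
function normalizeOpenAlexWork(work) {
  return {
    title: work.display_name || work.title || null,
    abstract: abstractFromInvertedIndex(work.abstract_inverted_index),
    doi: bareDoi(work.doi),
    year: work.publication_year || null,
    citations: work.cited_by_count || 0,
    url: work.id || null,
    source: "OpenAlex",
  };
}

// Made-up sample record, for illustration only.
const example = normalizeOpenAlexWork({
  display_name: "Battery recycling routes",
  doi: "https://doi.org/10.1000/XYZ123",
  publication_year: 2023,
  cited_by_count: 7,
  id: "https://openalex.org/W0000000000",
  abstract_inverted_index: { Recycling: [0], matters: [1, 3], because: [2] },
});

console.log(example.doi); // 10.1000/xyz123
console.log(example.abstract); // Recycling matters because matters
</code></pre>
<p>Stripping the resolver prefix matters for deduplication: without it, the same paper from two providers would produce two different <code>doi:</code> keys.</p>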
<h2 id="heading-stage-4-ai-powered-content-extraction-strict-json">Stage 4: AI-Powered Content Extraction (Strict JSON)</h2>
<p>Once you have a deduplicated list of papers, you can send each paper (or a small batch) to Groq for structured extraction.</p>
<h3 id="heading-why-structured-output-matters">Why structured output matters</h3>
<p>If your LLM returns narrative text instead of JSON, misses fields, or emits malformed JSON, your workflow breaks downstream. In a production workflow, that’s not a rare edge case; it’s something you should expect and design around.</p>
<p>That’s why you’ll use strict schema prompting <em>and</em> validate responses downstream.</p>
<h3 id="heading-system-prompt-vs-user-prompt-and-how-to-compose-them">System prompt vs user prompt (and how to compose them)</h3>
<p>A helpful way to think about prompts in production is:</p>
<ul>
<li><p>The <strong>system prompt</strong> defines the <em>non-negotiable contract</em>: output format, allowed keys, no commentary, and what to do in uncertain cases. This is where you say “return ONLY valid JSON” and “no extra keys.”</p>
</li>
<li><p>The <strong>user prompt</strong> provides the <em>variable data</em> for this specific request: title, year, citations, abstract, and the exact schema you want filled.</p>
</li>
</ul>
<p>Composing them this way keeps your workflow stable. The system prompt stays mostly constant (your formatting contract), while the user prompt changes per paper (your payload). It also makes debugging easier: if outputs start failing, you can adjust the system constraints without rewriting every payload template.</p>
<h3 id="heading-suggested-extraction-schema">Suggested extraction schema</h3>
<p>Extract only what you can support from abstract-level data:</p>
<ul>
<li><p><code>research_question</code></p>
</li>
<li><p><code>methodology</code></p>
</li>
<li><p><code>key_findings</code></p>
</li>
<li><p><code>limitations</code></p>
</li>
<li><p><code>notes</code> (for missing abstract / ambiguity)</p>
</li>
</ul>
<h3 id="heading-example-prompt-system-user">Example prompt (system + user)</h3>
<p><strong>System:</strong></p>
<p>You are a research extraction engine. You must return ONLY valid JSON.<br>No markdown. No extra keys. No commentary.<br>If the abstract is missing or too vague, set fields to null and include a reason in "notes".</p>
<p><strong>User:</strong></p>
<p>Extract structured fields from this paper.</p>
<p>TITLE: {{title}}<br>YEAR: {{year}}<br>CITATIONS: {{citations}}<br>ABSTRACT: {{abstract}}</p>
<p>Return JSON with keys:<br>research_question (string|null)<br>methodology (string|null)<br>key_findings (array of strings)<br>limitations (array of strings)<br>notes (string)</p>
<p>Model settings: keep temperature low (around 0.2–0.3) and keep responses short and structured.</p>
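<p>To make the system/user split concrete, here is a sketch that builds the two messages for one paper. It assumes an OpenAI-style chat payload (an array of <code>role</code>/<code>content</code> objects), which is the format I'd expect a Groq-compatible node to accept. <code>buildMessages</code> and the placeholder strings for missing values are hypothetical:</p>
<pre><code class="language-javascript">// The system prompt is the fixed formatting contract.
const SYSTEM_PROMPT = [
  "You are a research extraction engine. You must return ONLY valid JSON.",
  "No markdown. No extra keys. No commentary.",
  'If the abstract is missing or too vague, set fields to null and include a reason in "notes".',
].join("\n");

// Hypothetical helper: the user prompt carries the per-paper payload.
function buildMessages(paper) {
  const userPrompt = [
    "Extract structured fields from this paper.",
    "",
    `TITLE: ${paper.title}`,
    `YEAR: ${paper.year ?? "unknown"}`,
    `CITATIONS: ${paper.citations ?? 0}`,
    `ABSTRACT: ${paper.abstract || "(missing)"}`,
    "",
    "Return JSON with keys:",
    "research_question (string|null)",
    "methodology (string|null)",
    "key_findings (array of strings)",
    "limitations (array of strings)",
    "notes (string)",
  ].join("\n");

  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: userPrompt },
  ];
}

const messages = buildMessages({
  title: "Battery recycling routes",
  year: 2023,
  citations: 7,
  abstract: null,
});

console.log(messages[1].content);
</code></pre>
<p>Because the contract lives in one constant, tightening the output rules later is a one-line change that applies to every paper.</p>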
<h3 id="heading-batch-processing-to-avoid-timeouts">Batch processing to avoid timeouts</h3>
<p>Instead of sending 50 papers at once, process them in batches (for example, 10). This reduces latency spikes, failure blast radius, and cost surprises. Smaller batches also make it easier to retry only the failing chunk rather than re-running everything.</p>
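<p>n8n's batching node can do this for you, but if you'd rather batch inside a Code node, the idea fits in a few lines. This is a minimal sketch, and <code>chunk</code> is a hypothetical helper name:</p>
<pre><code class="language-javascript">// Hypothetical helper: split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || !(size >= 1)) {
    throw new Error("size must be a positive integer");
  }
  const batchCount = Math.ceil(items.length / size);
  return Array.from({ length: batchCount }, (_, i) =>
    items.slice(i * size, (i + 1) * size)
  );
}

const papers = Array.from({ length: 25 }, (_, i) => ({ id: i + 1 }));
const batches = chunk(papers, 10);

console.log(batches.map((batch) => batch.length)); // [ 10, 10, 5 ]
</code></pre>
<p>With the Stage 1 configuration, you would pass <code>batch_size</code> as the second argument, so a failed batch means retrying at most that many papers.</p>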
<h2 id="heading-stage-5-scoring-and-synthesis">Stage 5: Scoring and Synthesis</h2>
<p>Not every retrieved paper is worth your time. Without scoring, your pipeline becomes a firehose: you’ve automated collection, but you still have to manually decide what to read. Scoring is what turns “a big list of results” into a shortlist you can trust.</p>
<p>I recommend computing two signals:</p>
<ul>
<li><p><strong>Relevance</strong>: Is this actually about your research question?</p>
</li>
<li><p><strong>Quality/priority</strong>: If it’s relevant, is it worth reading first?</p>
</li>
</ul>
<p>For <strong>relevance</strong>, keep it simple and explainable. Count keyword hits in the title and abstract (and optionally in extracted <code>key_findings</code>). Title matches should be weighted higher because titles are deliberately compact summaries. Abstract hits are useful too, but cap them so long abstracts don’t dominate the score.</p>
<p>For <strong>quality/priority</strong>, use lightweight metadata you already have. Recency is a strong signal in fast-moving areas, and citations can help, but they should be treated as a weak signal (and capped) so newer high-value papers aren’t unfairly penalised.</p>
<p>A solid first scoring model is: add a title bonus, add a capped abstract bonus, add a capped citations bonus, and add a small recency bonus for papers from the last two years. Then filter using the <code>relevance_threshold</code> value from Stage 1. The advantage of this approach is that it’s easy to debug and tune: you can always explain why a paper passed or failed.</p>
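<p>The model described above can be sketched as a single function. Every weight and cap below is an illustrative assumption to tune against your own results, not a recommended value, and <code>scorePaper</code> is a hypothetical helper:</p>
<pre><code class="language-javascript">// Illustrative weights and caps: assumptions to tune, not recommended values.
const WEIGHTS = {
  titleHit: 10, // per keyword found in the title
  abstractHit: 2, // per keyword found in the abstract
  abstractCap: 8, // ceiling for the abstract bonus
  citationCap: 5, // ceiling for the citation bonus
  recencyBonus: 3, // papers from the last two years
};

// Counts how many distinct keywords appear in the text (each counts once).
function countHits(text, keywords) {
  const haystack = (text || "").toLowerCase();
  return keywords.filter((keyword) => haystack.includes(keyword)).length;
}

// Hypothetical scorer: returns the total plus a breakdown you can log.
function scorePaper(paper, config, currentYear) {
  const keywords = config.keywords.toLowerCase().split(/\s+/).filter(Boolean);

  const title = countHits(paper.title, keywords) * WEIGHTS.titleHit;
  const abstract = Math.min(
    WEIGHTS.abstractCap,
    countHits(paper.abstract, keywords) * WEIGHTS.abstractHit
  );
  // One point per 10 citations, capped so newer papers are not buried.
  const citations = Math.min(
    WEIGHTS.citationCap,
    Math.floor((paper.citations || 0) / 10)
  );
  const recency = paper.year >= currentYear - 2 ? WEIGHTS.recencyBonus : 0;

  const total = title + abstract + citations + recency;
  return {
    total,
    passed: total >= config.relevance_threshold,
    breakdown: { title, abstract, citations, recency },
  };
}

const config = {
  keywords: "circular economy battery recycling remanufacturing",
  relevance_threshold: 15,
};
const paper = {
  title: "Battery recycling in a circular economy",
  abstract: "We review remanufacturing and recycling routes.",
  citations: 42,
  year: 2025,
};

// title 40 + abstract 4 + citations 4 + recency 3 = 51, which passes
console.log(scorePaper(paper, config, 2026));
</code></pre>
<p>Returning the breakdown alongside the total is what keeps the score explainable: when a paper is unexpectedly dropped, you can see which component fell short.</p>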
<p>Once you’ve filtered down to your “gold” set, synthesis becomes safer and more useful. Write one row per accepted paper to Google Sheets, then generate a daily/weekly HTML summary (for example, top 5 papers with 1–2 key findings each) and include links so you can verify quickly.</p>
<h2 id="heading-beginner-friendly-evals-retrieval-and-extraction-qa">Beginner-Friendly Evals: Retrieval and Extraction QA</h2>
<p>AI workflows regress silently. A prompt tweak, a model update, or an API schema change can break extraction without throwing an obvious error. Adding lightweight evals is the difference between “it worked last week” and “it’s reliable.”</p>
<p>The goal here isn’t to build a full evaluation framework. It’s to add small, cheap checks that catch the most common failure modes:</p>
<ul>
<li><p>Are collectors still returning results?</p>
</li>
<li><p>Are we actually removing duplicates?</p>
</li>
<li><p>Is the LLM returning valid JSON with the keys we require?</p>
</li>
</ul>
<h3 id="heading-what-it-looks-like-in-n8n-a-concrete-example">What it looks like in n8n (a concrete example)</h3>
<p>A simple implementation is to add an <strong>“Assertions” Code node</strong> immediately after your extraction step, plus (optionally) another one after normalisation/deduplication.</p>
<p>At a high level, the workflow section looks like:</p>
<ol>
<li><p>Collectors (parallel HTTP Request nodes)</p>
</li>
<li><p>Merge results</p>
</li>
<li><p>Normalise + dedupe (Code node)</p>
</li>
<li><p>Split in Batches (optional)</p>
</li>
<li><p>LLM extraction (Groq/OpenAI-compatible node)</p>
</li>
<li><p><strong>Assertions (Code node)</strong></p>
</li>
<li><p>If node (pass/fail)</p>
</li>
<li><p>Delivery (Sheets + email)</p>
</li>
</ol>
<h3 id="heading-example-assertions-code-node-after-extraction">Example: Assertions code node after extraction</h3>
<p>This code node assumes each item is a paper with:</p>
<ul>
<li><p><code>title</code>, <code>abstract</code> in the normalised fields, and</p>
</li>
<li><p>an <code>extraction</code> field (or whatever you name it) containing the LLM response as an object or JSON string.</p>
</li>
</ul>
<p>Adapt the field name to match your actual node output, but the pattern is the same: parse, validate required keys, compute percentages, then decide whether to fail or warn.</p>
<pre><code class="language-javascript">const items = $input.all();

let total = items.length;
let withTitle = 0;
let withAbstract = 0;

let parseOk = 0;
let schemaOk = 0;

const requiredKeys = [
  "research_question",
  "methodology",
  "key_findings",
  "limitations",
  "notes",
];

const failures = [];

for (let i = 0; i &lt; items.length; i++) {
  const p = items[i].json;

  if (p.title &amp;&amp; String(p.title).trim().length &gt; 0) withTitle++;
  if (p.abstract &amp;&amp; String(p.abstract).trim().length &gt; 0) withAbstract++;

  // Adjust this depending on where you store the model output:
  const raw = p.extraction ?? p.llm ?? p.model_output;

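  let obj = null;
  try {
    obj = typeof raw === "string" ? JSON.parse(raw) : raw;
  } catch (e) {
    failures.push({ index: i, title: p.title || null, reason: "JSON parse failed" });
    continue;
  }

  // A missing extraction (undefined/null) or a non-object value would make the
  // key check below throw or pass vacuously, so count it as a failure instead.
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) {
    failures.push({ index: i, title: p.title || null, reason: "Extraction missing or not an object" });
    continue;
  }
  parseOk++;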

  const hasAllKeys = requiredKeys.every(k =&gt; Object.prototype.hasOwnProperty.call(obj, k));
  if (!hasAllKeys) {
    failures.push({ index: i, title: p.title || null, reason: "Missing required keys" });
    continue;
  }

  // Optional: ensure arrays are arrays
  const arraysOk = Array.isArray(obj.key_findings) &amp;&amp; Array.isArray(obj.limitations);
  if (!arraysOk) {
    failures.push({ index: i, title: p.title || null, reason: "key_findings/limitations not arrays" });
    continue;
  }

  schemaOk++;
}

const pct = (n) =&gt; (total === 0 ? 0 : Math.round((n / total) * 100));

const report = {
  total_papers: total,
  pct_with_title: pct(withTitle),
  pct_with_abstract: pct(withAbstract),
  pct_extraction_json_parse_ok: pct(parseOk),
  pct_extraction_schema_ok: pct(schemaOk),
  failures_sample: failures.slice(0, 5),
};

// Decide pass/fail thresholds
const HARD_FAIL_PARSE_BELOW = 90;
const HARD_FAIL_SCHEMA_BELOW = 85;

const shouldFail =
  report.pct_extraction_json_parse_ok &lt; HARD_FAIL_PARSE_BELOW ||
  report.pct_extraction_schema_ok &lt; HARD_FAIL_SCHEMA_BELOW;

return [
  {
    json: {
      eval_report: report,
      shouldFail,
    },
  },
];
</code></pre>
<p>Then add an <strong>If node</strong>:</p>
<ul>
<li><p>If <code>shouldFail</code> is true, then route to an “Alert/Stop” branch (Slack/email/log) and optionally stop the workflow.</p>
</li>
<li><p>If false, then continue to the delivery stage.</p>
</li>
</ul>
<p>This is the automation equivalent of unit tests: small, cheap, and extremely effective. It also gives you a concrete paper trail when something changes upstream.</p>
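<p>The optional second Assertions node, placed after normalisation/deduplication, can follow the same pattern. The sketch below answers two of the questions from the checklist: did the collectors return enough papers, and did any duplicates slip through? The helper names and the <code>minExpected</code> parameter are hypothetical:</p>
<pre><code class="language-javascript">// Same idea as the Stage 3 dedupe key: DOI first, normalised title as fallback.
function dedupeKey(paper) {
  if (paper.doi) return `doi:${paper.doi.toLowerCase()}`;
  const title = (paper.title || "")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, " ")
    .trim();
  return `title:${title}`;
}

// Hypothetical check: report duplicate keys and suspiciously small result sets.
function checkDeduped(papers, minExpected) {
  const counts = new Map();
  for (const paper of papers) {
    const key = dedupeKey(paper);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const duplicateKeys = [...counts.entries()]
    .filter(([, count]) => count > 1)
    .map(([key]) => key);

  return {
    total_papers: papers.length,
    duplicate_keys: duplicateKeys,
    // Fail when duplicates slipped through or the collectors came back nearly empty.
    shouldFail: duplicateKeys.length > 0 || minExpected > papers.length,
  };
}

const report = checkDeduped(
  [
    { title: "Paper A", doi: "10.1000/abc" },
    { title: "Paper A (preprint)", doi: "10.1000/ABC" },
    { title: "Paper B", doi: null },
  ],
  1
);

console.log(report.duplicate_keys); // one duplicate key: doi:10.1000/abc
console.log(report.shouldFail); // true
</code></pre>
<p>You can route <code>shouldFail</code> through the same If node pattern, so a silent collector outage or a broken dedupe step raises an alert before any LLM calls are made.</p>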
<h2 id="heading-key-learnings-and-error-handling">Key Learnings and Error Handling</h2>
<p>Building this automation taught me that the best workflows are designed for failure.</p>
<p>First, error resilience is not optional. Never let one failing API crash the workflow. Use “Continue On Fail” on your HTTP nodes, merge partial results, and log which sources failed in your final report so you can debug without losing an entire run.</p>
<p>Second, batching is your friend. Process papers in batches (often 5–15) to reduce timeouts and cost spikes. Keep LLM payloads small and focused on what you actually need (metadata + abstract), and retry transient failures once rather than repeatedly hammering the model or API.</p>
<p>Third, structured prompting is what makes AI reliable in automation. A strict JSON schema is the difference between a workflow that runs unattended and one that breaks randomly. Keep temperature low, enforce the schema in the system prompt, and validate everything downstream with simple parse-and-assert checks.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>A good research pipeline doesn’t just retrieve papers – it turns scattered results into a consistent, deduplicated, scored, and review-ready shortlist you can trust.</p>
<p>By treating your n8n workflow like software (modular stages, strict contracts between steps, and lightweight eval checks), you can reduce hours of manual literature review to a fast, repeatable process that survives real-world API failures and model quirks.</p>
<p>If you build this with good defaults (failure isolation, batching, normalisation, strict JSON extraction, and simple scoring), you end up with something you can run daily or weekly and actually rely on without the manual fatigue.</p>
<h3 id="heading-about-me">About Me</h3>
<p>I am Chidozie Managwu, an award-winning AI Product Architect and founder focused on helping global tech talent build real, production-ready skills. I contribute to global AI initiatives as a GAFAI Delegate and lead the AI Titans Network, a community for developers learning how to ship AI products.</p>
<p>My work has been recognised with the Global Tech Hero award and featured on platforms like HackerNoon.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Storyteller: A Medium For Guiding Others Through Code ]]>
                </title>
                <description>
                    <![CDATA[ As a computer science instructor, I have long wished that there was a better way to guide others through my code. When I was first learning to program, I was a big fan of traditional programming books ]]>
                </description>
                <link>https://www.freecodecamp.org/news/storyteller-a-medium-for-guiding-others-through-code/</link>
                <guid isPermaLink="false">69a23fd4d4053a09f35c3d3e</guid>
                
                    <category>
                        <![CDATA[ software development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ coding ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ code playbacks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Mark Mahoney ]]>
                </dc:creator>
                <pubDate>Sat, 28 Feb 2026 01:07:32 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/902c2299-ea98-4136-8ee8-36668f0c08ee.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>As a computer science instructor, I have long wished that there was a better way to guide others through my code. When I was first learning to program, I was a big fan of traditional programming books. I have shelves and shelves of 800+ page books covering different programming languages and technologies.</p>
<p>I have known for a while now that most learners today don't share my love of big thick books, and to be honest, I rarely read those books in their entirety. Those big books often had a lot more exposition about the code than was probably needed. As a book buyer, I wanted to make sure that I was getting my money's worth, so the thicker they were, the better. It is much more common these days for learners to consume blog-based tutorials and videos.</p>
<p>If you're learning to code right now, you've probably experienced the frustration of these formats too. I want to share something I've been working on that might help.</p>
<h2 id="heading-blogs-and-videos"><strong>Blogs and Videos</strong></h2>
<p>Blog-style tutorials mix code and the explanation of it in a top-to-bottom fashion. Scrolling through these web-based explanations feels familiar and one can copy and paste with ease. However, linking the explanation of the code and the code itself has always been less than ideal. Often I find myself jumping around the blog post wishing I could see the entire code example while working through the explanation. Instead, I am only able to see small parts of the code and it is challenging to see how those parts relate to other parts.</p>
<p>Video tutorials are very popular these days. They solve some of the problems associated with blog-style tutorials. Videos are great because you get two streams of information: the author's audible narrative and the code being written. A viewer can focus on the two streams simultaneously. However, videos have some problems too.</p>
<h3 id="heading-viewing-videos"><strong>Viewing Videos</strong></h3>
<p>From the perspective of the viewer, videos are hard to search through and are not useful as a copy and paste source or a code reference. More importantly, though, they discourage the viewer from taking their time and reflecting on the material. Often, when I am viewing a video tutorial I don't pause and let concepts sink in before the video moves on. Yes, I could be more disciplined and pause and rewind more often but usually I don't.</p>
<h3 id="heading-making-videos"><strong>Making videos</strong></h3>
<p>From the perspective of the video creator, it is clear that not all code being developed is interesting to watch. Some of it is not really worth showing the viewer. Not all video creators can keep the narrative interesting the whole time.</p>
<p>I know I struggle with the 'performance' aspect of making videos (you won't find me coding on Twitch anytime soon). Many times after I am done making a video, as I review it, I wish I had mentioned something that I forgot. It is hard to go back and edit the video without scrapping it and starting over.</p>
<h2 id="heading-storyteller"><strong>Storyteller</strong></h2>
<p>I have created a new medium to guide viewers through code examples. It combines the best of books, blog posts, and videos. This new medium allows a developer to write code using a top-notch editor (Visual Studio Code) and then replay the development of that code in the browser.</p>
<p>The author can add comments at important points in the evolution of the code. The comments can include text, hand-drawn pictures, screenshots, and audio and video recordings. This allows the author to add the visualizations that we have in our heads but that don't make it into the code itself. The tool is called <a href="https://github.com/markm208/storyteller">Storyteller</a>.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/67df75cfc82238bba0f330b3/82dcb5c8-999f-432f-bd60-adcb3d8b9889.png" alt="82dcb5c8-999f-432f-bd60-adcb3d8b9889" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Here are a few examples of a 'playback':</p>
<ul>
<li><p><a href="https://playbackpress.com/books/pybook/chapter/2/10">Enlarging a Picture (Python)</a></p>
</li>
<li><p><a href="https://playbackpress.com/books/cppbook/chapter/8/8">Dynamic Variables and Pointers (C++)</a></p>
</li>
</ul>
<p>These work best on a big screen. If you are viewing a playback on a small screen you can view it in 'blog' mode (there is a button in the top right to switch from 'code' mode to 'blog' mode).</p>
<p>I have created groups of these guided code walk-throughs to help me teach different topics to my students. These are all free and hosted on a website I created called <a href="https://playbackpress.com/books">Playback Press</a>. Here are some of the 'books' I have created so far:</p>
<ul>
<li><p><a href="https://playbackpress.com/books/cppbook/">An Animated Introduction to Programming in C++</a></p>
</li>
<li><p><a href="https://playbackpress.com/books/pybook/">An Animated Introduction to Programming with Python</a></p>
</li>
<li><p><a href="https://playbackpress.com/books/webdevbook/">An Introduction to Web Development from Back to Front</a></p>
</li>
<li><p><a href="https://playbackpress.com/books/cljbook/">An Animated Introduction to Clojure</a></p>
</li>
<li><p><a href="https://playbackpress.com/books/exbook/">An Animated Introduction to Elixir</a></p>
</li>
<li><p><a href="https://playbackpress.com/books/sqlbook/">Database Design and SQL for Beginners</a></p>
</li>
<li><p><a href="https://playbackpress.com/books/flutterbook/">Mobile App Development with Dart and Flutter</a></p>
</li>
<li><p><a href="https://playbackpress.com/books/patternsbook/">OO Design Patterns with Java</a></p>
</li>
</ul>
<p>I usually assign these as readings in my classes instead of using expensive textbooks. It is a lot easier for me to write several programs than it is to find a perfect textbook.</p>
<p>I also use them for in-class demos instead of writing code live. This makes code demos flow much faster and smoother. If I make an interesting mistake while preparing the code I can still highlight it with a comment. If I make an uninteresting or embarrassing mistake I can just ignore it and the students won't focus on it.</p>
<h3 id="heading-the-advantages-of-code-playbacks"><strong>The Advantages of Code Playbacks:</strong></h3>
<ul>
<li><p>The primary focus is on the code. It is always visible and easy to search and navigate.</p>
</li>
<li><p>Since the code is so accessible, the explanation of it tends to be short and concise.</p>
</li>
<li><p>The narrative can include whiteboard style drawings, screenshots, or videos of running code in addition to a text explanation.</p>
</li>
<li><p>As an author, I can review the code several times and add/edit comments each time I go through it. I don't have to give a perfect performance like I do with a video.</p>
</li>
<li><p>Comment points highlight when the author wants the viewer to take a moment to really think about the code and reflect on it. The playback only moves forward when the viewer is ready.</p>
</li>
<li><p>The code mentioned in a comment can be highlighted so the viewer knows exactly where they should be looking.</p>
</li>
<li><p>The code can be downloaded at any point in the playback. Then a viewer can run it, change it, and add to it.</p>
</li>
<li><p>The tool is a language independent editor plug-in and can be used to describe programs in any language.</p>
</li>
<li><p>Viewers only need a web browser to go through a playback.</p>
</li>
</ul>
<p>Recently, I've been exploring how to make playbacks even more useful for learners.</p>
<h2 id="heading-ai-as-an-infinitely-patient-tutor"><strong>AI as an Infinitely Patient Tutor</strong></h2>
<p>I have extended code playbacks to include an AI tutor. One thing I've learned in my years of teaching is that students often hesitate to ask questions. They worry about looking foolish, or they don't want to slow down the class, or they simply can't articulate what's confusing them.</p>
<p>What if every student had access to a patient tutor who never got frustrated with repeated questions and could explain concepts in multiple ways until something clicked?</p>
<p>I've integrated AI directly into the playback experience. As students work through a playback, they can ask questions about anything they're seeing. This might be a specific line of code, a concept I mentioned in a comment, or how something connects to material from earlier in the playback. The AI has full context. It can see the code, it understands where the student is in the playback, and it can provide explanations tailored to that exact moment. The AI is right there <em>with</em> the student, looking at the same code, understanding the same context.</p>
<div class="embed-wrapper"><iframe width="560" height="315" src="https://www.youtube.com/embed/WAPql5KZFR4?si=jFnCqidSTtfaZA4e" frameborder="0" allowfullscreen="" title="Embedded content" loading="lazy"></iframe></div>

<p>The AI can also generate self-grading multiple choice questions based on the code and comments in a playback. These low-stakes quizzes make the learning experience more engaging and help learners check their understanding as they go.</p>
<p>Let me be clear: the AI doesn't replace me as an instructor. I still create the playbacks. I still decide what concepts to cover, what order to present them, and what examples best illustrate the ideas. The AI is an extension of my teaching, not a replacement for it.</p>
<p>Note: The AI features are available to registered users on <a href="https://playbackpress.com/books">Playback Press</a>. Registration is free but logging in is required to access the AI tutor. If you want to see what this feels like, try one of the playbacks linked above and ask the AI a question about what you're seeing.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>My goal has always been to help people learn to code. Books gave us depth but demanded commitment. Blogs gave us accessibility but fragmented the code. Videos gave us narrative but took away control. Playbacks keep the code front and center while letting learners move at their own pace and reflect when they need to. Adding AI doesn't change that philosophy; it just means there's always someone available to answer questions. Together, they get closer to the experience of having an expert sit beside you and walk you through a program. That's what I've been trying to build, and I think we're getting there.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Optimize PySpark Jobs: Real-World Scenarios for Understanding Logical Plans ]]>
                </title>
                <description>
                    <![CDATA[ In the world of big data, performance isn't just about bigger clusters – it's about smarter code. Spark is deceptively simple to write but notoriously difficult to optimize, because what you write isn't what Spark executes. Between your transformatio... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-optimize-pyspark-jobs-handbook/</link>
                <guid isPermaLink="false">69851d7be613661950e00d8f</guid>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data-engineering ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ spark ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ PySpark ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS Glue ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sameer Shukla ]]>
                </dc:creator>
                <pubDate>Thu, 05 Feb 2026 22:45:15 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770331493095/d569e168-d3ba-40e0-a500-7f682bbef693.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In the world of big data, performance isn't just about bigger clusters – it's about smarter code. Spark is deceptively simple to write but notoriously difficult to optimize, because what you write isn't what Spark executes. Between your transformations and actual computation lies an invisible translation layer – the logical plan – that determines whether your job runs in minutes or hours.</p>
<p>Most engineers never look at this layer, which is why they spend days tuning configurations that don't address the real problem: inefficient transformations that generate bloated plans.</p>
<p>This handbook teaches you to read, interpret, and control those plans, transforming you from someone who writes PySpark code into someone who architects efficient data pipelines with precision and confidence.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-background-information">Background Information</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-1-the-spark-mindset-why-plans-matter">Chapter 1: The Spark Mindset: Why Plans Matter</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-2-understanding-the-spark-execution-flow">Chapter 2: Understanding the Spark Execution Flow</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-3-reading-and-debugging-plans-like-a-pro">Chapter 3: Reading and Debugging Plans Like a Pro</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-4-writing-efficient-transformations">Chapter 4: Writing Efficient Transformations</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-scenario-1-rename-in-one-pass-withcolumnrenamed-vs-todf">Scenario 1: Rename in One Pass: withColumnRenamed() vs toDF()</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-2-reusing-expressions">Scenario 2: Reusing expressions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-3-batch-column-ops">Scenario 3: Batch column ops</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-4-early-filter-vs-late-filter">Scenario 4: Early Filter vs Late Filter</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-5-column-pruning">Scenario 5: Column Pruning</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-6-filter-pushdown-vs-full-scan">Scenario 6: Filter pushdown vs full scan</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-7-de-duplicate-right">Scenario 7: De-duplicate right</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-8-count-smarter">Scenario 8: Count Smarter</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-9-window-wisely">Scenario 9: Window wisely</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-10-incremental-aggregations-with-cache-and-persist">Scenario 10: Incremental Aggregations with Cache and Persist</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-11-reduce-shuffles">Scenario 11: Reduce shuffles</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-12-know-your-shuffle-triggers">Scenario 12: Know Your Shuffle Triggers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-13-tune-parallelism-shuffle-partitions-amp-aqe">Scenario 13: Tune Parallelism: Shuffle Partitions &amp; AQE</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-14-handle-skew-smartly">Scenario 14: Handle Skew Smartly</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scenario-15-sort-efficiently-orderby-vs-sortwithinpartitions">Scenario 15: Sort Efficiently (orderBy vs sortWithinPartitions)</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-background-information">Background Information</h2>
<h3 id="heading-what-this-handbook-is-really-about">What This Handbook is Really About</h3>
<p>This is not a tutorial about Spark internals, cluster tuning, or PySpark syntax or APIs.</p>
<p>This is a handbook about writing PySpark code that generates efficient logical plans.</p>
<p>Because when your code produces clean, optimized plans, Spark pushes filters correctly, shuffles reduce instead of multiply, projections stay shallow, and the DAG (<a target="_blank" href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">Directed Acyclic Graph</a>) becomes predictable, lean, and fast.</p>
<p>When your code produces messy plans, Spark shuffles more than necessary, and projects pile up into deep, expensive stacks. Filters arrive late instead of early, joins explode into wide, slow operations, and the DAG becomes tangled and expensive.</p>
<p>The difference between a fast job and a slow job is not “faster hardware.” It’s the structure of the plan Spark generates from your code. This handbook teaches you to shape that plan deliberately through scenarios.</p>
<h3 id="heading-who-this-handbook-is-for">Who This Handbook Is For</h3>
<p>This handbook is written for:</p>
<ul>
<li><p><strong>Data engineers</strong> building production ETL pipelines who want to move beyond trial-and-error tuning and understand <em>why</em> jobs perform the way they do</p>
</li>
<li><p><strong>Analytics engineers</strong> working with large datasets in Databricks, EMR, or Glue who need to optimize Spark jobs but don't have time for thousand-page reference manuals</p>
</li>
<li><p><strong>Data scientists</strong> transitioning from pandas to PySpark who find themselves writing code that technically runs but takes forever</p>
</li>
<li><p><strong>Anyone</strong> who has stared at the Spark UI, seen mysterious "Exchange" nodes in the DAG, and wondered, <em>"Why is this shuffling so much data?"</em></p>
</li>
</ul>
<p>You should already be comfortable writing basic PySpark code: creating DataFrames, applying transformations, and running aggregations. This handbook won't teach you Spark syntax. Instead, it teaches you how to write transformations that work <em>with</em> the optimizer, not against it.</p>
<h3 id="heading-how-this-handbook-is-structured">How This Handbook Is Structured</h3>
<p>We’ll start with foundations, then move on to real-world scenarios.</p>
<p>Chapters 1-3 build your mental model. You'll learn what logical plans are, how they connect to physical execution, and how to read the plan output that Spark shows you. These chapters are short and focused – just enough theory to make the practical scenarios meaningful.</p>
<p>Chapter 4 is the heart of the handbook. It contains 15 real-world scenarios, organized by category. Each scenario shows you a common performance problem, explains what's happening in the logical plan, and demonstrates the better approach. You'll see before-and-after code, plan comparisons, and clear explanations of why one approach outperforms another.</p>
<h3 id="heading-what-youll-learn">What You'll Learn</h3>
<p>By the end of this handbook, you'll be able to:</p>
<ul>
<li><p>Read and interpret Spark's logical, optimized, and physical plans</p>
</li>
<li><p>Identify expensive operations before running your code</p>
</li>
<li><p>Restructure transformations to minimize shuffles</p>
</li>
<li><p>Choose the right join strategies for your data</p>
</li>
<li><p>Avoid common pitfalls that cause memory issues and slow performance</p>
</li>
<li><p>Debug production issues by examining execution plans</p>
</li>
</ul>
<p>More importantly, you'll develop a Spark mindset, an intuition for how your code translates to cluster operations. You'll stop writing code that "should work" and start writing code that you <em>know</em> will work efficiently.</p>
<h3 id="heading-technical-prerequisites">Technical Prerequisites</h3>
<p>I assume that you’re familiar with the following concepts before proceeding:</p>
<ol>
<li><p>Python fundamentals</p>
</li>
<li><p>PySpark basics</p>
<ul>
<li><p>Creating DataFrames and reading data from files</p>
</li>
<li><p>Basic DataFrame operations: select, filter, withColumn, groupBy, join</p>
</li>
<li><p>Writing DataFrames back to storage</p>
</li>
</ul>
</li>
<li><p>Basic Spark concepts</p>
<ul>
<li><p>Basic understanding of Spark applications, jobs, stages, and tasks</p>
</li>
<li><p>Basic understanding of the difference between transformations and actions</p>
</li>
<li><p>Understanding of partitions and shuffles</p>
</li>
</ul>
</li>
<li><p>AWS Glue (Good to have)</p>
</li>
</ol>
<h2 id="heading-chapter-1-the-spark-mindset-why-plans-matter">Chapter 1: The Spark Mindset: Why Plans Matter</h2>
<p>This chapter isn’t about Spark theory or internals. It’s about understanding Spark plans and seeing Spark the way the engine sees your code. Once you understand how Spark builds and optimizes a logical plan, optimization stops being trial and error and becomes intentional engineering.</p>
<p>Behind every simple transformation, Spark quietly redraws its internal blueprint. Every transformation you write from "<em>withColumn</em>" to join changes that plan. When the plan is efficient, Spark flies, but when it’s messy, Spark crawls.</p>
<h3 id="heading-the-invisible-layer-behind-every-transformation">The Invisible Layer Behind Every Transformation</h3>
<p>When you write PySpark code, it feels like you’re chaining operations step by step. In reality, Spark isn’t executing those lines. It’s quietly building a blueprint, a logical plan describing <em>what</em> to do, not <em>how</em>.</p>
<p>Once this plan is built, the Catalyst Optimizer analyzes it, rearranges operations, eliminates redundancies, and produces an optimized plan. Catalyst is Spark’s query optimization engine.</p>
<p>Every DataFrame or SQL operation we write, such as select, filter, join, or groupBy, is first converted into a logical plan. Catalyst then analyzes and transforms this plan using a set of rule-based optimizations, such as predicate pushdown, column pruning, constant folding, and join reordering. The result is an optimized logical plan, which Spark then translates into a physical plan: what your cluster actually runs. This invisible planning layer decides the job’s performance more than any configuration setting.</p>
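<p>The idea that transformations only record a plan, and that nothing runs until an action asks for results, can be sketched without Spark at all. The following is a toy model in plain Python, not Spark's implementation: the <code>LazyFrame</code> class and its <code>with_column</code> method are hypothetical names invented for this illustration.</p>

```python
# Toy model of lazy evaluation. NOT Spark's implementation.
# LazyFrame and with_column are hypothetical names for illustration only.
class LazyFrame:
    """Records transformations as a plan; runs nothing until collect()."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # tuple of (operation name, function)

    def filter(self, predicate):
        # Transformation: only extends the plan, touches no data.
        return LazyFrame(self._rows, self._plan + (("Filter", predicate),))

    def with_column(self, name, fn):
        # Transformation: also just recorded.
        def add(row):
            return {**row, name: fn(row)}

        return LazyFrame(self._rows, self._plan + (("Project", add),))

    def explain(self):
        # Like Spark's plan output, the last operation is listed first.
        return [name for name, _ in reversed(self._plan)] + ["Scan"]

    def collect(self):
        # Action: only now is the recorded plan executed, in order.
        rows = list(self._rows)
        for name, fn in self._plan:
            if name == "Filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return rows


people = [{"name": "Eve", "age": 40}, {"name": "Alice", "age": 25}]
lazy = LazyFrame(people).filter(lambda r: r["age"] > 35).with_column(
    "decade", lambda r: r["age"] // 10
)
plan = lazy.explain()    # ['Project', 'Filter', 'Scan']
result = lazy.collect()  # [{'name': 'Eve', 'age': 40, 'decade': 4}]
```

<p>Because the plan exists as data before anything runs, an optimizer has the chance to inspect and rearrange it. That is the opening Catalyst uses.</p>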
<h3 id="heading-from-logical-to-optimized-to-physical-plans">From Logical to Optimized to Physical Plans</h3>
<p>When you run <code>df.explain(True)</code>, Spark actually shows you four stages of reasoning:</p>
<h4 id="heading-1-logical-plan">1. Logical Plan</h4>
<p>The logical plan is the first stage: Spark translates your code into a tree structure that shows what operations need to happen, without worrying about how to execute them efficiently. It’s a blueprint of the query’s logic before any optimization or physical planning occurs.</p>
<p>This:</p>
<pre><code class="lang-python">df.filter(col(<span class="hljs-string">'age'</span>) &gt; <span class="hljs-number">25</span>) \
  .select(<span class="hljs-string">'firstname'</span>, <span class="hljs-string">'country'</span>) \
  .groupby(<span class="hljs-string">'country'</span>) \
  .count() \
  .explain(<span class="hljs-literal">True</span>)
</code></pre>
<p>results in the following logical plan:</p>
<pre><code class="lang-python">== Parsed Logical Plan ==
<span class="hljs-string">'Aggregate ['</span>country], [<span class="hljs-string">'country, '</span>count(<span class="hljs-number">1</span>) AS count<span class="hljs-comment">#108]</span>
+- Project [firstname<span class="hljs-comment">#95, country#97]</span>
   +- Filter (age<span class="hljs-comment">#96L &gt; cast(25 as bigint))</span>
      +- LogicalRDD [firstname<span class="hljs-comment">#95, age#96L, country#97], false</span>
</code></pre>
<h4 id="heading-2-analyzed-logical-plan">2. Analyzed Logical Plan</h4>
<p>The analyzed logical plan is the second stage in Spark’s query optimization. In this stage, Spark validates the query by checking if tables and columns actually exist in the Catalog and resolving all references. It converts the unresolved logical plan into a resolved one with correct data types and column bindings before optimization.</p>
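<p>As a rough mental model of what "resolving references" means, here is a small plain-Python sketch. It is not Spark's analyzer: the <code>resolve</code> function and the <code>SCHEMA</code> dictionary are hypothetical, and the column types are assumptions based on the example data.</p>

```python
# Toy analyzer. NOT Spark's code; names and types are illustrative assumptions.
SCHEMA = {"firstname": "string", "age": "bigint", "country": "string"}


def resolve(column_names, schema):
    """Bind each referenced column to its type, or fail before any work runs."""
    resolved = {}
    for name in column_names:
        if name not in schema:
            raise ValueError(
                f"cannot resolve column '{name}'; available: {sorted(schema)}"
            )
        resolved[name] = schema[name]
    return resolved


# Known columns are bound to their data types.
bound = resolve(["age", "country"], SCHEMA)

# An unknown column fails during analysis, before any data is read.
try:
    resolve(["salary"], SCHEMA)
    error = None
except ValueError as exc:
    error = str(exc)
```

<p>The useful takeaway is the timing: a misspelled column name is rejected at this stage, long before any task is scheduled on the cluster.</p>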
<h4 id="heading-3-optimized-logical-plan">3. Optimized Logical Plan</h4>
<p>The optimized logical plan is where Spark's Catalyst optimizer improves the logical plan by applying smart rules like filtering data early, removing unnecessary columns, and combining operations to reduce computation. It's the smarter, more efficient version of your original plan that will execute faster and use fewer resources.</p>
<p>Let’s understand using a simple code example:</p>
<pre><code class="lang-python">df.select(<span class="hljs-string">'firstname'</span>, <span class="hljs-string">'country'</span>) \
  .groupby(<span class="hljs-string">'country'</span>) \
  .count() \
  .filter(col(<span class="hljs-string">'country'</span>) == <span class="hljs-string">'USA'</span>) \
  .explain(<span class="hljs-literal">True</span>)
</code></pre>
<p>Here’s the parsed logical plan:</p>
<pre><code class="lang-python">== Parsed Logical Plan ==
<span class="hljs-string">'Filter '</span>`=`(<span class="hljs-string">'country, USA)
+- Aggregate [country#97], [country#97, count(1) AS count#122L]
   +- Project [firstname#95, country#97]
      +- LogicalRDD [firstname#95, age#96L, country#97], false</span>
</code></pre>
<p>What this means:</p>
<ul>
<li><p>Spark first projects firstname and country</p>
</li>
<li><p>Then aggregates by country</p>
</li>
<li><p>Then applies the filter country = 'USA' <strong>after</strong> aggregation</p>
</li>
</ul>
<p>(because that’s how you wrote it).</p>
<p>Here’s the optimized logical plan:</p>
<pre><code class="lang-python">== Optimized Logical Plan ==
Aggregate [country<span class="hljs-comment">#97], [country#97, count(1) AS count#122L]</span>
+- Project [country<span class="hljs-comment">#97]</span>
   +- Filter (isnotnull(country<span class="hljs-comment">#97) AND (country#97 = USA))</span>
      +- LogicalRDD [firstname<span class="hljs-comment">#95, age#96L, country#97], false</span>
</code></pre>
<p>Key improvements Catalyst applied:</p>
<ul>
<li><p>Filter pushdown: The filter country = 'USA' is pushed below the aggregation, so Spark only groups U.S. rows.</p>
</li>
<li><p>Column pruning: “firstname” is automatically removed because it’s never used in the final output.</p>
</li>
<li><p>Cleaner projection: Intermediate columns are dropped early, reducing I/O and in-memory footprint.</p>
</li>
</ul>
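<p>Why is it safe to move this filter below the aggregation? The filter only references the grouping key, so removing non-matching rows first cannot change the count for the groups that survive. A minimal plain-Python sketch of that equivalence, using illustrative country values rather than a real DataFrame:</p>

```python
from collections import Counter

# Illustrative rows; only the grouping key matters for a count.
countries = ["USA", "USA", "UK", "USA", "UK", "USA", "Canada", "UK", "Canada", "USA"]

# As written: aggregate every group, then keep one.
late = {k: v for k, v in Counter(countries).items() if k == "USA"}

# As optimized: drop non-matching rows first, then aggregate.
early = dict(Counter(c for c in countries if c == "USA"))

# Both orders give the same answer, but the early filter builds fewer groups.
groups_built_late = len(Counter(countries))
groups_built_early = len(early)
```

<p>The same reasoning does not hold for a filter on an aggregated value (for example, keeping groups whose count exceeds a threshold), which has to stay above the aggregation.</p>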
<h4 id="heading-4-physical-plan">4. Physical Plan</h4>
<p>The physical plan is Spark's final execution blueprint that shows exactly how the query will run: which specific algorithms to use, how to distribute work across machines, and the order of low-level operations. It's the concrete, executable version of the optimized logical plan, translated into actual Spark operations like “ShuffleExchange”, “HashAggregate”, and “FileScan” that will run on your cluster.</p>
<p>Catalyst may, for example:</p>
<ul>
<li><p>Fold constants (lit(2) * lit(3) → lit(6))</p>
</li>
<li><p>Push filters closer to the data source</p>
</li>
<li><p>Replace a regular join with a broadcast join when data fits in memory</p>
</li>
</ul>
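<p>Constant folding is the easiest of these rewrites to picture: any sub-expression made only of literals can be evaluated once at planning time instead of once per row. The sketch below is a toy expression tree in plain Python, not Catalyst's implementation. The tuple encoding and the <code>fold</code> function are hypothetical.</p>

```python
import operator

# Toy expression tree: ("lit", value), ("col", name), or (op, left, right).
# This encoding is an assumption for illustration, not Catalyst's.
OPS = {"add": operator.add, "mul": operator.mul}


def fold(expr):
    """Recursively replace operator nodes whose inputs are all literals."""
    kind = expr[0]
    if kind in ("lit", "col"):
        return expr
    left, right = fold(expr[1]), fold(expr[2])
    if left[0] == "lit" and right[0] == "lit":
        # Both sides are known at planning time, so evaluate now.
        return ("lit", OPS[kind](left[1], right[1]))
    return (kind, left, right)


# salary * (40 + 42): the literal sum does not depend on any row.
expr = ("mul", ("col", "salary"), ("add", ("lit", 40), ("lit", 42)))
folded = fold(expr)
# folded == ('mul', ('col', 'salary'), ('lit', 82))
```

<p>The column reference is left alone because its value is only known per row. Only the literal part collapses.</p>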
<p>Once the physical plan is finalized, Spark’s scheduler converts it into a DAG of stages and tasks that run across the cluster. Understanding that lineage, from your code → plan → DAG, is what separates fast jobs from slow ones.</p>
<h3 id="heading-how-to-read-a-logical-plan">How to Read a Logical Plan</h3>
<p>A logical plan prints as a tree: the bottom is your data source, and each higher node represents a transformation.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Node</strong></td><td><strong>Meaning</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Relation / LogicalRDD</td><td>Data source, the initial DataFrame</td></tr>
<tr>
<td>Project</td><td>Column selection and transformation (select, withColumn)</td></tr>
<tr>
<td>Filter</td><td>Row filtering based on conditions (where, filter)</td></tr>
<tr>
<td>Join</td><td>Combining two DataFrames (join)</td></tr>
<tr>
<td>Aggregate</td><td>GroupBy and aggregation operations (groupBy, agg)</td></tr>
<tr>
<td>Exchange</td><td>Shuffle operation (data redistribution across partitions); appears in the physical plan</td></tr>
<tr>
<td>Sort</td><td>Ordering data (orderBy, sort)</td></tr>
</tbody>
</table>
</div><p>Each node represents a transformation. Execution flows from the bottom up. Let's understand with a basic example:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql <span class="hljs-keyword">import</span> SparkSession
<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> *
<span class="hljs-keyword">from</span> pyspark.sql.types <span class="hljs-keyword">import</span> *

spark = SparkSession.builder.appName(<span class="hljs-string">"Practice"</span>).getOrCreate()

employees_data = [
    (<span class="hljs-number">1</span>, <span class="hljs-string">"John"</span>, <span class="hljs-string">"Doe"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">80000</span>, <span class="hljs-number">28</span>, <span class="hljs-string">"2020-01-15"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">2</span>, <span class="hljs-string">"Jane"</span>, <span class="hljs-string">"Smith"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">85000</span>, <span class="hljs-number">32</span>, <span class="hljs-string">"2019-03-20"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">3</span>, <span class="hljs-string">"Alice"</span>, <span class="hljs-string">"Johnson"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">60000</span>, <span class="hljs-number">25</span>, <span class="hljs-string">"2021-06-10"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">4</span>, <span class="hljs-string">"Bob"</span>, <span class="hljs-string">"Brown"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">90000</span>, <span class="hljs-number">35</span>, <span class="hljs-string">"2018-07-01"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">5</span>, <span class="hljs-string">"Charlie"</span>, <span class="hljs-string">"Wilson"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">65000</span>, <span class="hljs-number">29</span>, <span class="hljs-string">"2020-11-05"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">6</span>, <span class="hljs-string">"David"</span>, <span class="hljs-string">"Lee"</span>, <span class="hljs-string">"HR"</span>, <span class="hljs-number">55000</span>, <span class="hljs-number">27</span>, <span class="hljs-string">"2021-01-20"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">7</span>, <span class="hljs-string">"Eve"</span>, <span class="hljs-string">"Davis"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">95000</span>, <span class="hljs-number">40</span>, <span class="hljs-string">"2017-04-12"</span>, <span class="hljs-string">"Canada"</span>),
    (<span class="hljs-number">8</span>, <span class="hljs-string">"Frank"</span>, <span class="hljs-string">"Miller"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">70000</span>, <span class="hljs-number">33</span>, <span class="hljs-string">"2019-09-25"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">9</span>, <span class="hljs-string">"Grace"</span>, <span class="hljs-string">"Taylor"</span>, <span class="hljs-string">"HR"</span>, <span class="hljs-number">58000</span>, <span class="hljs-number">26</span>, <span class="hljs-string">"2021-08-15"</span>, <span class="hljs-string">"Canada"</span>),
    (<span class="hljs-number">10</span>, <span class="hljs-string">"Henry"</span>, <span class="hljs-string">"Anderson"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">88000</span>, <span class="hljs-number">31</span>, <span class="hljs-string">"2020-02-28"</span>, <span class="hljs-string">"USA"</span>)
]

df = spark.createDataFrame(employees_data,  
    [<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>])
</code></pre>
<h4 id="heading-version-a-withcolumn-filter">Version A: withColumn → filter</h4>
<p>In this version, we add a derived column with <code>withColumn</code> and then apply a filter to the dataset. This ordering is logically correct and produces the expected result, but it shows how introducing derived columns early affects the logical plan: Spark is asked to compute a new column before any rows are eliminated.</p>
<pre><code class="lang-python">df_filtered = df \
    .withColumn(<span class="hljs-string">'bonus'</span>, col(<span class="hljs-string">'salary'</span>) * <span class="hljs-number">82</span>) \
    .filter(col(<span class="hljs-string">'age'</span>) &gt; <span class="hljs-number">35</span>)

<span class="hljs-comment"># explain() prints the plans and returns None, so call it separately</span>
df_filtered.explain(<span class="hljs-literal">True</span>)
</code></pre>
<h4 id="heading-parsed-logical-plan-simplified">Parsed Logical Plan (Simplified)</h4>
<pre><code class="lang-python">Filter (age &gt; <span class="hljs-number">35</span>)
└─ Project [*, (salary * <span class="hljs-number">82</span>) AS bonus]
   └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>So what’s going on here? Execution flows from the bottom up.</p>
<ul>
<li><p>Spark first reads the LogicalRDD.</p>
</li>
<li><p>Then applies the Project node, keeping all columns and adding bonus.</p>
</li>
<li><p>Finally, the Filter removes rows where age ≤ 35.</p>
</li>
</ul>
<p>This means Spark computes the bonus for every employee, even those who are later filtered out. It's harmless here, but costly on millions of rows: more computation, more I/O, and more data to move if a shuffle follows.</p>
<h4 id="heading-version-b-filter-project">Version B: Filter → Project</h4>
<p>In this version, we apply the filter before introducing the derived column. The idea is to show how pushing row-reducing operations earlier allows Catalyst to produce a leaner logical plan. Compared to Version A, this example demonstrates that the same logic, written in a different order, can significantly reduce the amount of work Spark needs to perform.</p>
<pre><code class="lang-python">df_filtered = df \
    .filter(col(<span class="hljs-string">'age'</span>) &gt; <span class="hljs-number">35</span>) \
    .withColumn(<span class="hljs-string">'bonus'</span>, col(<span class="hljs-string">'salary'</span>) * <span class="hljs-number">82</span>)

<span class="hljs-comment"># explain() prints the plans and returns None, so call it separately</span>
df_filtered.explain(<span class="hljs-literal">True</span>)
</code></pre>
<h4 id="heading-parsed-logical-plan-simplified-1">Parsed Logical Plan (Simplified)</h4>
<pre><code class="lang-python">Project [*, (salary * <span class="hljs-number">82</span>) AS bonus]
└─ Filter (age &gt; <span class="hljs-number">35</span>)
   └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>So what’s going on here?</p>
<ul>
<li><p>Spark starts from the LogicalRDD.</p>
</li>
<li><p>It immediately applies the Filter, reducing the dataset to only employees with age &gt; 35.</p>
</li>
<li><p>Then the Project node adds the derived column bonus for this smaller subset.</p>
</li>
</ul>
<p>Now the Filter sits below the Project in the plan, cutting data movement and minimizing computation. Spark prunes data first, then derives new columns. This order reduces both the volume of data processed and the amount transferred, leading to a lighter and faster plan.</p>
<h3 id="heading-why-you-should-look-at-the-plan-every-time-by-running-dfexplaintrue">Why You Should Look at the Plan Every Time by running <code>df.explain(True)</code></h3>
<p>This is the quickest way to spot performance issues <em>before</em> they hit production. It shows:</p>
<ul>
<li><p>Whether filters sit in the right place.</p>
</li>
<li><p>How many Project nodes exist (each adds overhead).</p>
</li>
<li><p>Where Exchange nodes appear (these are shuffle boundaries).</p>
</li>
<li><p>If Catalyst pushed filters or rewrote joins as expected.</p>
</li>
</ul>
<p>A quick <code>explain()</code> takes seconds, while debugging a bad shuffle in production takes hours. Run <code>explain()</code> whenever you add or reorder transformations. The plan never lies.</p>
<h4 id="heading-what-spark-does-under-the-hood">What Spark Does Under the Hood</h4>
<p>Catalyst can sometimes reorder simple filters automatically, but once you use UDFs, nested logic, or joins, it often can’t. That’s why the best habit is to write transformations in a way that already makes sense to the optimizer. Filter early, avoid redundant projections, and keep plans as shallow as possible.</p>
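<p>One way to check whether Catalyst moved a filter for you is to compare the parsed and optimized sections of the plan. The sketch below is a minimal illustration under stated assumptions: the small DataFrame is a hypothetical stand-in for the employees data, <code>plan_sections</code> is a helper name invented here, and the filter condition is chosen only for the demo. It also assumes that PySpark prints <code>explain()</code> output to standard output, which is worth verifying on your version.</p>
<pre><code class="lang-python">import io
from contextlib import redirect_stdout

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.master("local[1]").appName("plan-check").getOrCreate()

# Hypothetical stand-in for the employees DataFrame used in this handbook.
df = spark.createDataFrame(
    [(1, "Engineering", 80000, 28), (2, "Sales", 60000, 41)],
    ["id", "department", "salary", "age"],
)


def plan_sections(frame):
    """Capture explain(True) output and split it into its named sections."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        frame.explain(True)
    sections, current = {}, None
    for line in buffer.getvalue().splitlines():
        if line.startswith("== ") and line.endswith(" =="):
            # Header lines look like "== Parsed Logical Plan ==".
            current = line.strip("= ")
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


# Same shape as Version A: derive a column first, filter afterwards.
version_a = (
    df.withColumn("bonus", col("salary") * 82)
    .filter(col("department") == "Sales")
)

sections = plan_sections(version_a)
print(sections["Parsed Logical Plan"])
print(sections["Optimized Logical Plan"])
</code></pre>
<p>In the parsed section, the Filter sits above the Project, exactly as the code was written. If the optimized section shows the Filter below the Project, Catalyst pushed it down for you. With UDFs or more complex logic that may not happen, which is why it pays to look.</p>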
<p>Optimizing Spark isn’t about tuning cluster configs – it’s about writing code that yields efficient plans. If your plan shows late filters, too many projections, or multiple Exchange nodes, it’s already explaining why your job will run slowly.</p>
<h2 id="heading-chapter-2-understanding-the-spark-execution-flow">Chapter 2: Understanding the Spark Execution Flow</h2>
<p>In Chapter 1, you learned how Spark interprets your transformations into logical plans – blueprints of what the job intends to do.</p>
<p>But Spark doesn't stop there. It must translate those plans into distributed actions across a cluster of executors, coordinate data movement, and handle any failures that may occur.</p>
<p>This chapter reveals what happens when that plan leaves the driver: how Spark breaks your job into stages, tasks, and a directed acyclic graph (DAG) that actually runs.</p>
<p>By the end, you’ll understand why some operations shuffle terabytes while others fly, and how to predict it before execution begins.</p>
<h3 id="heading-from-plans-to-stages-to-tasks">From Plans to Stages to Tasks</h3>
<p>A Spark job evolves through three conceptual layers:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Layer</strong></td><td><strong>What It Represents</strong></td><td><strong>Example View</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Plan</td><td>The optimized logical + physical representation of your query</td><td>Read → Filter → Join → Aggregate</td></tr>
<tr>
<td>Stage</td><td>A contiguous set of operations that can run without shuffling data</td><td>“Map Stage” or “Reduce Stage”</td></tr>
<tr>
<td>Task</td><td>The smallest unit of work, one per partition per stage</td><td>“Process Partition 7 of Stage 3”</td></tr>
</tbody>
</table>
</div><h4 id="heading-the-execution-trigger-actions-vs-transformations">The Execution Trigger: Actions vs Transformations</h4>
<p>Here's the critical distinction that determines when execution actually begins:</p>
<pre><code class="lang-python">df1 = spark.read.parquet(<span class="hljs-string">"data.parquet"</span>)
df2 = df1.filter(col(<span class="hljs-string">"age"</span>) &gt; <span class="hljs-number">25</span>)
df3 = df2.groupBy(<span class="hljs-string">"city"</span>).count()
</code></pre>
<p>Nothing executes yet! Spark just builds up the logical plan, adding each transformation as a node in the plan tree. No data is read, no filters run, no shuffles happen.</p>
<h4 id="heading-actions-trigger-execution">Actions Trigger Execution</h4>
<p>Spark transformations are lazy. When a sequence of DataFrame operations is defined, a logical plan is created, but no computation takes place. It’s only when Spark encounters an action, an operation that needs a result to be returned to the driver or written out, that execution takes place.</p>
<p>For example:</p>
<pre><code class="lang-python">result = df3.collect()
</code></pre>
<p>At this stage, Spark materializes the logical plan, applies optimizations, creates a physical plan, and executes the job. Until Spark is asked to <strong>act</strong>, through an action such as collect(), count(), or a write, it’s just describing what it needs to do – but it’s not actually doing it.</p>
<h4 id="heading-the-complete-execution-flow">The Complete Execution Flow</h4>
<p>Execution starts when an action such as collect() is called. The driver hands the optimized physical plan to the DAG Scheduler through the SparkContext. The DAG Scheduler analyzes the plan to find the shuffle boundaries created by wide operations such as <em>groupBy</em> or <em>orderBy</em>.</p>
<p>The plan is then divided into stages, each containing only narrow operations. Each stage is submitted to the Task Scheduler as a TaskSet with one task per partition.</p>
<p>The Task Scheduler assigns tasks to executor cores, preferring executors that already hold the data (data locality). A stage starts only after the stages it depends on have completed. Once the final stage finishes, its results are returned to the driver or written to storage.</p>
<h4 id="heading-what-triggers-a-shuffle">What Triggers a Shuffle</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769457412199/308bc894-66a9-4c01-aae1-9ae42e64d32c.png" alt="Comparison of Spark shuffle behavior before and after groupBy" class="image--center mx-auto" width="1920" height="992" loading="lazy"></p>
<p>A shuffle occurs when Spark needs to redistribute data across partitions, typically because the operation requires grouping, joining, or repartitioning data in a way that can’t be done locally within existing partitions.</p>
<p>Common shuffle triggers:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Operation</strong></td><td><strong>Why it Shuffles</strong></td></tr>
</thead>
<tbody>
<tr>
<td>groupBy(), reduceByKey()</td><td>Data with the same key must co-locate for aggregation</td></tr>
<tr>
<td>join()</td><td>Matching keys may reside in different partitions</td></tr>
<tr>
<td>orderBy() / sort()</td><td>Requires global ordering across all partitions</td></tr>
<tr>
<td>distinct()</td><td>Needs comparison of all values across partitions</td></tr>
<tr>
<td>repartition(n)</td><td>Explicit redistribution to a new number of partitions</td></tr>
</tbody>
</table>
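</div><pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> sum <span class="hljs-keyword">as</span> spark_sum

df.groupBy(<span class="hljs-string">"user_id"</span>) \
  .agg(spark_sum(<span class="hljs-string">"amount"</span>))
</code></pre>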
<p>In Stage 1 (Map), each task performs a partial aggregation on its partition and writes a shuffle file to disk. During the shuffle, each executor retrieves these files across the network such that all records with the same hash(user_id) % numPartitions are colocated.</p>
<p>In Stage 2 (Reduce), each task performs the final aggregation on its partition and then returns or writes its result. Because Spark has tracked this process as a DAG, a failed task can re-read only the affected shuffle files instead of re-computing the entire DAG.</p>
<p>In practice, a healthy job has 2-6 stages. Seeing 20+ stages for such simple logic usually means unnecessary shuffles or bad partitioning.</p>
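<p>The map, shuffle, and reduce steps described above can be imitated in plain Python to make the data movement visible. This is only a sketch under stated assumptions: the records and the partition count are hypothetical, and CRC32 stands in for Spark's internal hash function, which is a different algorithm.</p>
<pre><code class="lang-python">import zlib
from collections import defaultdict

NUM_REDUCE_PARTITIONS = 3  # hypothetical stand-in for the shuffle partition count

# Two hypothetical map-side partitions of (user_id, amount) records.
map_partitions = [
    [("u1", 10), ("u2", 5), ("u1", 7)],
    [("u2", 1), ("u3", 4), ("u1", 3)],
]


def partial_aggregate(records):
    """Stage 1: combine amounts per key inside a single partition."""
    totals = defaultdict(int)
    for user_id, amount in records:
        totals[user_id] += amount
    return dict(totals)


def target_partition(user_id):
    """Pick the reduce partition from a stable hash of the key (CRC32 here)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_REDUCE_PARTITIONS


# Shuffle: route every partial result to the partition that owns its key.
shuffled = defaultdict(list)
for partition in map_partitions:
    for user_id, subtotal in partial_aggregate(partition).items():
        shuffled[target_partition(user_id)].append((user_id, subtotal))

# Stage 2: each reduce partition merges the partial sums it received.
final_totals = {}
for records in shuffled.values():
    for user_id, subtotal in records:
        final_totals[user_id] = final_totals.get(user_id, 0) + subtotal

print(final_totals)
</code></pre>
<p>Because every partial sum for a given key is routed to the same reduce partition, the final merge never needs to look at another partition. The partial aggregation also means only one subtotal per key and map partition crosses the network, not every raw record.</p>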
<h4 id="heading-why-shuffles-create-stage-boundaries">Why Shuffles Create Stage Boundaries</h4>
<p>Shuffles force data to move across the network between executors. Spark cannot continue processing until:</p>
<ul>
<li><p>All tasks in the current stage write their shuffle output to disk</p>
</li>
<li><p>The shuffle data is available for the next stage to read over the network</p>
</li>
</ul>
<p>This dependency creates a natural boundary – so a new stage begins after every shuffle. The DAG Scheduler uses these boundaries to determine where stages must wait for previous stages to complete.</p>
<h4 id="heading-common-performance-bottlenecks">Common Performance Bottlenecks</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Bottleneck Type</strong></td><td><strong>Symptom</strong></td><td><strong>Solution</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Data skew</td><td>Few tasks run much longer</td><td>Use salting, split hot keys, or AQE skew join</td></tr>
<tr>
<td>Small files</td><td>Too many tasks, high overhead</td><td>Coalesce or repartition after read</td></tr>
<tr>
<td>Large shuffle</td><td>High network I/O, spill to disk</td><td>Filter early, broadcast small tables, reduce cardinality</td></tr>
<tr>
<td>Unnecessary stages</td><td>Extra Exchange nodes in plan</td><td>Combine operations, remove redundant repartitions</td></tr>
<tr>
<td>Inefficient file formats</td><td>Slow reads, no predicate pushdown</td><td>Use Parquet or ORC with partitioning</td></tr>
<tr>
<td>Complex data types</td><td>Serialization overhead, large objects</td><td>Use simple types, cache in serialized form</td></tr>
</tbody>
</table>
</div><p>Let’s ground this with a small but realistic pattern using the same employees DataFrame. <strong>Goal:</strong> average salary per department and country, only for employees older than 30.</p>
<p>Naïve approach:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> col, when, avg

df_dept_country = df.select(<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>).distinct()

df_result = (
    df.withColumn(
        <span class="hljs-string">"age_group"</span>,
        when(col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">30</span>, <span class="hljs-string">"junior"</span>)
        .when(col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">40</span>, <span class="hljs-string">"mid"</span>)
        .otherwise(<span class="hljs-string">"senior"</span>)
    )
    <span class="hljs-comment"># join on both keys so "country" is not duplicated (and ambiguous) after the join</span>
    .join(df_dept_country, [<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>], <span class="hljs-string">"inner"</span>)
    <span class="hljs-comment"># the age filter arrives late, after the derived column and the join</span>
    .filter(col(<span class="hljs-string">"age"</span>) &gt; <span class="hljs-number">30</span>)
    .groupBy(<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>)
    .agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
)
</code></pre>
<p>This looks harmless, but:</p>
<ul>
<li><p>The join with df_dept_country introduces a wide dependency → shuffle #1.</p>
</li>
<li><p>The groupBy("department", "country") introduces another wide dependency → shuffle #2.</p>
</li>
</ul>
<p>So we have two shuffles for what should be a simple aggregation. If you run df_result.explain(), you’ll see an Exchange node for each of them, each marking a shuffle and stage boundary.</p>
<h4 id="heading-optimized-approach">Optimized Approach</h4>
<p>We can do better by filtering early, broadcasting the small dimension (df_dept_country), and keeping only one global shuffle for aggregation.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> avg, broadcast, col

df_dept_country = df.select(<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>).distinct()

df_result_optimized = (
    df.filter(col(<span class="hljs-string">"age"</span>) &gt; <span class="hljs-number">30</span>)
        <span class="hljs-comment"># join on both keys so "country" stays unambiguous after the join</span>
        .join(broadcast(df_dept_country), [<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>], <span class="hljs-string">"inner"</span>)
        .groupBy(<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>)
        .agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
)
</code></pre>
<p>What changed:</p>
<ul>
<li><p>filter(col("age") &gt; 30) is narrow and runs before any shuffle.</p>
</li>
<li><p>broadcast(df_dept_country) avoids a shuffle for the join.</p>
</li>
<li><p>Only the groupBy("department", "country") causes a single shuffle.</p>
</li>
</ul>
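<p>Now the main path of the plan shows just one shuffle Exchange, for the aggregation. The lookup table arrives through a BroadcastExchange instead, and building it with distinct() still costs its own small shuffle.</p>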
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Version</strong></td><td><strong>Shuffles</strong></td><td><strong>Stages</strong></td><td><strong>Notes</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Naïve</td><td>2</td><td>~4 (2 map + 2 reduce)</td><td>Join shuffle + groupBy shuffle = double overhead</td></tr>
<tr>
<td>Optimized</td><td>1</td><td>~2 (1 map + 1 reduce)</td><td>Broadcast join avoids shuffle. Only groupBy shuffles</td></tr>
</tbody>
</table>
</div><h2 id="heading-chapter-3-reading-and-debugging-plans-like-a-pro">Chapter 3: Reading and Debugging Plans Like a Pro</h2>
<p>As explained in Chapter 1, Spark executes transformations based on three levels: the logical plan, the optimized logical plan (Catalyst), and the physical plan. This chapter will expand on this explanation and concentrate on the impact of the logical plan on shuffle and execution performance.</p>
<p>By now, you understand how Spark builds and <em>executes</em> plans. But reading those plans and instantly spotting inefficiencies is the real superpower of a performance-focused data engineer.</p>
<p>Spark’s explain() output isn’t random jargon. It’s a precise log of Spark’s thought process. Once you learn to read it, every optimization becomes obvious.</p>
<h3 id="heading-three-layers-in-spark"><strong>Three Layers in Spark</strong></h3>
<p>As we talked about above, every Spark plan has three key views, printed when you call df.explain(True). Let’s review them now:</p>
<ol>
<li><p>Parsed Logical Plan: The raw intent Spark inferred from your code. It may include unresolved column names or expressions.</p>
</li>
<li><p>Analyzed / Optimized Logical Plan: After Spark applies Catalyst optimizations: constant folding, predicate pushdown, column pruning, and plan rearrangements.</p>
</li>
<li><p>Physical Plan: What your executors actually run: joins, shuffles, exchanges, scans, and code-generated operators.</p>
</li>
</ol>
<p>Each layer narrows the gap between what you <em>asked</em> Spark to do and what Spark decides to do.</p>
<pre><code class="lang-python">df_avg = (
    df.filter(col(<span class="hljs-string">"age"</span>) &gt; <span class="hljs-number">30</span>)
    .groupBy(<span class="hljs-string">"department"</span>)
    .agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
)

df_avg.explain(<span class="hljs-literal">True</span>)
</code></pre>
<p><strong>1. Parsed Logical Plan</strong></p>
<pre><code class="lang-python">== Parsed Logical Plan ==
<span class="hljs-string">'Aggregate ['</span>department], [<span class="hljs-string">'department, '</span>avg(<span class="hljs-string">'salary) AS avg_salary#8]
+- Filter (age#5L &gt; cast(30 as bigint))
   +- LogicalRDD [id#0L, firstname#1, lastname#2, department#3, salary#4L, age#5L, hire_date#6, country#7], false</span>
</code></pre>
<p>How to read this:</p>
<ul>
<li><p>Bottom → data source (LogicalRDD).</p>
</li>
<li><p>Middle → Filter: Spark hasn’t yet optimized column references.</p>
</li>
<li><p>Top → Aggregate: high-level grouping intent.</p>
</li>
</ul>
<p>At this stage, the plan may include unresolved symbols (like 'department or 'avg('salary)), meaning Spark hasn’t yet validated column existence or data types.</p>
<p><strong>2. Optimized Logical Plan</strong></p>
<pre><code class="lang-python">
== Optimized Logical Plan ==
Aggregate [department<span class="hljs-comment">#3], [department#3, avg(salary#4L) AS avg_salary#8]</span>
+- Project [department<span class="hljs-comment">#3, salary#4L]</span>
   +- Filter (isnotnull(age<span class="hljs-comment">#5L) AND (age#5L &gt; 30))</span>
      +- LogicalRDD [id<span class="hljs-comment">#0L, firstname#1, lastname#2, department#3, salary#4L, age#5L, hire_date#6, country#7], false</span>
</code></pre>
<p>Here, Catalyst has done its job:</p>
<ul>
<li><p>Column IDs (#3, #4L) are resolved.</p>
</li>
<li><p>Unused columns are pruned – no need to carry them forward.</p>
</li>
<li><p>The plan now accurately reflects Spark’s optimized logical intent.</p>
</li>
</ul>
<p>If you ever wonder whether Spark pruned columns or pushed filters, this is the section to check.</p>
<p><strong>3. Physical Plan</strong></p>
<pre><code class="lang-python">== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[department<span class="hljs-comment">#3], functions=[avg(salary#4L)], output=[department#3, avg_salary#8])</span>
   +- Exchange hashpartitioning(department<span class="hljs-comment">#3, 200), ENSURE_REQUIREMENTS, [plan_id=19]</span>
      +- HashAggregate(keys=[department<span class="hljs-comment">#3], functions=[partial_avg(salary#4L)], output=[department#3, sum#20, count#21L])</span>
         +- Project [department<span class="hljs-comment">#3, salary#4L]</span>
            +- Filter (isnotnull(age<span class="hljs-comment">#5L) AND (age#5L &gt; 30))</span>
               +- Scan ExistingRDD[id<span class="hljs-comment">#0L,firstname#1,lastname#2,department#3,salary#4L,age#5L,hire_date#6,country#7]</span>
</code></pre>
<p><strong>Breakdown</strong></p>
<ul>
<li><p>Scan ExistingRDD → Spark reading from the in-memory DataFrame.</p>
</li>
<li><p>Filter → narrow transformation, no shuffle.</p>
</li>
<li><p>HashAggregate → partial aggregation per partition.</p>
</li>
<li><p>Exchange → wide dependency: data is shuffled by department.</p>
</li>
<li><p>Top HashAggregate → final aggregation after shuffle.</p>
</li>
</ul>
<p>This structure – partial agg → shuffle → final agg – is Spark’s default two-phase aggregation pattern.</p>
<h4 id="heading-recognizing-common-nodes">Recognizing Common Nodes</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Node / Operator</strong></td><td><strong>Meaning</strong></td><td><strong>Optimization Hint</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Project</td><td>Column selection or computed fields</td><td>Combine multiple withColumn() into one select()</td></tr>
<tr>
<td>Filter</td><td>Predicate on rows</td><td>Push filters as low as possible in the plan</td></tr>
<tr>
<td>Join</td><td>Combine two DataFrames</td><td>Broadcast smaller side if &lt; 10 MB</td></tr>
<tr>
<td>Aggregate</td><td>GroupBy, sum, avg, count</td><td>Filter before aggregating to reduce cardinality</td></tr>
<tr>
<td>Exchange</td><td>Shuffle / data redistribution</td><td>Minimize by filtering early, using broadcast join</td></tr>
<tr>
<td>Sort</td><td>OrderBy, sort</td><td>Avoid global sorts; use within partitions if possible</td></tr>
<tr>
<td>Window</td><td>Windowed analytics (row_number, rank)</td><td>Partition on selective keys to reduce shuffle</td></tr>
</tbody>
</table>
</div><p>Repeated invocations of withColumn stack multiple Project nodes, which increases the plan depth. Instead, combine these invocations using select.</p>
<p>Multiple Exchange nodes imply repeated data shuffles. You can eliminate these by broadcasting the data or filtering.</p>
<p>Multiple scans of the same table within a single operation imply that some caching of strategic intermediates is lacking.</p>
<p>And frequent SortMergeJoin operations imply that Spark is unnecessarily sorting and shuffling the data. You can eliminate these by broadcasting the smaller dataframe or bucketing.</p>
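<p>The first of these fixes, replacing stacked withColumn calls with one select, can be sketched as follows. The DataFrame and the derived column names (<code>dept_upper</code>, <code>age_next_year</code>) are hypothetical and exist only for this illustration.</p>
<pre><code class="lang-python">from pyspark.sql import SparkSession
from pyspark.sql.functions import col, upper

spark = SparkSession.builder.master("local[1]").appName("single-select").getOrCreate()

# Hypothetical stand-in for the employees DataFrame.
df = spark.createDataFrame(
    [(1, "Engineering", 80000, 28), (2, "Sales", 60000, 41)],
    ["id", "department", "salary", "age"],
)

# Three chained calls: each one adds its own Project node to the parsed plan.
chained = (
    df.withColumn("bonus", col("salary") * 82)
    .withColumn("dept_upper", upper(col("department")))
    .withColumn("age_next_year", col("age") + 1)
)

# One select: all derived columns are declared in a single projection.
single = df.select(
    "*",
    (col("salary") * 82).alias("bonus"),
    upper(col("department")).alias("dept_upper"),
    (col("age") + 1).alias("age_next_year"),
)

# Compare the parsed plans of both versions side by side.
chained.explain(True)
single.explain(True)
</code></pre>
<p>Both DataFrames hold the same columns and rows. The difference is in how many Project nodes the parsed plan carries. On Spark 3.3 and later, <code>DataFrame.withColumns</code> with a dictionary of expressions is another way to add several columns in one call.</p>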
<h4 id="heading-debugging-strategy-read-plans-from-top-to-bottom">Debugging Strategy: Read Plans from Top to Bottom</h4>
<p>Remember: Spark <em>executes</em> plans from bottom up (from data source to final result). But when you're debugging, you read from the top down (from the output schema back to the root cause). This reversal is intentional: you start with what's wrong at the output level, then trace backward through the plan to find where the inefficiency was introduced.</p>
<p>When debugging a slow job:</p>
<ul>
<li><p>Start at the top: Identify output schema and major operators (HashAggregate, Join, and so on).</p>
</li>
<li><p>Scroll for Exchanges: Count them. Each = stage boundary. Ask “Why do I need this shuffle?”</p>
</li>
<li><p>Trace backward: See if filters or projections appear below or above joins.</p>
</li>
<li><p>Look for duplication: Same scan twice? Missing cache? Re-derived columns?</p>
</li>
<li><p>Check join strategy: If it’s SortMergeJoin but one table is small, force a broadcast().</p>
</li>
<li><p>Re-run explain after optimization: You should literally see the extra nodes disappear.</p>
</li>
</ul>
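<p>The "count the Exchanges" step in this checklist is easy to automate once you have the plan as text. The helper below is a rough sketch: <code>count_operators</code> is a name invented here, and the sample plan is abridged from the physical plan shown earlier in this chapter. Plain text matching is a heuristic, not a plan parser.</p>
<pre><code class="lang-python">import re

# Abridged physical plan text, in the shape that explain() prints.
plan_text = """
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[department#3], functions=[avg(salary#4L)])
   +- Exchange hashpartitioning(department#3, 200), ENSURE_REQUIREMENTS, [plan_id=19]
      +- HashAggregate(keys=[department#3], functions=[partial_avg(salary#4L)])
         +- Project [department#3, salary#4L]
            +- Scan ExistingRDD[id#0L,department#3,salary#4L,age#5L]
"""


def count_operators(plan, names):
    """Count whole-word occurrences of each operator name in the plan text."""
    counts = {}
    for name in names:
        # The word boundary keeps "Exchange" from matching inside "BroadcastExchange".
        pattern = r"\b" + re.escape(name) + r"\b"
        counts[name] = len(re.findall(pattern, plan))
    return counts


summary = count_operators(plan_text, ["Exchange", "Project", "SortMergeJoin"])
print(summary)
</code></pre>
<p>Running the same count before and after a change gives a quick signal: if the Exchange count did not drop, the shuffle you meant to remove is still there.</p>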
<h4 id="heading-catalyst-optimizer-in-action">Catalyst Optimizer in Action</h4>
<p>Catalyst applies dozens of rules automatically. Knowing a few helps you interpret what changed:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Optimization Rule</strong></td><td><strong>Example Transformation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Predicate Pushdown</td><td>Moves filters below joins/scans</td></tr>
<tr>
<td>Constant Folding</td><td>Evaluates constant expressions at planning time, for example salary * (2 * 5) becomes salary * 10</td></tr>
<tr>
<td>Column Pruning</td><td>Drops unused columns early</td></tr>
<tr>
<td>Combine Filters</td><td>Merges consecutive filters into one</td></tr>
<tr>
<td>Simplify Casts</td><td>Removes redundant type casts</td></tr>
<tr>
<td>Reorder Joins / Join Reordering</td><td>Changes join order for cheaper plan</td></tr>
</tbody>
</table>
</div><p>Putting it all together, every plan tells a story:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769458525411/64fa30a4-b16e-4aed-8c04-d12b476d9ae6.png" alt="Spark Plans and Stages" class="image--center mx-auto" width="1920" height="404" loading="lazy"></p>
<p>As you progress through the practical scenarios in Chapter 4, read every plan before and after. Your goal isn't memorization – it's intuition.</p>
<h2 id="heading-chapter-4-writing-efficient-transformations">Chapter 4: Writing Efficient Transformations</h2>
<p>Every Spark job tells a story, not in code, but in plans. By now, you've seen how Spark interprets transformations (Chapter 1), how it executes them through stages and tasks (Chapter 2), and how to read plans like a detective (Chapter 3). Now comes the part where you apply that knowledge: writing transformations that yield efficient logical plans.</p>
<p>This chapter is the heart of the handbook. It's where we move from understanding Spark's mind to writing code that speaks its language fluently.</p>
<h3 id="heading-why-transformations-matter">Why Transformations Matter</h3>
<p>In PySpark, most performance issues don’t start in clusters or configurations. They start in transformations: the way we chain, filter, rename, or join data. Every transformation reshapes the logical plan, influencing how Spark optimizes, when it shuffles, and whether the final DAG is streamlined or tangled.</p>
<p>A good transformation sequence:</p>
<ul>
<li><p>Keeps plans shallow, not nested.</p>
</li>
<li><p>Applies filters early, not after computation.</p>
</li>
<li><p>Reduces data movement, not just data size.</p>
</li>
<li><p>Lets Catalyst and AQE optimize freely, without user-induced constraints.</p>
</li>
</ul>
<p>A bad one can double runtime, and you won't see it in your code, only in your plan.</p>
<h3 id="heading-the-goal-of-this-chapter">The Goal of this Chapter</h3>
<p>We’ll explore a series of real-world optimization scenarios, drawn from production ETL and analytical pipelines, each showing how a small change in code can completely reshape the logical plan and execution behavior.</p>
<p>Each scenario is practical and short, following a consistent structure. By the end of this chapter, you’ll be able to <em>see</em> optimization opportunities the moment you write code, because you’ll know exactly how they alter the logical plan beneath.</p>
<h3 id="heading-before-you-dive-in">Before You Dive In:</h3>
<p>Open a Spark shell or notebook. Load your familiar employees DataFrame. Run every example, and compare the explain("formatted") output before and after the fix. Because in this chapter, performance isn’t about more theory, it’s about seeing the difference in the plan and feeling the difference in execution time.</p>
<h3 id="heading-scenario-1-rename-in-one-pass-withcolumnrenamed-vs-todf">Scenario 1: Rename in One Pass: withColumnRenamed() vs toDF()</h3>
<p>If you’ve worked with PySpark DataFrames, you’ve probably had to rename columns, either by calling withColumnRenamed() repeatedly or by using toDF() in one shot.</p>
<p>At first glance, both approaches produce identical results: the columns have the new names you wanted. But beneath the surface, Spark treats them very differently – and that difference shows up directly in your logical plan.</p>
<pre><code class="lang-python">df_renamed = (df.withColumnRenamed(<span class="hljs-string">"id"</span>, <span class="hljs-string">"emp_id"</span>)
    .withColumnRenamed(<span class="hljs-string">"firstname"</span>, <span class="hljs-string">"first_name"</span>)
    .withColumnRenamed(<span class="hljs-string">"lastname"</span>, <span class="hljs-string">"last_name"</span>)
    .withColumnRenamed(<span class="hljs-string">"department"</span>, <span class="hljs-string">"dept"</span>)
    .withColumnRenamed(<span class="hljs-string">"salary"</span>, <span class="hljs-string">"base_salary"</span>)
    .withColumnRenamed(<span class="hljs-string">"age"</span>, <span class="hljs-string">"age_years"</span>)
    .withColumnRenamed(<span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"hired_on"</span>)
    .withColumnRenamed(<span class="hljs-string">"country"</span>, <span class="hljs-string">"country_code"</span>)
)
</code></pre>
<p>This is simple and readable. But Spark builds the plan step by step, adding one Project node for every rename. Each Project node copies all existing columns, plus the newly renamed one. In large schemas (hundreds of columns), this silently bloats the plan.</p>
<h4 id="heading-logical-plan-impact">Logical Plan Impact:</h4>
<pre><code class="lang-python">Project [emp_id, first_name, last_name, dept, base_salary, age_years, hired_on, country_code]
└─ Project [emp_id, first_name, last_name, dept, base_salary, age_years, hired_on, country]
   └─ Project [emp_id, first_name, last_name, dept, base_salary, age_years, hire_date, country]
      └─ Project [emp_id, first_name, last_name, dept, base_salary, age, hire_date, country]
         └─ Project [emp_id, first_name, last_name, dept, salary, age, hire_date, country]
            └─ Project [emp_id, first_name, last_name, department, salary, age, hire_date, country]
               └─ Project [emp_id, first_name, lastname, department, salary, age, hire_date, country]
                  └─ Project [emp_id, firstname, lastname, department, salary, age, hire_date, country]
                     └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>Each rename adds a new Project layer, deepening the DAG. Spark now has to materialize intermediate projections before applying the next one. You can see this by running: <em>df_renamed.explain(True).</em></p>
<h4 id="heading-the-better-approach-rename-once-with-todf">The Better Approach: Rename Once with toDF()</h4>
<p>Instead of chaining multiple renames, rename all columns in a single pass:</p>
<pre><code class="lang-python">new_cols = [<span class="hljs-string">"emp_id"</span>, <span class="hljs-string">"first_name"</span>, <span class="hljs-string">"last_name"</span>, <span class="hljs-string">"dept"</span>,
            <span class="hljs-string">"base_salary"</span>, <span class="hljs-string">"age_years"</span>, <span class="hljs-string">"hired_on"</span>, <span class="hljs-string">"country_code"</span>]

df_renamed = df.toDF(*new_cols)
</code></pre>
<h4 id="heading-logical-plan-impact-1">Logical Plan Impact:</h4>
<pre><code class="lang-python">Project [emp_id, first_name, last_name, dept, base_salary, age_years, hired_on, country_code]
└─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>Now there’s just one Project node, which means one projection over the source data. This gives us a flatter, more efficient plan.</p>
<h4 id="heading-under-the-hood-what-spark-actually-does">Under the Hood: What Spark Actually Does</h4>
<p>Every time you call withColumnRenamed(), Spark rewrites the entire projection list. Catalyst treats the rename as a full column re-selection from the previous node, not as a light-weight alias update. When you chain several renames, Catalyst duplicates internal column metadata for each intermediate step.</p>
<p>By contrast, toDF() rebases the schema in a single action. Catalyst interprets it as a single schema rebinding, so no redundant metadata trees are created.</p>
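<p>One practical caveat: toDF() assigns names by position, so the new list must follow the existing column order exactly. A small helper can build that list from a rename mapping instead of typing it by hand. This is a sketch, and <code>renamed_columns</code> is a hypothetical name, not a PySpark function.</p>
<pre><code class="lang-python">def renamed_columns(columns, mapping):
    """Return the columns in their original order with mapped names swapped in."""
    unknown = set(mapping) - set(columns)
    if unknown:
        # Fail loudly on typos instead of silently skipping a rename.
        raise KeyError(f"columns not found: {sorted(unknown)}")
    return [mapping.get(name, name) for name in columns]


# Column order of the employees DataFrame used throughout this handbook.
current = ["id", "firstname", "lastname", "department",
           "salary", "age", "hire_date", "country"]
mapping = {"id": "emp_id", "firstname": "first_name", "hire_date": "hired_on"}

new_cols = renamed_columns(current, mapping)
print(new_cols)

# With a real DataFrame this becomes a single projection:
# df_renamed = df.toDF(*renamed_columns(df.columns, mapping))
</code></pre>
<p>Columns that are missing from the mapping keep their names, so the helper works just as well for renaming two columns as for renaming all of them.</p>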
<h4 id="heading-real-world-timing-glue-job-benchmark">Real-World Timing: Glue Job Benchmark</h4>
<p>To see if chained withColumnRenamed calls add real overhead, here's a simple timing test performed on a Glue job using a DataFrame with 1M rows. First using withColumnRenamed:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> pyspark.sql <span class="hljs-keyword">import</span> SparkSession

spark = SparkSession.builder.appName(<span class="hljs-string">"MillionRowsRenameTest"</span>).getOrCreate()

employees_data = [
    (<span class="hljs-number">1</span>, <span class="hljs-string">"John"</span>, <span class="hljs-string">"Doe"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">80000</span>, <span class="hljs-number">28</span>, <span class="hljs-string">"2020-01-15"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">2</span>, <span class="hljs-string">"Jane"</span>, <span class="hljs-string">"Smith"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">85000</span>, <span class="hljs-number">32</span>, <span class="hljs-string">"2019-03-20"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">3</span>, <span class="hljs-string">"Alice"</span>, <span class="hljs-string">"Johnson"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">60000</span>, <span class="hljs-number">25</span>, <span class="hljs-string">"2021-06-10"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">4</span>, <span class="hljs-string">"Bob"</span>, <span class="hljs-string">"Brown"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">90000</span>, <span class="hljs-number">35</span>, <span class="hljs-string">"2018-07-01"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">5</span>, <span class="hljs-string">"Charlie"</span>, <span class="hljs-string">"Wilson"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">65000</span>, <span class="hljs-number">29</span>, <span class="hljs-string">"2020-11-05"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">6</span>, <span class="hljs-string">"David"</span>, <span class="hljs-string">"Lee"</span>, <span class="hljs-string">"HR"</span>, <span class="hljs-number">55000</span>, <span class="hljs-number">27</span>, <span class="hljs-string">"2021-01-20"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">7</span>, <span class="hljs-string">"Eve"</span>, <span class="hljs-string">"Davis"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">95000</span>, <span class="hljs-number">40</span>, <span class="hljs-string">"2017-04-12"</span>, <span class="hljs-string">"Canada"</span>),
    (<span class="hljs-number">8</span>, <span class="hljs-string">"Frank"</span>, <span class="hljs-string">"Miller"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">70000</span>, <span class="hljs-number">33</span>, <span class="hljs-string">"2019-09-25"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">9</span>, <span class="hljs-string">"Grace"</span>, <span class="hljs-string">"Taylor"</span>, <span class="hljs-string">"HR"</span>, <span class="hljs-number">58000</span>, <span class="hljs-number">26</span>, <span class="hljs-string">"2021-08-15"</span>, <span class="hljs-string">"Canada"</span>),
    (<span class="hljs-number">10</span>, <span class="hljs-string">"Henry"</span>, <span class="hljs-string">"Anderson"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">88000</span>, <span class="hljs-number">31</span>, <span class="hljs-string">"2020-02-28"</span>, <span class="hljs-string">"USA"</span>)
]

multiplied_data = [(i, <span class="hljs-string">f"firstname_<span class="hljs-subst">{i}</span>"</span>, <span class="hljs-string">f"lastname_<span class="hljs-subst">{i}</span>"</span>,
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">3</span>],  <span class="hljs-comment"># department</span>
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">4</span>],  <span class="hljs-comment"># salary</span>
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">5</span>],  <span class="hljs-comment"># age</span>
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">6</span>],  <span class="hljs-comment"># hire_date</span>
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">7</span>])  <span class="hljs-comment"># country</span>
                   <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">1</span>_000_001)]

df = spark.createDataFrame(multiplied_data,
                           [<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>])

start = time.time()
df1 = (df
       .withColumnRenamed(<span class="hljs-string">"firstname"</span>, <span class="hljs-string">"first_name"</span>)
       .withColumnRenamed(<span class="hljs-string">"lastname"</span>, <span class="hljs-string">"last_name"</span>)
       .withColumnRenamed(<span class="hljs-string">"department"</span>, <span class="hljs-string">"dept_name"</span>)
       .withColumnRenamed(<span class="hljs-string">"salary"</span>, <span class="hljs-string">"annual_salary"</span>)
       .withColumnRenamed(<span class="hljs-string">"age"</span>, <span class="hljs-string">"emp_age"</span>)
       .withColumnRenamed(<span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"hired_on"</span>)
       .withColumnRenamed(<span class="hljs-string">"country"</span>, <span class="hljs-string">"work_country"</span>))

print(<span class="hljs-string">"withColumnRenamed Count:"</span>, df1.count())
print(<span class="hljs-string">"withColumnRenamed time:"</span>, round(time.time() - start, <span class="hljs-number">2</span>), <span class="hljs-string">"seconds"</span>)
</code></pre>
<p>Using toDF:</p>
<pre><code class="lang-python">start = time.time()
df2 = df.toDF(<span class="hljs-string">"id"</span>, <span class="hljs-string">"first_name"</span>, <span class="hljs-string">"last_name"</span>, <span class="hljs-string">"dept_name"</span>, <span class="hljs-string">"annual_salary"</span>, <span class="hljs-string">"emp_age"</span>, <span class="hljs-string">"hired_on"</span>, <span class="hljs-string">"work_country"</span>)
print(<span class="hljs-string">"toDF Count:"</span>, df2.count())
print(<span class="hljs-string">"toDF time:"</span>, round(time.time() - start, <span class="hljs-number">2</span>), <span class="hljs-string">"seconds"</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Number of Project Nodes</strong></td><td><strong>Glue Execution Time (1M rows)</strong></td><td><strong>Plan Complexity</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Chained withColumnRenamed()</td><td>8 nodes</td><td>~12 seconds</td><td>Deep, nested</td></tr>
<tr>
<td>Single toDF()</td><td>1 node</td><td>~8 seconds</td><td>Flat, simple</td></tr>
</tbody>
</table>
</div><p>The gap grows with data size and pipeline complexity. It is most visible on managed runtimes such as AWS Glue, where planning overhead is significant, and on datasets with tens of millions of rows, where each additional Project adds column resolution, metadata work, and DAG height. Because Spark doesn’t always collapse chained projections that rename columns, renaming everything in one go with toDF() gives a flatter logical and physical plan: one rename, one projection, and faster execution.</p>
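<p>When only some columns change, typing out the full list for toDF() by hand is error-prone. The following is a minimal sketch of one way to derive that list from a rename mapping. The helper name renamed_columns is hypothetical (it is not part of PySpark), and the Spark call itself appears only as a comment so the snippet runs without a cluster.</p>

```python
def renamed_columns(columns, mapping):
    """Return the new column names in their original order.

    Names that do not appear in `mapping` are kept unchanged.
    """
    return [mapping.get(name, name) for name in columns]


# Column names and renames taken from the example above.
columns = ["id", "firstname", "lastname", "department",
           "salary", "age", "hire_date", "country"]
mapping = {
    "firstname": "first_name",
    "lastname": "last_name",
    "department": "dept_name",
    "salary": "annual_salary",
    "age": "emp_age",
    "hire_date": "hired_on",
    "country": "work_country",
}

new_names = renamed_columns(columns, mapping)
print(new_names)

# With a live DataFrame, the whole rename is then a single call:
# df2 = df.toDF(*renamed_columns(df.columns, mapping))
```

<p>Because toDF() assigns names by position, the helper preserves the original column order and only swaps the names that appear in the mapping.</p>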
<h3 id="heading-scenario-2-reusing-expressions">Scenario 2: Reusing Expressions</h3>
<p>Sometimes Spark jobs run slower, not because of shuffles or joins, but because the same computation is performed repeatedly within the logical plan. Every time you repeat an expression, say, col("salary") * 0.1 in multiple places, Spark treats it as a <em>new</em> derived column, expanding the logical plan and forcing redundant work.</p>
<h4 id="heading-the-problem-repeated-expressions">The Problem: Repeated Expressions</h4>
<p>Let’s say we’re calculating bonus and total compensation for employees:</p>
<pre><code class="lang-python">df_expr = (
    df.withColumn(<span class="hljs-string">"bonus"</span>, col(<span class="hljs-string">"salary"</span>) * <span class="hljs-number">0.10</span>)
      .withColumn(<span class="hljs-string">"total_comp"</span>, col(<span class="hljs-string">"salary"</span>) + (col(<span class="hljs-string">"salary"</span>) * <span class="hljs-number">0.10</span>))
)
</code></pre>
<p>At first glance, it’s simple enough. But Spark’s optimizer doesn’t automatically know that the (col("salary") * 0.10) in the second column is identical to the one computed in the first. Both get evaluated separately in the logical plan.</p>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Project [id, firstname, lastname, department,

salary, age, hire_date, country,

(salary * <span class="hljs-number">0.10</span>) AS bonus,

(salary + (salary * <span class="hljs-number">0.10</span>)) AS total_comp]

└─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>While this looks compact, Spark must compute (salary * 0.10) twice: once for bonus, and again inside total_comp. For a large dataset (say, 100M rows), that’s two full column evaluations. The waste compounds when the expression is complex: imagine parsing JSON, applying UDFs, or running date arithmetic multiple times.</p>
<h4 id="heading-the-better-approach-compute-once-reuse-everywhere">The Better Approach: Compute Once, Reuse Everywhere</h4>
<p>Compute the expression once, store it as a column, and reference it later:</p>
<pre><code class="lang-python">df_expr = (
    df.withColumn(<span class="hljs-string">"bonus"</span>, col(<span class="hljs-string">"salary"</span>) * <span class="hljs-number">0.10</span>)
      .withColumn(<span class="hljs-string">"total_comp"</span>, col(<span class="hljs-string">"salary"</span>) + col(<span class="hljs-string">"bonus"</span>))
)
</code></pre>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Project [id, firstname, lastname, department,

salary, age, hire_date, country,

(salary * <span class="hljs-number">0.10</span>) AS bonus,

(salary + bonus) AS total_comp]

└─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>Now Spark calculates (salary * 0.10) once, stores it in the bonus column, and reuses that column when computing total_comp. This single change cuts CPU cost and memory usage.</p>
<h4 id="heading-under-the-hood-why-repetition-hurts">Under the Hood: Why Repetition Hurts</h4>
<p>Spark’s Catalyst optimizer doesn’t automatically factor out repeated expressions across different columns. Each withColumn() creates a new Project node with its own expression tree. If multiple nodes reuse the same arithmetic or function, Catalyst re-evaluates them independently.</p>
<p>On small DataFrames, this cost is invisible. On wide, computation-heavy jobs (think feature engineering pipelines), it can add hundreds of milliseconds per task.</p>
<p>Each redundant expression increases:</p>
<ul>
<li><p>Catalyst’s internal expression resolution time</p>
</li>
<li><p>The size of generated Java code in WholeStageCodegen</p>
</li>
<li><p>CPU cycles per row, since Spark cannot share intermediate results between columns in the same node</p>
</li>
</ul>
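<p>The cost of repetition can be illustrated outside Spark with a plain-Python toy model. This is only an analogy for intuition: it counts how often a stand-in "expensive" function runs per row, and it says nothing about what Catalyst does for a particular query (explain() remains the way to check that). The function name expensive and the small salary list are assumptions made for the sketch.</p>

```python
import math

calls = 0  # counts how many times the "expensive" function runs


def expensive(salary):
    """Stand-in for a costly expression; mirrors sqrt(exp(log(salary + 1)))."""
    global calls
    calls += 1
    return math.sqrt(math.exp(math.log(salary + 1)))


salaries = [80000, 85000, 60000, 90000, 65000]

# Repeated: the expression is written out three times per row.
calls = 0
repeated = [(expensive(s), expensive(s) * 2, expensive(s) / 10) for s in salaries]
repeated_calls = calls

# Reused: evaluate once per row, then derive the other values from it.
calls = 0
reused = []
for s in salaries:
    metric = expensive(s)
    reused.append((metric, metric * 2, metric / 10))
reused_calls = calls

print("repeated:", repeated_calls, "calls; reused:", reused_calls, "calls")
```

<p>Both variants produce the same values; only the number of evaluations differs, which is the property the compute-once pattern relies on.</p>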
<h4 id="heading-real-world-benchmark-aws-glue">Real-World Benchmark: AWS Glue</h4>
<p>We tested this pattern on AWS Glue (Spark 3.3) with 10 million rows and a simulated expensive computation, using a dataset similar to the one in Scenario 1.</p>
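<pre><code class="lang-python"><span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> col, exp, log, sqrt

df = spark.createDataFrame(multiplied_data,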
                           [<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>])

expr = sqrt(exp(log(col(<span class="hljs-string">"salary"</span>) + <span class="hljs-number">1</span>)))

start = time.time()

df_repeated = (
    df.withColumn(<span class="hljs-string">"metric_a"</span>, expr)
      .withColumn(<span class="hljs-string">"metric_b"</span>, expr * <span class="hljs-number">2</span>)
      .withColumn(<span class="hljs-string">"metric_c"</span>, expr / <span class="hljs-number">10</span>)
)

df_repeated.count()
time_repeated = round(time.time() - start, <span class="hljs-number">2</span>)

start = time.time()

df_reused = (
    df.withColumn(<span class="hljs-string">"metric"</span>, expr)
      .withColumn(<span class="hljs-string">"metric_a"</span>, col(<span class="hljs-string">"metric"</span>))
      .withColumn(<span class="hljs-string">"metric_b"</span>, col(<span class="hljs-string">"metric"</span>) * <span class="hljs-number">2</span>)
      .withColumn(<span class="hljs-string">"metric_c"</span>, col(<span class="hljs-string">"metric"</span>) / <span class="hljs-number">10</span>)
)

df_reused.count()

print(<span class="hljs-string">"Repeated expr time:"</span>, time_repeated, <span class="hljs-string">"seconds"</span>)
print(<span class="hljs-string">"Reused expr time:"</span>, round(time.time() - start, <span class="hljs-number">2</span>), <span class="hljs-string">"seconds"</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Project Nodes</strong></td><td><strong>Execution Time (10M rows)</strong></td><td><strong>Expression Evaluations</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Repeated expression</td><td>Multiple (nested)</td><td>~18 seconds</td><td>3x per row</td></tr>
<tr>
<td>Compute once, reuse</td><td>Single</td><td>~11 seconds</td><td>1x per row</td></tr>
</tbody>
</table>
</div><p>The performance gap widens further with genuinely expensive expressions (like regex extraction, JSON parsing, or UDFs).</p>
<h4 id="heading-physical-plan-implication">Physical Plan Implication</h4>
<p>In the physical plan, repeated expressions expand into multiple Java blocks within the same WholeStageCodegen node:</p>
<pre><code class="lang-python">*(<span class="hljs-number">1</span>) Project [sqrt(exp(log(salary + <span class="hljs-number">1</span>))) AS metric_a,

(sqrt(exp(log(salary + <span class="hljs-number">1</span>))) * <span class="hljs-number">2</span>) AS metric_b,

(sqrt(exp(log(salary + <span class="hljs-number">1</span>))) / <span class="hljs-number">10</span>) AS metric_c, ...]
</code></pre>
<p>Spark literally embeds three copies of the same logic.</p>
<p>Each is JIT-compiled separately, leading to:</p>
<ul>
<li><p>Larger generated Java classes</p>
</li>
<li><p>Higher CPU utilization</p>
</li>
<li><p>Longer code-generation time before tasks even start</p>
</li>
</ul>
<p>When reusing a column, Spark generates one expression and references it by name, dramatically shrinking the codegen footprint. If you have complex transformations (nested when, UDFs, regex extractions, and so on), compute them once and reuse them with col("alias"). For even heavier expressions that appear across multiple pipelines, consider persisting the intermediate DataFrame:</p>
<pre><code class="lang-python">df_features = df.withColumn(<span class="hljs-string">"complex_feature"</span>, complex_logic)

df_features.cache()
</code></pre>
<p>That cache can save multiple recomputations across downstream steps.</p>
<h3 id="heading-scenario-3-batch-column-ops">Scenario 3: Batch Column Ops</h3>
<p>Most PySpark pipelines don’t die because of one big, obvious mistake. They slow down from a thousand tiny cuts: one extra withColumn() here, another there, until the logical plan turns into a tall stack of projections.</p>
<p>On its own, withColumn() is fine. The problem is how we use it:</p>
<ul>
<li><p>10–30 chained calls in a row</p>
</li>
<li><p>Re-deriving similar expressions</p>
</li>
<li><p>Spreading logic across many tiny steps</p>
</li>
</ul>
<p>This scenario shows how batching column operations into a single select() produces a flatter, cleaner logical plan that scales better and is easier to reason about.</p>
<h4 id="heading-the-problem-chaining-withcolumn-forever">The Problem: Chaining withColumn() Forever</h4>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> col, concat_ws, when, lit, upper

df_transformed = (
    df.withColumn(<span class="hljs-string">"full_name"</span>, concat_ws(<span class="hljs-string">" "</span>, col(<span class="hljs-string">"firstname"</span>), col(<span class="hljs-string">"lastname"</span>)))
      .withColumn(<span class="hljs-string">"is_senior"</span>, when(col(<span class="hljs-string">"age"</span>) &gt;= <span class="hljs-number">35</span>, lit(<span class="hljs-number">1</span>)).otherwise(lit(<span class="hljs-number">0</span>)))
      .withColumn(<span class="hljs-string">"salary_k"</span>, col(<span class="hljs-string">"salary"</span>) / <span class="hljs-number">1000.0</span>)
      .withColumn(<span class="hljs-string">"experience_band"</span>,
                  when(col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">30</span>, <span class="hljs-string">"junior"</span>)
                  .when((col(<span class="hljs-string">"age"</span>) &gt;= <span class="hljs-number">30</span>) &amp; (col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">40</span>), <span class="hljs-string">"mid"</span>)
                  .otherwise(<span class="hljs-string">"senior"</span>))
      .withColumn(<span class="hljs-string">"country_upper"</span>, upper(col(<span class="hljs-string">"country"</span>)))  <span class="hljs-comment"># upper() is a function, not a Column method</span>
)
</code></pre>
<p>It reads nicely, it runs, and everyone moves on. But under the hood, Spark builds this as multiple Project nodes, one per withColumn() call.</p>
<p><strong>Simplified Logical Plan (Chained, Conceptual):</strong></p>
<pre><code class="lang-python">Project [..., country_upper]

└─ Project [..., experience_band]

   └─ Project [..., salary_k]

      └─ Project [..., is_senior]

         └─ Project [..., full_name]

            └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>Each layer re-selects all existing columns, adds one more derived column, and deepens the plan.</p>
<h4 id="heading-the-better-approach-batch-with-select">The Better Approach: Batch with select()</h4>
<p>Instead of incrementally patching the schema, build it once.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> col, concat_ws, when, lit, upper

df_transformed = df.select(
    col(<span class="hljs-string">"id"</span>),
    col(<span class="hljs-string">"firstname"</span>),
    col(<span class="hljs-string">"lastname"</span>),
    col(<span class="hljs-string">"department"</span>),
    col(<span class="hljs-string">"salary"</span>),
    col(<span class="hljs-string">"age"</span>),
    col(<span class="hljs-string">"hire_date"</span>),
    col(<span class="hljs-string">"country"</span>),
    concat_ws(<span class="hljs-string">" "</span>, col(<span class="hljs-string">"firstname"</span>), col(<span class="hljs-string">"lastname"</span>)).alias(<span class="hljs-string">"full_name"</span>),
    when(col(<span class="hljs-string">"age"</span>) &gt;= <span class="hljs-number">35</span>, lit(<span class="hljs-number">1</span>)).otherwise(lit(<span class="hljs-number">0</span>)).alias(<span class="hljs-string">"is_senior"</span>),
    (col(<span class="hljs-string">"salary"</span>) / <span class="hljs-number">1000.0</span>).alias(<span class="hljs-string">"salary_k"</span>),
    when(col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">30</span>, <span class="hljs-string">"junior"</span>)
        .when((col(<span class="hljs-string">"age"</span>) &gt;= <span class="hljs-number">30</span>) &amp; (col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">40</span>), <span class="hljs-string">"mid"</span>)
        .otherwise(<span class="hljs-string">"senior"</span>).alias(<span class="hljs-string">"experience_band"</span>),
    upper(col(<span class="hljs-string">"country"</span>)).alias(<span class="hljs-string">"country_upper"</span>)  <span class="hljs-comment"># upper() is a function, not a Column method</span>
)
</code></pre>
<p><strong>Simplified Logical Plan (Batched):</strong></p>
<pre><code class="lang-python">Project [id, firstname, lastname, department, salary, age, hire_date, country,

         full_name, is_senior, salary_k, experience_band, country_upper]

└─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>One Project. All derived columns <em>are</em> defined together. Flatter DAG. Cleaner plan.</p>
<h4 id="heading-under-the-hood-why-this-matters">Under the Hood: Why This Matters</h4>
<p>Each withColumn() is syntactic sugar for: “Take the previous plan, and create a new Project on top of it.” So 10 withColumn() calls = 10 projections wrapped on top of each other.</p>
<p>Catalyst can sometimes collapse adjacent Project nodes, but:</p>
<ul>
<li><p>Not always (especially when aliases shadow each other).</p>
</li>
<li><p>Not when expressions become complex or interdependent.</p>
</li>
<li><p>Not when UDFs or analysis barriers appear.</p>
</li>
</ul>
<p>Batching with select():</p>
<ul>
<li><p>Gives Catalyst a single, complete view of all expressions.</p>
</li>
<li><p>Enables more aggressive optimizations (constant folding, expression reuse, pruning).</p>
</li>
<li><p>Keeps expression trees shallower and codegen output smaller.</p>
</li>
</ul>
<p>Think of it as the difference between editing a sentence 10 times in a row and writing the final sentence once, cleanly.</p>
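<p>One way to keep a batched projection maintainable is to describe the derived columns as data and build the expression list in one place. The sketch below assumes the expressions are written as Spark SQL strings suitable for DataFrame.selectExpr(). The helper name batched_select_exprs is hypothetical, and the Spark call is left as a comment so the snippet runs on its own.</p>

```python
def batched_select_exprs(base_columns, derived):
    """Build one list of SQL expression strings: base columns first,
    then each derived column as 'expression AS name'."""
    exprs = list(base_columns)
    for name, expression in derived.items():
        exprs.append(f"{expression} AS {name}")
    return exprs


base_columns = ["id", "firstname", "lastname", "department",
                "salary", "age", "hire_date", "country"]

# Derived columns from the example above, written as Spark SQL expressions.
derived = {
    "full_name": "concat_ws(' ', firstname, lastname)",
    "is_senior": "CASE WHEN age >= 35 THEN 1 ELSE 0 END",
    "salary_k": "salary / 1000.0",
    "country_upper": "upper(country)",
}

exprs = batched_select_exprs(base_columns, derived)
for e in exprs:
    print(e)

# With a live DataFrame, everything lands in a single projection:
# df_transformed = df.selectExpr(*exprs)
```

<p>Adding a new derived column then means adding one dictionary entry rather than another chained withColumn() call.</p>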
<h4 id="heading-real-world-example-using-the-employees-df-at-scale">Real-World Example: Using the Employees DF at Scale</h4>
<p>Chained version (many withColumn()):</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> col, concat_ws, when, lit, upper
<span class="hljs-keyword">import</span> time

start = time.time()
df_chain = (
    df.withColumn(<span class="hljs-string">"full_name"</span>, concat_ws(<span class="hljs-string">" "</span>, col(<span class="hljs-string">"firstname"</span>), col(<span class="hljs-string">"lastname"</span>)))
      .withColumn(<span class="hljs-string">"is_senior"</span>, when(col(<span class="hljs-string">"age"</span>) &gt;= <span class="hljs-number">35</span>, <span class="hljs-number">1</span>).otherwise(<span class="hljs-number">0</span>))
      .withColumn(<span class="hljs-string">"salary_k"</span>, col(<span class="hljs-string">"salary"</span>) / <span class="hljs-number">1000.0</span>)
      .withColumn(<span class="hljs-string">"high_earner"</span>, when(col(<span class="hljs-string">"salary"</span>) &gt;= <span class="hljs-number">90000</span>, <span class="hljs-number">1</span>).otherwise(<span class="hljs-number">0</span>))
      .withColumn(<span class="hljs-string">"experience_band"</span>,
                  when(col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">30</span>, <span class="hljs-string">"junior"</span>)
                  .when((col(<span class="hljs-string">"age"</span>) &gt;= <span class="hljs-number">30</span>) &amp; (col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">40</span>), <span class="hljs-string">"mid"</span>)
                  .otherwise(<span class="hljs-string">"senior"</span>))
      .withColumn(<span class="hljs-string">"country_upper"</span>, upper(col(<span class="hljs-string">"country"</span>)))
)

df_chain.count()
time_chain = round(time.time() - start, <span class="hljs-number">2</span>)
</code></pre>
<p>Batched version (single select()):</p>
<pre><code class="lang-python">start = time.time()
df_batch = df.select(
    <span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>,
    concat_ws(<span class="hljs-string">" "</span>, col(<span class="hljs-string">"firstname"</span>), col(<span class="hljs-string">"lastname"</span>)).alias(<span class="hljs-string">"full_name"</span>),
    when(col(<span class="hljs-string">"age"</span>) &gt;= <span class="hljs-number">35</span>, <span class="hljs-number">1</span>).otherwise(<span class="hljs-number">0</span>).alias(<span class="hljs-string">"is_senior"</span>),
    (col(<span class="hljs-string">"salary"</span>) / <span class="hljs-number">1000.0</span>).alias(<span class="hljs-string">"salary_k"</span>),
    when(col(<span class="hljs-string">"salary"</span>) &gt;= <span class="hljs-number">90000</span>, <span class="hljs-number">1</span>).otherwise(<span class="hljs-number">0</span>).alias(<span class="hljs-string">"high_earner"</span>),
    when(col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">30</span>, <span class="hljs-string">"junior"</span>)
        .when((col(<span class="hljs-string">"age"</span>) &gt;= <span class="hljs-number">30</span>) &amp; (col(<span class="hljs-string">"age"</span>) &lt; <span class="hljs-number">40</span>), <span class="hljs-string">"mid"</span>)
        .otherwise(<span class="hljs-string">"senior"</span>).alias(<span class="hljs-string">"experience_band"</span>),
    upper(col(<span class="hljs-string">"country"</span>)).alias(<span class="hljs-string">"country_upper"</span>)
)

df_batch.count()
time_batch = round(time.time() - start, <span class="hljs-number">2</span>)
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Logical Shape</strong></td><td><strong>Glue Execution Time (1M rows)</strong></td><td><strong>Notes</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Chained withColumn()</td><td>6 nested Projects</td><td>~14 seconds</td><td>Deep plan, more Catalyst work</td></tr>
<tr>
<td>Single select()</td><td>1 Project</td><td>~9 seconds</td><td>Flat planning, cleaner DAG</td></tr>
</tbody>
</table>
</div><p>The distinction is most evident when there are more derived columns, more complex expressions (UDFs, window functions), or when executing on managed runtimes such as AWS Glue.</p>
<p>In the chained cases, there are more Project nodes, code generation is fragmented, and expression evaluation is less amenable to global optimization.</p>
<p>In the batched cases, Spark generates a single Project node, more work is consolidated into a single WholeStageCodegen pipeline, code generation is reduced, the JVM is less stressed, and the plan is flatter and more amenable to optimization. This is not only cleaner, but it’s also faster, more reliable, and friendlier to Spark’s optimizer.</p>
<h3 id="heading-scenario-4-early-filter-vs-late-filter">Scenario 4: Early Filter vs Late Filter</h3>
<p>Many pipelines apply transformations first, adding columns, joining datasets, or calculating derived metrics, before filtering records. That order looks harmless in code but can double or triple the workload at execution.</p>
<h4 id="heading-problem-late-filtering">Problem: Late Filtering</h4>
<pre><code class="lang-python">df_late = (
    df.withColumn(<span class="hljs-string">"bonus"</span>, col(<span class="hljs-string">"salary"</span>) * <span class="hljs-number">0.1</span>)
      .withColumn(<span class="hljs-string">"salary_k"</span>, col(<span class="hljs-string">"salary"</span>) / <span class="hljs-number">1000</span>)
      .filter(col(<span class="hljs-string">"age"</span>) &gt; <span class="hljs-number">35</span>)
)
</code></pre>
<p>This means Spark first computes all columns for every employee, then discards most rows.</p>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Filter (age &gt; <span class="hljs-number">35</span>)

└─ Project [id, firstname, lastname, department, salary, age, hire_date, country,

            (salary * <span class="hljs-number">0.1</span>) AS bonus,

            (salary / <span class="hljs-number">1000</span>) AS salary_k]

   └─ LogicalRDD [...]
</code></pre>
<p>Catalyst can sometimes reorder this automatically, but when it can't (due to UDFs or complex logic), you're doing unnecessary work on data that's thrown away.</p>
<h4 id="heading-better-approach-early-filtering">Better Approach: Early Filtering</h4>
<pre><code class="lang-python">df_early = (
    df.filter(col(<span class="hljs-string">"age"</span>) &gt; <span class="hljs-number">35</span>)
      .withColumn(<span class="hljs-string">"bonus"</span>, col(<span class="hljs-string">"salary"</span>) * <span class="hljs-number">0.1</span>)
      .withColumn(<span class="hljs-string">"salary_k"</span>, col(<span class="hljs-string">"salary"</span>) / <span class="hljs-number">1000</span>)
)
</code></pre>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Project [id, firstname, lastname, department, salary, age, hire_date, country,

         (salary * <span class="hljs-number">0.1</span>) AS bonus,

         (salary / <span class="hljs-number">1000</span>) AS salary_k]

└─ Filter (age &gt; <span class="hljs-number">35</span>)

   └─ LogicalRDD [...]
</code></pre>
<p>Now Spark prunes the dataset first, then applies the transformations. The result: smaller intermediate data, fewer rows for the projection to evaluate, and a smaller shuffle footprint in any downstream stage.</p>
<h4 id="heading-real-world-benchmark-aws-glue-1">Real-World Benchmark: AWS Glue</h4>
<p>Late Filtering:</p>
<pre><code class="lang-python">df = spark.createDataFrame(
    multiplied_data,
    [<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>]
)

start_late = time.time()

df_late = (
    df.withColumn(<span class="hljs-string">"bonus"</span>, col(<span class="hljs-string">"salary"</span>) * <span class="hljs-number">0.1</span>)
      .withColumn(<span class="hljs-string">"salary_k"</span>, col(<span class="hljs-string">"salary"</span>) / <span class="hljs-number">1000</span>)
      .filter(col(<span class="hljs-string">"age"</span>) &gt; <span class="hljs-number">35</span>)   
)

df_late.count()
time_late = round(time.time() - start_late, <span class="hljs-number">2</span>)
</code></pre>
<p>Early Filtering:</p>
<pre><code class="lang-python">start_early = time.time()

df_early = (
    df.filter(col(<span class="hljs-string">"age"</span>) &gt; <span class="hljs-number">35</span>)    
      .withColumn(<span class="hljs-string">"bonus"</span>, col(<span class="hljs-string">"salary"</span>) * <span class="hljs-number">0.1</span>)
      .withColumn(<span class="hljs-string">"salary_k"</span>, col(<span class="hljs-string">"salary"</span>) / <span class="hljs-number">1000</span>)
)

df_early.count()
time_early = round(time.time() - start_early, <span class="hljs-number">2</span>)

print(<span class="hljs-string">"Late Filter Time:"</span>, time_late, <span class="hljs-string">"seconds"</span>)
print(<span class="hljs-string">"Early Filter Time:"</span>, time_early, <span class="hljs-string">"seconds"</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Rows Reaching the Projection</strong></td><td><strong>Execution Time (approx)</strong></td><td><strong>Notes</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Late filter</td><td>1,000,000 (all rows)</td><td>~14 seconds</td><td>Computes bonus and salary_k for all rows, then filters</td></tr>
<tr>
<td>Early filter</td><td>100,000 (filtered subset)</td><td>~9 seconds</td><td>Filters first, computes only for age &gt; 35</td></tr>
</tbody>
</table>
</div><p>The early filter approach processes significantly less data before the projection, leading to faster execution and less memory pressure.</p>
<p>Always filter as early as possible, before joins, aggregations, expensive transformations (such as UDFs or window functions), and even during file reads via Parquet/ORC pushdown, since filtering at the source touches fewer partitions and leads to faster jobs.</p>
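<p>The saving comes from filter selectivity, which can be illustrated with a plain-Python toy model over the ten sample employees defined earlier. This is an analogy rather than Spark code: add_bonus is a hypothetical stand-in for the projection work, and the counter simply records how many rows it touches in each ordering.</p>

```python
# Ages and salaries of the ten sample employees defined earlier in this article.
ages = [28, 32, 25, 35, 29, 27, 40, 33, 26, 31]
salaries = [80000, 85000, 60000, 90000, 65000, 55000, 95000, 70000, 58000, 88000]
rows = list(zip(ages, salaries))

transform_calls = 0  # counts how many rows the projection touches


def add_bonus(row):
    """Stand-in for the per-row projection work (bonus = salary * 0.1)."""
    global transform_calls
    transform_calls += 1
    age, salary = row
    return (age, salary, salary * 0.1)


# Late filter: transform every row, then keep age > 35.
transform_calls = 0
late = [r for r in map(add_bonus, rows) if r[0] > 35]
late_calls = transform_calls

# Early filter: keep age > 35 first, then transform only the survivors.
transform_calls = 0
early = [add_bonus(r) for r in rows if r[0] > 35]
early_calls = transform_calls

print("late:", late_calls, "rows transformed; early:", early_calls, "rows transformed")
```

<p>Both orderings return the same rows. The more selective the predicate, the more projection work the early filter avoids.</p>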
<h3 id="heading-scenario-5-column-pruning">Scenario 5: Column Pruning</h3>
<p>When working with Spark DataFrames, convenience often wins over efficiency, and nothing feels more convenient than select("*"). It’s quick, flexible, and perfect for exploration.</p>
<p>But in production pipelines, that little star silently costs CPU, memory, network bandwidth, and runtime efficiency. Every time you write select("*"), Spark expands it into <em>every</em> column from your schema, even if you’re using just one or two later.</p>
<p>Those extra attributes flow through every stage of the plan, from filters and joins to aggregations and shuffles. The result: inflated logical plans, bigger shuffle files, and slower queries.</p>
<h4 id="heading-the-problem-the-lazy-star">The Problem: “The Lazy Star”</h4>
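<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> avg, col

df_star = (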
    df.select(<span class="hljs-string">"*"</span>)
      .filter(col(<span class="hljs-string">"department"</span>) == <span class="hljs-string">"Engineering"</span>)
      .groupBy(<span class="hljs-string">"country"</span>)
      .agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
)
</code></pre>
<p>At first glance, this seems harmless. The problem is that only three columns are needed (department for the filter, country and salary for the aggregation), yet Spark carries all eight (id, firstname, lastname, department, salary, age, hire_date, country) through every transformation.</p>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Aggregate [country], [avg(salary) AS avg_salary]

└─ Filter (department = Engineering)

   └─ Project [id, firstname, lastname, department, salary, age, hire_date, country]

      └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>Every node in this tree carries all columns. Catalyst can’t prune them because you explicitly asked for "*". The excess attributes are serialized, shuffled, and deserialized across the cluster, even though they serve no purpose in the final result.</p>
<h4 id="heading-the-fix-select-only-what-you-need">The Fix: Select Only What You Need</h4>
<p>Be deliberate with your projections. Select the minimal schema required for the task.</p>
<pre><code class="lang-python">df_pruned = (
    df.select(<span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"country"</span>)
      .filter(col(<span class="hljs-string">"department"</span>) == <span class="hljs-string">"Engineering"</span>)
      .groupBy(<span class="hljs-string">"country"</span>)
      .agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
)
</code></pre>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Aggregate [country], [avg(salary) AS avg_salary]

└─ Filter (department = Engineering)

   └─ Project [department, salary, country]

      └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>Now Spark reads and processes only the three required columns: department, salary, and country. The plan is narrower, the DAG simpler, and execution faster.</p>
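<p>As a rough illustration of why narrower rows matter, the sketch below compares the serialized size of wide and pruned rows in plain Python. JSON is used here only as a convenient stand-in for a serialization format (Spark uses its own binary row format), and make_row and payload_bytes are hypothetical helpers, so treat the numbers as relative rather than as Spark measurements.</p>

```python
import json

COLUMNS = ["id", "firstname", "lastname", "department",
           "salary", "age", "hire_date", "country"]


def make_row(i):
    """Build one sample row shaped like the employee schema used above."""
    return {
        "id": i,
        "firstname": f"firstname_{i}",
        "lastname": f"lastname_{i}",
        "department": "Engineering",
        "salary": 80000,
        "age": 28,
        "hire_date": "2020-01-15",
        "country": "USA",
    }


def payload_bytes(rows, columns):
    """Bytes needed to serialize only the given columns of every row."""
    return sum(
        len(json.dumps([row[c] for c in columns]).encode("utf-8"))
        for row in rows
    )


rows = [make_row(i) for i in range(1, 1001)]
wide = payload_bytes(rows, COLUMNS)
pruned = payload_bytes(rows, ["department", "salary", "country"])

print(f"wide payload:   {wide} bytes")
print(f"pruned payload: {pruned} bytes")
print(f"pruned / wide:  {pruned / wide:.2f}")
```

<p>The exact ratio depends on the data, but every column dropped before a shuffle is a column that never has to be serialized, sent, and deserialized.</p>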
<h4 id="heading-real-world-benchmark-aws-glue-2">Real-World Benchmark: AWS Glue</h4>
<p>Wide Projection:</p>
<pre><code class="lang-python">df = spark.createDataFrame(multiplied_data,
                           [<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>])

start = time.time()
df_star = (
    df.select(<span class="hljs-string">"*"</span>)
      .filter(col(<span class="hljs-string">"department"</span>) == <span class="hljs-string">"Engineering"</span>)
      .groupBy(<span class="hljs-string">"country"</span>)
      .agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
)

df_star.count()
time_star = round(time.time() - start, <span class="hljs-number">2</span>)
</code></pre>
<p>Pruned Projection:</p>
<pre><code class="lang-python">start = time.time()

df_pruned = (
    df.select(<span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"country"</span>)
      .filter(col(<span class="hljs-string">"department"</span>) == <span class="hljs-string">"Engineering"</span>)
      .groupBy(<span class="hljs-string">"country"</span>)
      .agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
)

df_pruned.count()
time_pruned = round(time.time() - start, <span class="hljs-number">2</span>)

print(<span class="hljs-string">f"select('*') time: <span class="hljs-subst">{time_star}</span>s"</span>)
print(<span class="hljs-string">f"pruned columns time: <span class="hljs-subst">{time_pruned}</span>s"</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Columns Processed</strong></td><td><strong>Execution Time (1M rows)</strong></td><td><strong>Observation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>select("*")</td><td>8</td><td>~26.54 s</td><td>Spark carries all columns through the plan.</td></tr>
<tr>
<td>Pruned projection</td><td>3</td><td>~2.21 s</td><td>Only needed columns processed → faster and lighter.</td></tr>
</tbody>
</table>
</div><h4 id="heading-under-the-hood-how-catalyst-handles-columns">Under the Hood: How Catalyst Handles Columns</h4>
<p>When you call select("*"), Catalyst resolves <em>every attribute</em> into the logical plan. Each subsequent transformation inherits that full attribute list, increasing plan depth and overhead.</p>
<p>Catalyst includes a rule called ColumnPruning, which removes unused attributes, but it only works when Spark <em>can see</em> which columns are necessary. If you use "*" or dynamically reference df.columns, Catalyst loses visibility.</p>
<p><strong>Works:</strong></p>
<pre><code class="lang-python">df \
    .select(<span class="hljs-string">"salary"</span>, <span class="hljs-string">"country"</span>) \
    .groupBy(<span class="hljs-string">"country"</span>) \
    .agg(avg(<span class="hljs-string">"salary"</span>))
</code></pre>
<p><strong>Doesn’t Work:</strong></p>
<pre><code class="lang-python">cols = df.columns

df.select(cols) \
  .groupBy(<span class="hljs-string">"country"</span>) \
  .agg(avg(<span class="hljs-string">"salary"</span>))
</code></pre>
<p>In the second case, Catalyst can’t prune anything because cols might include everything.</p>
<h4 id="heading-physical-plan-differences">Physical Plan Differences</h4>
<p>Wide Projection (select("*")):</p>
<pre><code class="lang-python">*(<span class="hljs-number">1</span>) HashAggregate(keys=[country], functions=[avg(salary)])

+- *(<span class="hljs-number">1</span>) Project [id, firstname, lastname, department, salary, age, hire_date, country]

   +- *(<span class="hljs-number">1</span>) Filter (department = Engineering)

      +- *(<span class="hljs-number">1</span>) Scan parquet ...
</code></pre>
<p>Pruned Projection:</p>
<pre><code class="lang-python">*(<span class="hljs-number">1</span>) HashAggregate(keys=[country], functions=[avg(salary)])

+- *(<span class="hljs-number">1</span>) Project [department, salary, country]

   +- *(<span class="hljs-number">1</span>) Filter (department = Engineering)

      +- *(<span class="hljs-number">1</span>) Scan parquet [department, salary, country]
</code></pre>
<p>Notice the last line: Spark physically scans only the three referenced columns from Parquet. That’s genuine I/O reduction, not just logical simplification. Using select("*") increases shuffle file sizes, memory usage during serialization, Catalyst planning time, and I/O and network traffic. The fix requires nothing more than naming the columns you need.</p>
<p>In managed environments like AWS Glue or Databricks, this simple practice can greatly reduce ETL time, particularly for Parquet or Delta files, because an explicit projection lets the reader skip unneeded columns. It’s one of the easiest and highest-impact Spark optimization techniques, and it starts with typing fewer asterisks.</p>
<h3 id="heading-scenario-6-filter-pushdown-vs-full-scan">Scenario 6: Filter Pushdown vs Full Scan</h3>
<p>When a Spark job feels slow right from the start, even before joins or aggregations, the culprit is often hidden at the data-read layer. Spark spends seconds (or minutes) scanning every record, even though most rows are useless for the query.</p>
<p>That’s where filter pushdown comes in. It tells Spark to <em>push your filter logic down to the file reader</em> so that Parquet / ORC / Delta formats return only the relevant rows from disk. Done right, this optimization can reduce scan size significantly. Done wrong, Spark performs a full scan, reading everything before filtering in memory.</p>
<h4 id="heading-the-problem-late-filters-and-full-scans">The Problem: Late Filters and Full Scans</h4>
<pre><code class="lang-python">employees_df = spark.read.parquet(<span class="hljs-string">"s3://data/employee_data/"</span>)

df_full = (
    employees_df
        .select(<span class="hljs-string">"*"</span>)  <span class="hljs-comment"># reads all columns</span>
        .filter(col(<span class="hljs-string">"country"</span>) == <span class="hljs-string">"Canada"</span>)
)
</code></pre>
<p>Looks fine, right? But Spark can’t push this filter to the Parquet reader because it’s applied <em>after</em> the select("*") projection step. Catalyst sees the filter as operating on a projected DataFrame, not the raw scan, so the pushdown boundary is lost.</p>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Filter (country = Canada)

└─ Project [id, firstname, lastname, department, salary, age, hire_date, country]

   └─ Scan parquet employee_data [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>Every record from every Parquet file is read into memory before the filter executes. In large tables, this means scanning terabytes when you only need megabytes.</p>
<h4 id="heading-the-fix-filter-early-and-project-light">The Fix: Filter Early and Project Light</h4>
<p>Move filters as close as possible to the data source and limit columns before Spark reads them:</p>
<pre><code class="lang-python">df_pushdown = (
    spark.read.parquet(<span class="hljs-string">"s3://data/employee_data/"</span>)
        .select(<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"country"</span>)
        .filter(col(<span class="hljs-string">"country"</span>) == <span class="hljs-string">"Canada"</span>)
)
</code></pre>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Project [id, firstname, department, salary, country]

└─ Scan parquet employee_data [id, firstname, department, salary, country]

   PushedFilters: [country = Canada]
</code></pre>
<p>Notice the difference: PushedFilters appears in the plan. That means the Parquet reader handles the predicate, returning only matching blocks and rows.</p>
<h4 id="heading-under-the-hood-what-actually-happens">Under the Hood: What Actually Happens</h4>
<p>When Spark performs filter pushdown, it leverages the Parquet metadata (min/max statistics and row-group indexes) stored in file footers.</p>
<ul>
<li><p>Spark inspects file-level metadata for the predicate column (country).</p>
</li>
<li><p>It skips any row group whose min/max range cannot contain the value (for example, a group whose country values all sort after Canada).</p>
</li>
<li><p>It reads only the necessary row groups and columns from disk.</p>
</li>
<li><p>Those records enter the DAG directly – no in-memory filtering required.</p>
</li>
</ul>
<p>This skipping happens inside the file reader, before rows reach the rest of the plan, which reduces both I/O and network transfer.</p>
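<p>The row-group skipping described in the list above can be sketched as a toy model in plain Python. The RowGroup class, its min_country and max_country fields, and the sample data are hypothetical stand-ins for Parquet footer statistics. They are not Spark or Parquet APIs.</p>

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group plus its footer statistics."""
    min_country: str
    max_country: str
    rows: list


def make_group(rows):
    """Compute min/max statistics the way a writer would at write time."""
    countries = [r["country"] for r in rows]
    return RowGroup(min(countries), max(countries), rows)


def scan_with_pushdown(row_groups, country):
    """Read only row groups whose [min, max] range can contain `country`."""
    rows_read = 0
    matches = []
    for group in row_groups:
        if country < group.min_country or country > group.max_country:
            continue  # statistics prove no row can match, so skip the group
        rows_read += len(group.rows)
        # A surviving group can still hold non-matching rows, so filter them.
        matches.extend(r for r in group.rows if r["country"] == country)
    return matches, rows_read


row_groups = [
    make_group([{"id": 1, "country": "Brazil"}, {"id": 2, "country": "Canada"}]),
    make_group([{"id": 3, "country": "UK"}, {"id": 4, "country": "USA"}]),
    make_group([{"id": 5, "country": "Canada"}, {"id": 6, "country": "Canada"}]),
]

matches, rows_read = scan_with_pushdown(row_groups, "Canada")
print("matching ids:", [r["id"] for r in matches])
print("rows read:", rows_read, "of", sum(len(g.rows) for g in row_groups))
```

<p>In this model the second row group is never read, because its range (UK to USA) cannot contain Canada. Statistics are most useful when the data is sorted or clustered on the filter column. If every row group spans the full range of values, nothing can be skipped.</p>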
<h4 id="heading-real-world-benchmark-aws-glue-3">Real-World Benchmark: AWS Glue</h4>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> pyspark.sql <span class="hljs-keyword">import</span> SparkSession
<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> col

spark = SparkSession.builder.appName(<span class="hljs-string">"FilterPushdownBenchmark"</span>).getOrCreate()

start = time.time()
df_full = (
    spark.read.parquet(<span class="hljs-string">"s3://data/employee_data/"</span>)
        .select(<span class="hljs-string">"*"</span>)                         <span class="hljs-comment"># all columns</span>
        .filter(col(<span class="hljs-string">"country"</span>) == <span class="hljs-string">"Canada"</span>)  
)
df_full.count()
time_full = round(time.time() - start, <span class="hljs-number">2</span>)

start = time.time()
df_pushdown = (
    spark.read.parquet(<span class="hljs-string">"s3://data/employee_data/"</span>)
        .select(<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"country"</span>)
        .filter(col(<span class="hljs-string">"country"</span>) == <span class="hljs-string">"Canada"</span>)  
)
df_pushdown.count()
time_push = round(time.time() - start, <span class="hljs-number">2</span>)

print(<span class="hljs-string">"Full Scan Time:"</span>, time_full, <span class="hljs-string">"sec"</span>)
print(<span class="hljs-string">"Filter Pushdown Time:"</span>, time_push, <span class="hljs-string">"sec"</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Execution Time (1M rows)</strong></td><td><strong>Observation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Full Scan</td><td>14.2 s</td><td>All files scanned and filtered in memory.</td></tr>
<tr>
<td>Filter Pushdown</td><td>3.8 s</td><td>Only relevant row groups and columns read.</td></tr>
</tbody>
</table>
</div><p><strong>Physical Plan Comparison</strong></p>
<p>Full Scan:</p>
<pre><code class="lang-python">*(<span class="hljs-number">1</span>) Filter (country = Canada)

+- *(<span class="hljs-number">1</span>) ColumnarToRow

   +- *(<span class="hljs-number">1</span>) FileScan parquet [id, firstname, lastname, department, salary, age, hire_date, country]

      Batched: true, DataFilters: [], PushedFilters: []
</code></pre>
<p>Pushdown:</p>
<pre><code class="lang-python">*(<span class="hljs-number">1</span>) ColumnarToRow

+- *(<span class="hljs-number">1</span>) FileScan parquet [id, firstname, department, salary, country]

   Batched: true, DataFilters: [isnotnull(country)], PushedFilters: [country = Canada]
</code></pre>
<p>The difference is clear: PushedFilters confirms that Spark applied predicate pushdown, skipping unnecessary row groups at the scan stage.</p>
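<p>Checking plans by eye gets tedious, so a small helper can pull the PushedFilters entries out of the plan text. This is a minimal sketch: capture_explain and pushed_filters are hypothetical helper names, and the sketch assumes that explain() prints the plan to standard output, which is how PySpark's DataFrame.explain behaves. Verify the output format on your own Spark version, since plan text is not a stable API.</p>

```python
import contextlib
import io


def capture_explain(df, mode="formatted"):
    """Return the text that df.explain(mode=...) prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain(mode=mode)
    return buffer.getvalue()


def pushed_filters(plan_text):
    """Return the contents of every 'PushedFilters: [...]' entry in a plan."""
    marker = "PushedFilters: ["
    found = []
    start = plan_text.find(marker)
    while start != -1:
        begin = start + len(marker)
        i = begin
        depth = 1  # track nested brackets, e.g. In(country, [Canada,USA])
        while i < len(plan_text) and depth > 0:
            if plan_text[i] == "[":
                depth += 1
            elif plan_text[i] == "]":
                depth -= 1
            i += 1
        if depth != 0:
            break  # unbalanced brackets: stop rather than guess
        found.append(plan_text[begin:i - 1].strip())
        start = plan_text.find(marker, i)
    return found


def has_pushdown(plan_text):
    """True if at least one scan in the plan lists a non-empty pushed filter."""
    return any(entry for entry in pushed_filters(plan_text))


# Plan fragments copied from the comparison above.
full_scan_plan = "Batched: true, DataFilters: [], PushedFilters: []"
pushdown_plan = (
    "Batched: true, DataFilters: [isnotnull(country)], "
    "PushedFilters: [country = Canada]"
)

print(has_pushdown(full_scan_plan))
print(has_pushdown(pushdown_plan))
```

<p>With a real DataFrame, has_pushdown(capture_explain(df_pushdown)) could serve as a quick check in a notebook or a pipeline test.</p>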
<h4 id="heading-reflection-why-pushdown-matters">Reflection: Why Pushdown Matters</h4>
<p>Pushdown isn’t a micro-optimization. It’s often the single biggest performance lever in Spark ETL. In data lakes with hundreds of files, full scans waste hours and inflate AWS S3 I/O costs. By filtering and projecting early, Spark prunes both rows and columns at the scan, before the rest of the plan runs.</p>
<p>Apply filters as early as possible in the read pipeline, and combine filter pushdown with column pruning. Verify PushedFilters in explain("formatted"), and avoid UDFs and select("*") at read time. Pushdown turns “read everything and discard most” into “read only what you need.”</p>
<h3 id="heading-scenario-7-de-duplicate-right">Scenario 7: De-duplicate Right</h3>
<h4 id="heading-the-problem-all-row-deduplication-and-why-it-hurts">The Problem: “All-Row Deduplication” and Why It Hurts</h4>
<p>When we use this:</p>
<pre><code class="lang-python">df.dropDuplicates()
</code></pre>
<p>Spark removes identical rows across all columns. It sounds simple, but this operation forces Spark to treat every column as part of the deduplication key.</p>
<p>Internally, it means:</p>
<ul>
<li><p>Every attribute is serialized and hashed.</p>
</li>
<li><p>Every unique combination of all columns is shuffled across the cluster to ensure global uniqueness.</p>
</li>
<li><p>Even small changes in a non-essential field (like hire_date) cause new keys and destroy aggregation locality.</p>
</li>
</ul>
<p>In wide tables, this is one of the heaviest shuffle operations Spark can perform.</p>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Aggregate [id, firstname, lastname, department, salary, age, hire_date, country], [first(id) AS id, ...]

└─ Exchange hashpartitioning(id, firstname, lastname, department, salary, age, hire_date, country, <span class="hljs-number">200</span>)

   └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>Notice the Exchange: that’s a full shuffle across all columns. Spark must send every record to the partition responsible for its unique combination of all fields. This is slow, memory-intensive, and scales poorly as columns grow.</p>
<h4 id="heading-the-better-approach-key-based-deduplication">The Better Approach: Key-Based Deduplication</h4>
<p>In most real datasets, duplicates are determined by a primary or business key, not all attributes. For example, if id uniquely identifies an employee, we only need to keep one record per id.</p>
<pre><code class="lang-python">df.dropDuplicates([<span class="hljs-string">"id"</span>])
</code></pre>
<p>Now Spark deduplicates based only on the id column.</p>
<pre><code class="lang-python">Aggregate [id], [first(id) AS id, first(firstname) AS firstname, ...]

└─ Exchange hashpartitioning(id, <span class="hljs-number">200</span>)

   └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>The shuffle is dramatically narrower. Instead of hashing across all columns, Spark redistributes data only by id. The result: fewer bytes, smaller shuffle files, and a faster reduce stage.</p>
<h4 id="heading-real-world-benchmark-aws-glue-4">Real-World Benchmark: AWS Glue</h4>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> exp, log, sqrt, col, concat_ws, when, upper, avg
<span class="hljs-keyword">from</span> pyspark.sql <span class="hljs-keyword">import</span> SparkSession

spark = SparkSession.builder.appName(<span class="hljs-string">"MillionRowsRenameTest"</span>).getOrCreate()

employees_data = [
    (<span class="hljs-number">1</span>, <span class="hljs-string">"John"</span>, <span class="hljs-string">"Doe"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">80000</span>, <span class="hljs-number">28</span>, <span class="hljs-string">"2020-01-15"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">2</span>, <span class="hljs-string">"Jane"</span>, <span class="hljs-string">"Smith"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">85000</span>, <span class="hljs-number">32</span>, <span class="hljs-string">"2019-03-20"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">3</span>, <span class="hljs-string">"Alice"</span>, <span class="hljs-string">"Johnson"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">60000</span>, <span class="hljs-number">25</span>, <span class="hljs-string">"2021-06-10"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">4</span>, <span class="hljs-string">"Bob"</span>, <span class="hljs-string">"Brown"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">90000</span>, <span class="hljs-number">35</span>, <span class="hljs-string">"2018-07-01"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">5</span>, <span class="hljs-string">"Charlie"</span>, <span class="hljs-string">"Wilson"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">65000</span>, <span class="hljs-number">29</span>, <span class="hljs-string">"2020-11-05"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">6</span>, <span class="hljs-string">"David"</span>, <span class="hljs-string">"Lee"</span>, <span class="hljs-string">"HR"</span>, <span class="hljs-number">55000</span>, <span class="hljs-number">27</span>, <span class="hljs-string">"2021-01-20"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">7</span>, <span class="hljs-string">"Eve"</span>, <span class="hljs-string">"Davis"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">95000</span>, <span class="hljs-number">40</span>, <span class="hljs-string">"2017-04-12"</span>, <span class="hljs-string">"Canada"</span>),
    (<span class="hljs-number">8</span>, <span class="hljs-string">"Frank"</span>, <span class="hljs-string">"Miller"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">70000</span>, <span class="hljs-number">33</span>, <span class="hljs-string">"2019-09-25"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">9</span>, <span class="hljs-string">"Grace"</span>, <span class="hljs-string">"Taylor"</span>, <span class="hljs-string">"HR"</span>, <span class="hljs-number">58000</span>, <span class="hljs-number">26</span>, <span class="hljs-string">"2021-08-15"</span>, <span class="hljs-string">"Canada"</span>),
    (<span class="hljs-number">10</span>, <span class="hljs-string">"Henry"</span>, <span class="hljs-string">"Anderson"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">88000</span>, <span class="hljs-number">31</span>, <span class="hljs-string">"2020-02-28"</span>, <span class="hljs-string">"USA"</span>)
]

multiplied_data = [(i, <span class="hljs-string">f"firstname_<span class="hljs-subst">{i}</span>"</span>, <span class="hljs-string">f"lastname_<span class="hljs-subst">{i}</span>"</span>,
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">3</span>],   <span class="hljs-comment"># department</span>
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">4</span>],   <span class="hljs-comment"># salary</span>
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">5</span>],   <span class="hljs-comment"># age</span>
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">6</span>],   <span class="hljs-comment"># hire_date</span>
                    employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">7</span>]    <span class="hljs-comment"># country</span>
                    )
                   <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">1</span>_000_001)]

df = spark.createDataFrame(
    multiplied_data,
    [<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>]
)

start = time.time()
dedup_full = df.dropDuplicates()
dedup_full.count()
time_full = round(time.time() - start, <span class="hljs-number">2</span>)

start = time.time()
dedup_key = df.dropDuplicates([<span class="hljs-string">"id"</span>])
dedup_key.count()
time_key = round(time.time() - start, <span class="hljs-number">2</span>)

print(<span class="hljs-string">f"Full-row dedup time: <span class="hljs-subst">{time_full}</span>s"</span>)
print(<span class="hljs-string">f"Key-based dedup time: <span class="hljs-subst">{time_key}</span>s"</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Execution Time (1M rows)</strong></td><td><strong>Observation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Full-Row Dedup</td><td>27.6 s</td><td>Shuffle across all attributes, large hash table</td></tr>
<tr>
<td>Key-Based Dedup (["id"])</td><td>2.06 s</td><td>~13× faster, minimal shuffle width</td></tr>
</tbody>
</table>
</div><h4 id="heading-under-the-hood-what-catalyst-does">Under the Hood: What Catalyst Does</h4>
<p>When you specify a key list, Catalyst rewrites dropDuplicates(keys) into a partial + final aggregate plan, just like a groupBy:</p>
<p>HashAggregate(keys=[id], functions=[first(...)])</p>
<p>This allows Spark to:</p>
<ul>
<li><p>Perform map-side partial aggregation on each partition (before shuffle).</p>
</li>
<li><p>Partition the exchange by the grouping key (id) alone.</p>
</li>
<li><p>Perform a final aggregation on the reduced data.</p>
</li>
</ul>
<p>The all-column version can’t do that optimization: because every column participates in uniqueness, Spark must ensure <em>complete</em> data redistribution.</p>
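<p>The partial, exchange, and final steps listed above can be modeled in a few lines of plain Python. This is a sketch of the idea rather than Spark's implementation. The function names, the two-partition layout, and the sample rows are all hypothetical.</p>

```python
from collections import defaultdict


def partial_dedup(partition, key):
    """Keep the first row seen for each key within a single partition."""
    first_seen = {}
    for row in partition:
        first_seen.setdefault(row[key], row)
    return list(first_seen.values())


def shuffle_by_key(partitions, key, num_partitions):
    """Exchange step: route each row to the partition that owns its key."""
    shuffled = defaultdict(list)
    for partition in partitions:
        for row in partition:
            shuffled[hash(row[key]) % num_partitions].append(row)
    return [shuffled[p] for p in range(num_partitions)]


def dedup_by_key(partitions, key, num_partitions=2):
    """Partial dedup per partition, shuffle by key, then final dedup."""
    partials = [partial_dedup(p, key) for p in partitions]
    rows_shuffled = sum(len(p) for p in partials)
    exchanged = shuffle_by_key(partials, key, num_partitions)
    final = [partial_dedup(p, key) for p in exchanged]
    return [row for p in final for row in p], rows_shuffled


partitions = [
    [{"id": 1, "country": "USA"}, {"id": 1, "country": "USA"}, {"id": 2, "country": "UK"}],
    [{"id": 2, "country": "UK"}, {"id": 3, "country": "Canada"}, {"id": 3, "country": "Canada"}],
]

result, rows_shuffled = dedup_by_key(partitions, "id")
print("unique ids:", sorted(row["id"] for row in result))
print("rows shuffled:", rows_shuffled, "of", sum(len(p) for p in partitions))
```

<p>The partial step removes duplicates that already sit in the same partition, so fewer rows cross the network. The final step then only has to resolve duplicates that started out in different partitions.</p>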
<h4 id="heading-best-practices-for-deduplication">Best Practices for Deduplication</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Practice</strong></td><td><strong>Why It Matters</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Always deduplicate by key columns</td><td>Reduces shuffle width and data movement</td></tr>
<tr>
<td>Use deterministic keys (id, email, ssn)</td><td>Ensures predictable grouping</td></tr>
<tr>
<td>Avoid dropDuplicates() without arguments</td><td>Forces global shuffle across all attributes</td></tr>
<tr>
<td>Combine with column pruning</td><td>Keep only necessary fields before deduplication</td></tr>
<tr>
<td>For “latest record” logic, use window functions</td><td>Allows targeted deduplication (row_number() with order)</td></tr>
<tr>
<td>Cache intermediate datasets if reused</td><td>Avoids recomputation of expensive dedup stages</td></tr>
</tbody>
</table>
</div><h4 id="heading-combining-deduplication-amp-aggregation">Combining Deduplication &amp; Aggregation</h4>
<p>You can merge deduplication with aggregation for even better results:</p>
<pre><code class="lang-python">df_dedup_agg = (
    df.dropDuplicates([<span class="hljs-string">"id"</span>])
        .groupBy(<span class="hljs-string">"department"</span>)
        .agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
)
</code></pre>
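<p>Deduplicating first means the aggregation sees only one row per id. The two operations need different partitioning (id for the deduplication, department for the aggregation), so the plan still contains two exchanges, but the second one moves only small partial aggregates. The plan will show:</p>
<pre><code class="lang-python">HashAggregate(keys=[department], functions=[avg(salary)])

└─ Exchange hashpartitioning(department, <span class="hljs-number">200</span>)

   └─ HashAggregate(keys=[department], functions=[partial_avg(salary)])

      └─ HashAggregate(keys=[id], functions=[first(...), first(department)])

         └─ Exchange hashpartitioning(id, <span class="hljs-number">200</span>)
</code></pre>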
<p>Prefer dropDuplicates(["key_col"]) over dropDuplicates() to deduplicate by business or surrogate keys rather than the entire schema. Combine deduplication with projection to reduce I/O, and remember that one narrow shuffle is always better than a wide shuffle. Deduplication isn’t just cleanup – it’s an optimization strategy. Choose your keys wisely, and Spark will reward you with faster jobs and lighter DAGs.</p>
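<p>The “latest record” practice from the table above deserves a closer look, because dropDuplicates(["id"]) keeps an arbitrary row per key rather than the newest one. The plain-Python sketch below models what a row_number() window, partitioned by key and ordered newest first, would keep. The updated_at column and the latest_per_key helper are hypothetical and not part of the employee schema used in this article.</p>

```python
from itertools import groupby
from operator import itemgetter


def latest_per_key(rows, key, order_by):
    """Keep the row with the greatest `order_by` value for each `key`.

    Models: row_number() over (partition by key order by order_by desc) == 1.
    """
    # Newest first overall, then a stable sort by key keeps that order per key.
    ordered = sorted(rows, key=itemgetter(order_by), reverse=True)
    ordered.sort(key=itemgetter(key))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


rows = [
    {"id": 1, "salary": 80000, "updated_at": "2024-01-01"},
    {"id": 1, "salary": 85000, "updated_at": "2024-06-01"},
    {"id": 2, "salary": 60000, "updated_at": "2023-03-01"},
]

latest = latest_per_key(rows, key="id", order_by="updated_at")
for row in latest:
    print(row)
```

<p>In PySpark, the same logic is usually expressed with Window.partitionBy on the key, an orderBy on the timestamp in descending order, and a filter on row_number() equal to 1.</p>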
<h3 id="heading-scenario-8-count-smarter">Scenario 8: Count Smarter</h3>
<p>In production, one of the most common performance pitfalls is the simplest line of code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> df.count() &gt; <span class="hljs-number">0</span>:
</code></pre>
<p>At first glance, this seems harmless. You just want to know whether the DataFrame has any data before writing, joining, or aggregating. But in Spark, count() is not a metadata lookup. It’s a full cluster-wide job.</p>
<p><strong>What Really Happens with count()</strong><br>When you call df.count(), Spark executes a complete action:</p>
<ul>
<li><p>It scans every partition.</p>
</li>
<li><p>Deserializes every row.</p>
</li>
<li><p>Counts records locally on each executor.</p>
</li>
<li><p>Reduces the counts to the driver.</p>
</li>
</ul>
<p>That means your “empty check” runs a full distributed computation, even when the dataset has billions of rows or lives in S3.</p>
<pre><code class="lang-python">df.count()
</code></pre>
<p><strong>Simplified Physical Plan:</strong></p>
<pre><code class="lang-python">*(<span class="hljs-number">1</span>) HashAggregate(keys=[], functions=[count(<span class="hljs-number">1</span>)])

+- *(<span class="hljs-number">1</span>) ColumnarToRow

   +- *(<span class="hljs-number">1</span>) FileScan parquet [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>Every record is read, aggregated, and returned just to produce a single integer.</p>
<p>Now imagine this runs in the middle of your Glue job, before a write, before a filter, or inside a loop. You’ve just added a full-table scan to your DAG for no reason.</p>
<h4 id="heading-the-smarter-way-limit1-or-head1">The Smarter Way: limit(1) or head(1)</h4>
<p>If all you need to know is whether data exists, you don’t need to count every record. You just need to know if there’s <em>at least one</em>.</p>
<p>Two efficient alternatives:</p>
<pre><code class="lang-python">df.head(<span class="hljs-number">1</span>)
<span class="hljs-comment">#or</span>
df.limit(<span class="hljs-number">1</span>).collect()
</code></pre>
<p>Both execute a lazy scan that stops as soon as one record is found.</p>
<p><strong>Simplified Physical Plan:</strong></p>
<pre><code class="lang-python">CollectLimit <span class="hljs-number">1</span>

└─ *(<span class="hljs-number">1</span>) FileScan parquet [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<ul>
<li><p>No global aggregation.</p>
</li>
<li><p>No shuffle.</p>
</li>
<li><p>No full scan.</p>
</li>
</ul>
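<p>The difference between counting and peeking can be modeled with a plain-Python generator that records how many rows it actually produces. This is an illustration of the access pattern, not Spark code, and tracked_rows, count_rows, and has_rows are hypothetical names.</p>

```python
def tracked_rows(n, stats):
    """Yield n fake rows, recording how many were actually produced."""
    for i in range(n):
        stats["rows_read"] += 1
        yield {"id": i}


def count_rows(rows):
    """Model of count(): consumes every row to produce a single number."""
    return sum(1 for _ in rows)


def has_rows(rows):
    """Model of head(1) / limit(1): stops after the first row."""
    return next(iter(rows), None) is not None


count_stats = {"rows_read": 0}
non_empty_via_count = count_rows(tracked_rows(100_000, count_stats)) > 0

head_stats = {"rows_read": 0}
non_empty_via_head = has_rows(tracked_rows(100_000, head_stats))

print("count():", non_empty_via_count, "rows read:", count_stats["rows_read"])
print("head(1):", non_empty_via_head, "rows read:", head_stats["rows_read"])
```

<p>Both checks give the same answer, but only one of them touches every row. In PySpark, the equivalent emptiness check is commonly written as len(df.head(1)) &gt; 0.</p>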
<h4 id="heading-real-world-benchmark-aws-glue-5">Real-World Benchmark: AWS Glue</h4>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> exp, log, sqrt, col, concat_ws, when, upper, avg
<span class="hljs-keyword">from</span> pyspark.sql <span class="hljs-keyword">import</span> SparkSession

<span class="hljs-comment"># Initialize Spark session</span>
spark = SparkSession.builder.appName(<span class="hljs-string">"MillionRowsRenameTest"</span>).getOrCreate()

<span class="hljs-comment"># Base dataset (10 sample employees)</span>
employees_data = [
    (<span class="hljs-number">1</span>, <span class="hljs-string">"John"</span>, <span class="hljs-string">"Doe"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">80000</span>, <span class="hljs-number">28</span>, <span class="hljs-string">"2020-01-15"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">2</span>, <span class="hljs-string">"Jane"</span>, <span class="hljs-string">"Smith"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">85000</span>, <span class="hljs-number">32</span>, <span class="hljs-string">"2019-03-20"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">3</span>, <span class="hljs-string">"Alice"</span>, <span class="hljs-string">"Johnson"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">60000</span>, <span class="hljs-number">25</span>, <span class="hljs-string">"2021-06-10"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">4</span>, <span class="hljs-string">"Bob"</span>, <span class="hljs-string">"Brown"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">90000</span>, <span class="hljs-number">35</span>, <span class="hljs-string">"2018-07-01"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">5</span>, <span class="hljs-string">"Charlie"</span>, <span class="hljs-string">"Wilson"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">65000</span>, <span class="hljs-number">29</span>, <span class="hljs-string">"2020-11-05"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">6</span>, <span class="hljs-string">"David"</span>, <span class="hljs-string">"Lee"</span>, <span class="hljs-string">"HR"</span>, <span class="hljs-number">55000</span>, <span class="hljs-number">27</span>, <span class="hljs-string">"2021-01-20"</span>, <span class="hljs-string">"USA"</span>),
    (<span class="hljs-number">7</span>, <span class="hljs-string">"Eve"</span>, <span class="hljs-string">"Davis"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">95000</span>, <span class="hljs-number">40</span>, <span class="hljs-string">"2017-04-12"</span>, <span class="hljs-string">"Canada"</span>),
    (<span class="hljs-number">8</span>, <span class="hljs-string">"Frank"</span>, <span class="hljs-string">"Miller"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">70000</span>, <span class="hljs-number">33</span>, <span class="hljs-string">"2019-09-25"</span>, <span class="hljs-string">"UK"</span>),
    (<span class="hljs-number">9</span>, <span class="hljs-string">"Grace"</span>, <span class="hljs-string">"Taylor"</span>, <span class="hljs-string">"HR"</span>, <span class="hljs-number">58000</span>, <span class="hljs-number">26</span>, <span class="hljs-string">"2021-08-15"</span>, <span class="hljs-string">"Canada"</span>),
    (<span class="hljs-number">10</span>, <span class="hljs-string">"Henry"</span>, <span class="hljs-string">"Anderson"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">88000</span>, <span class="hljs-number">31</span>, <span class="hljs-string">"2020-02-28"</span>, <span class="hljs-string">"USA"</span>)
]

<span class="hljs-comment"># Create 1 million rows</span>
multiplied_data = [
    (i, <span class="hljs-string">f"firstname_<span class="hljs-subst">{i}</span>"</span>, <span class="hljs-string">f"lastname_<span class="hljs-subst">{i}</span>"</span>,
     employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">3</span>],
     employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">4</span>],
     employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">5</span>],
     employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">6</span>],
     employees_data[i % <span class="hljs-number">10</span>][<span class="hljs-number">7</span>])
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">1</span>_000_001)
]

<span class="hljs-comment"># Create DataFrame</span>
df = spark.createDataFrame(
    multiplied_data,
    [<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>]
)

start = time.time()
df.count()
count_time = round(time.time() - start, <span class="hljs-number">2</span>)

start = time.time()
df.limit(<span class="hljs-number">1</span>).collect()
limit_time = round(time.time() - start, <span class="hljs-number">2</span>)

start = time.time()
df.head(<span class="hljs-number">1</span>)
head_time = round(time.time() - start, <span class="hljs-number">2</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Method</strong></td><td><strong>Plan Type</strong></td><td><strong>Execution Time (1M rows)</strong></td><td><strong>Notes</strong></td></tr>
</thead>
<tbody>
<tr>
<td>count()</td><td>HashAggregate + Exchange</td><td>26.33 s</td><td>Full scan + aggregation</td></tr>
<tr>
<td>limit(1)</td><td>CollectLimit</td><td>0.62 s</td><td>Stops after first record</td></tr>
<tr>
<td>head(1)</td><td>CollectLimit</td><td>0.42 s</td><td>Fastest, single partition</td></tr>
</tbody>
</table>
</div><p>The difference is significant for the same logical check.</p>
<p>So why does this difference exist? Spark’s execution model treats every action as a trigger for computation. count() is an aggregation action: every partition must be scanned and the partial counts combined through a shuffle. limit(1) and head(1) only need one row, so Spark can short-circuit the job after fetching the first record. Because no ordering is requested, the planner emits a CollectLimit node instead of HashAggregate (TakeOrderedAndProject appears only when a limit follows an orderBy()), and the job stops scanning as soon as enough rows have been collected.</p>
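<p>The lightweight existence check can be wrapped in a small helper. This is a minimal sketch, not part of the benchmark above: <code>has_rows</code> is a hypothetical name, and <code>FakeFrame</code> is a stand-in defined here only so the snippet runs without a cluster. With a real DataFrame, <code>take(1)</code> returns a list holding at most one Row.</p>

```python
def has_rows(df):
    """Return True if df has at least one row, without a full count()."""
    # take(1) returns a list of at most one row, so Spark can stop early.
    return len(df.take(1)) == 1


class FakeFrame:
    """Hypothetical stand-in for a DataFrame (assumption, not a Spark class)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def take(self, n):
        return self._rows[:n]


print(has_rows(FakeFrame([(1, "Jane")])))  # True
print(has_rows(FakeFrame([])))             # False
```

<p>In a real job you would pass the DataFrame itself, for example <code>has_rows(df)</code>, in place of <code>df.count() == 0</code> style checks.</p>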
<p><strong>Plan comparison:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Action</strong></td><td><strong>Simplified Plan</strong></td><td><strong>Type</strong></td><td><strong>Behavior</strong></td></tr>
</thead>
<tbody>
<tr>
<td>count()</td><td>HashAggregate → Exchange → FileScan</td><td>Global</td><td>Full scan, wide dependency</td></tr>
<tr>
<td>limit(1)</td><td>CollectLimit → FileScan</td><td>Local</td><td>Early stop, narrow dependency</td></tr>
<tr>
<td>head(1)</td><td>CollectLimit → FileScan</td><td>Local</td><td>Early stop, single task</td></tr>
</tbody>
</table>
</div><p>Avoid using count() to check emptiness since it triggers a full scan. Use limit(1) or head(1) for lightweight existence checks, and reserve count() for cases where the total is actually required, because Spark will process all of the data unless the plan gives it a reason to stop early. Other alternatives:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Method</strong></td><td><strong>Behavior</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>df.take(1)</code></td><td>Similar to head(1); returns a list of Row objects</td></tr>
<tr>
<td><code>df.first()</code></td><td>Returns first Row or None</td></tr>
<tr>
<td><code>df.isEmpty()</code></td><td>Returns True if the DataFrame has no rows (available in PySpark 3.3+)</td></tr>
<tr>
<td><code>df.rdd.isEmpty()</code></td><td>RDD-level check</td></tr>
</tbody>
</table>
</div><h3 id="heading-scenario-9-window-wisely">Scenario 9: Window Wisely</h3>
<p>Window functions (rank(), dense_rank(), lag(), avg() with over(), and so on) are essential in analytics. They let you calculate running totals, rankings, or time-based metrics.</p>
<p>But in Spark, they’re not cheap, because they rely on shuffles and ordering.</p>
<p>Each window operation:</p>
<ul>
<li><p>Requires all rows for the same partition key to be co-located on the same node.</p>
</li>
<li><p>Requires sorting those rows by the orderBy() clause within each partition.</p>
</li>
</ul>
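<p>If you omit partitionBy(), Spark moves the entire dataset into a single partition and sorts it there, so one task ends up doing all the work. A partition key with very few distinct values has a milder version of the same problem: only a handful of tasks can run in parallel.</p>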
<h4 id="heading-global-window-the-wrong-way">Global Window: The Wrong Way</h4>
<p>Let’s compute employee rankings by salary without partitioning:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.window <span class="hljs-keyword">import</span> Window
<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> rank, col

window_spec = Window.orderBy(col(<span class="hljs-string">"salary"</span>).desc())

df_ranked = df.withColumn(<span class="hljs-string">"salary_rank"</span>, rank().over(window_spec))
</code></pre>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Window [rank() windowspecdefinition(orderBy=[salary DESC]) AS salary_rank]

└─ Sort [salary DESC], false

   └─ Exchange SinglePartition

      └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>With no partitionBy(), Spark shuffles every row into a single partition (it logs a “No Partition Defined for Window operation” warning) and sorts the whole dataset there. All data moves through the network, and one task carries the entire sort.</p>
<h4 id="heading-partition-by-a-selective-key-the-better-way">Partition by a Selective Key: The Better Way</h4>
<p>Most analytics don’t need a global ranking. You likely want rankings within a department or group, not across the entire company.</p>
<pre><code class="lang-python">window_spec = Window.partitionBy(<span class="hljs-string">"department"</span>).orderBy(col(<span class="hljs-string">"salary"</span>).desc())

df_ranked = df.withColumn(<span class="hljs-string">"salary_rank"</span>, rank().over(window_spec))
</code></pre>
<p>Now Spark builds separate windows per department. Rows are hash-partitioned by department, so each department is sorted independently and in parallel instead of in one dataset-wide sort.</p>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Window [rank() windowspecdefinition(partitionBy=[department], orderBy=[salary DESC]) AS salary_rank]

└─ Sort [department ASC, salary DESC], false

   └─ Exchange hashpartitioning(department, <span class="hljs-number">200</span>)

      └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>The Exchange now hash-partitions data by department. Each task sorts only its own departments, which means fewer sort comparisons per task and a smaller spill risk.</p>
<h4 id="heading-real-world-benchmark-aws-glue-6">Real-World Benchmark: AWS Glue</h4>
<p>We can run both window variants on the same 1 million row dataset:</p>
<pre><code class="lang-python">df = spark.createDataFrame(multiplied_data,
[<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>,
 <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>])

start = time.time()
window_global = Window.orderBy(col(<span class="hljs-string">"salary"</span>).desc())
df_global = df.withColumn(<span class="hljs-string">"salary_rank"</span>, rank().over(window_global))
df_global.count()
global_time = round(time.time() - start, <span class="hljs-number">2</span>)
print(<span class="hljs-string">f'global_time:<span class="hljs-subst">{global_time}</span>'</span>)

start = time.time()
window_local = Window.partitionBy(<span class="hljs-string">"department"</span>).orderBy(col(<span class="hljs-string">"salary"</span>).desc())
df_local = df.withColumn(<span class="hljs-string">"salary_rank"</span>, rank().over(window_local))
df_local.count()
local_time = round(time.time() - start, <span class="hljs-number">2</span>)
print(<span class="hljs-string">f'local_time:<span class="hljs-subst">{local_time}</span>'</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Stage Count</strong></td><td><strong>Execution Time (1M rows)</strong></td><td><strong>Observation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Global Window (no partition)</td><td>5</td><td>30.21 s</td><td>Full dataset shuffle + global sort</td></tr>
<tr>
<td>Partitioned Window (by department)</td><td>3</td><td>1.74 s</td><td>Localized sort, fewer shuffle files</td></tr>
</tbody>
</table>
</div><p>In this run, partitioning the window cut runtime from 30.21 s to 1.74 s, roughly 17× faster. The gap tends to widen as data grows, because the global window funnels every row through a single sort.</p>
<h4 id="heading-under-the-hood-what-spark-actually-does-1">Under the Hood: What Spark Actually Does</h4>
<p>Each Window transformation adds a physical plan node like:</p>
<p><code>WindowExec [rank() windowspecdefinition(...)], frame=RangeFrame</code></p>
<p>This node is non-pipelined – it materializes input partitions before computing window metrics. The Catalyst optimizer can push a filter below WindowExec only when the filter references nothing but the window’s partitionBy() columns, which means:</p>
<ul>
<li><p>If you rank before filtering, Spark computes ranks for all rows.</p>
</li>
<li><p>If you order globally, Spark must sort everything before starting.</p>
</li>
</ul>
<p>That’s why window placement in your code matters almost as much as partition keys.</p>
<h4 id="heading-common-anti-patterns">Common Anti-Patterns:</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Anti-Pattern</strong></td><td><strong>Why It Hurts</strong></td><td><strong>Fix</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Missing partitionBy()</td><td>Global sort across dataset</td><td>Partition by key columns</td></tr>
<tr>
<td>Near-unique partition key</td><td>Creates too many tiny partitions</td><td>Use selective, but not unique, keys</td></tr>
<tr>
<td>Wide, unbounded window frame</td><td>Retains all rows in memory per key</td><td>Use bounded ranges (for example, rowsBetween(-3, 0))</td></tr>
<tr>
<td>Filtering after window</td><td>Computes unnecessary metrics</td><td>Filter first, then window</td></tr>
<tr>
<td>Multiple chained windows</td><td>Each triggers new sort</td><td>Combine window metrics in one spec</td></tr>
</tbody>
</table>
</div><p>Partition on selective keys to reduce shuffle volume, and avoid global windows that force full sorts and shuffles. Prefer bounded frames to keep state in memory and limit disk spill, and filter early while combining metrics to minimize unnecessary data flowing through WindowExec. Windows are powerful, but unbounded ones can silently crush performance. In Spark, partitioning isn’t optional. It’s the line between analytics and overhead.</p>
<h3 id="heading-scenario-10-incremental-aggregations-with-cache-and-persist">Scenario 10: Incremental Aggregations with Cache and Persist</h3>
<p>When multiple actions depend on the same expensive base computation, don’t recompute it every time. Materialize it once with cache() or persist(), then reuse it. Teams commonly get this wrong in two ways:</p>
<ul>
<li><p>They never cache, so Spark recomputes long lineages (filters, joins, window ops) for every action.</p>
</li>
<li><p>They cache everything, blowing executor memory and making things worse.</p>
</li>
</ul>
<p>This scenario shows how to do it intelligently.</p>
<h4 id="heading-the-problem-recomputing-the-same-work-for-every-metric">The Problem: Recomputing the Same Work for Every Metric</h4>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> col, avg, max <span class="hljs-keyword">as</span> max_, count

base = (
    df.filter(col(<span class="hljs-string">"department"</span>) == <span class="hljs-string">"Engineering"</span>)
      .filter(col(<span class="hljs-string">"country"</span>) == <span class="hljs-string">"USA"</span>)
      .filter(col(<span class="hljs-string">"salary"</span>) &gt; <span class="hljs-number">70000</span>)
)

avg_salary = base.groupBy(<span class="hljs-string">"department"</span>).agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
max_salary = base.groupBy(<span class="hljs-string">"department"</span>).agg(max_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"max_salary"</span>))
cnt_salary = base.groupBy(<span class="hljs-string">"department"</span>).agg(count(<span class="hljs-string">"*"</span>).alias(<span class="hljs-string">"cnt"</span>))

<span class="hljs-comment"># Looks totally fine at a glance. But remember: Spark is lazy.</span>
<span class="hljs-comment"># Every time you trigger an action, the full lineage runs again:</span>

avg_salary.show()
max_salary.show()
cnt_salary.show()
</code></pre>
<p>Spark walks back to the same base definition and re-runs all filters and shuffles for each metric – unless you persist.</p>
<p>So instead of 1 filtered + shuffled dataset reused 3 times, you effectively get:</p>
<ul>
<li><p>3 jobs</p>
</li>
<li><p>3 scans / filter chains</p>
</li>
<li><p>3 groupBy shuffles</p>
</li>
</ul>
<p>for the same input slice.</p>
<p><strong>Simplified Logical Plan Shape (Without Cache):</strong></p>
<pre><code class="lang-python">HashAggregate [department], [avg/max/count]

└─ Exchange hashpartitioning(department)

   └─ Filter (department = <span class="hljs-string">'Engineering'</span> AND country = <span class="hljs-string">'USA'</span> AND salary &gt; <span class="hljs-number">70000</span>)

      └─ Scan ...
</code></pre>
<p>And Spark builds this three times. Even though the filter logic is identical, each action triggers a new job with:</p>
<ul>
<li><p>new stages,</p>
</li>
<li><p>new shuffles, and</p>
</li>
<li><p>new scans.</p>
</li>
</ul>
<p>On large datasets (hundreds of GBs), this is brutal.</p>
<h4 id="heading-the-better-approach-cache-the-shared-base">The Better Approach: Cache the Shared Base</h4>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark <span class="hljs-keyword">import</span> StorageLevel

base = (
    df.filter(col(<span class="hljs-string">"department"</span>) == <span class="hljs-string">"Engineering"</span>)
      .filter(col(<span class="hljs-string">"country"</span>) == <span class="hljs-string">"USA"</span>)
      .filter(col(<span class="hljs-string">"salary"</span>) &gt; <span class="hljs-number">70000</span>)
)

base = base.persist(StorageLevel.MEMORY_AND_DISK)

base.count()

avg_salary = base.groupBy(<span class="hljs-string">"department"</span>).agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
max_salary = base.groupBy(<span class="hljs-string">"department"</span>).agg(max_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"max_salary"</span>))
cnt_salary = base.groupBy(<span class="hljs-string">"department"</span>).agg(count(<span class="hljs-string">"*"</span>).alias(<span class="hljs-string">"cnt"</span>))

avg_salary.show()
max_salary.show()
cnt_salary.show()

base.unpersist()
</code></pre>
<p>Now, the filters and initial scan run once, the results are cached, and all subsequent aggregates read from cached data instead of recomputing upstream logic.</p>
<p><strong>Logical Plan Shape (With Cache):</strong></p>
<p>Before materialization (base.count()), the plan still shows the lineage. Afterward, subsequent actions operate off the cached node.</p>
<pre><code class="lang-python">InMemoryRelation [department, salary, country, ...]

   └─ * Cached <span class="hljs-keyword">from</span>:

      Filter (department = <span class="hljs-string">'Engineering'</span> AND country = <span class="hljs-string">'USA'</span> AND salary &gt; <span class="hljs-number">70000</span>)

      └─ Scan parquet employees_large ...
</code></pre>
<p>Then:</p>
<pre><code class="lang-python">HashAggregate [department], [avg/max/count]

└─ InMemoryRelation [...]
</code></pre>
<p>One heavy pipeline, many cheap reads. The DAG becomes flatter:</p>
<ul>
<li><p>Expensive scan &amp; filter: once.</p>
</li>
<li><p>Cheap aggregations: N times from memory/disk.</p>
</li>
</ul>
<h4 id="heading-real-world-benchmark-aws-glue-7">Real-World Benchmark: AWS Glue</h4>
<pre><code class="lang-python">df = spark.createDataFrame(multiplied_data,
[<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>,
<span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>])

base = (
    df.filter(col(<span class="hljs-string">"department"</span>) == <span class="hljs-string">"Engineering"</span>)
      .filter(col(<span class="hljs-string">"country"</span>) == <span class="hljs-string">"USA"</span>)
      .filter(col(<span class="hljs-string">"salary"</span>) &gt; <span class="hljs-number">85000</span>)
)


start = time.time()

avg_salary = base.groupBy(<span class="hljs-string">"department"</span>).agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
max_salary = base.groupBy(<span class="hljs-string">"department"</span>).agg(max_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"max_salary"</span>))
cnt = base.groupBy(<span class="hljs-string">"department"</span>).agg(count(<span class="hljs-string">"*"</span>).alias(<span class="hljs-string">"emp_count"</span>))

print(<span class="hljs-string">"---- Without Cache ----"</span>)
avg_salary.show()
max_salary.show()
cnt.show()

no_cache_time = round(time.time() - start, <span class="hljs-number">2</span>)
print(<span class="hljs-string">f"Total time without cache: <span class="hljs-subst">{no_cache_time}</span> seconds"</span>)


<span class="hljs-keyword">from</span> pyspark <span class="hljs-keyword">import</span> StorageLevel

base_cached = base.persist(StorageLevel.MEMORY_AND_DISK)
base_cached.count()  <span class="hljs-comment"># materialize cache</span>

start = time.time()

avg_salary_c = base_cached.groupBy(<span class="hljs-string">"department"</span>).agg(avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>))
max_salary_c = base_cached.groupBy(<span class="hljs-string">"department"</span>).agg(max_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"max_salary"</span>))
cnt_c = base_cached.groupBy(<span class="hljs-string">"department"</span>).agg(count(<span class="hljs-string">"*"</span>).alias(<span class="hljs-string">"emp_count"</span>))

print(<span class="hljs-string">"---- With Cache ----"</span>)
avg_salary_c.show()
max_salary_c.show()
cnt_c.show()

cache_time = round(time.time() - start, <span class="hljs-number">2</span>)
print(<span class="hljs-string">f"Total time with cache: <span class="hljs-subst">{cache_time}</span> seconds"</span>)

<span class="hljs-comment"># Cleanup</span>
base_cached.unpersist()

print(<span class="hljs-string">"\n==== Summary ===="</span>)
print(<span class="hljs-string">f"Without cache: <span class="hljs-subst">{no_cache_time}</span>s | With cache: <span class="hljs-subst">{cache_time}</span>s"</span>)
print(<span class="hljs-string">"================="</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Execution Time (1M rows)</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Without Cache</td><td>30.75 s</td></tr>
<tr>
<td>With Cache</td><td>3.34 s</td></tr>
</tbody>
</table>
</div><h4 id="heading-under-the-hood-why-this-works">Under the Hood: Why This Works</h4>
<p>Using cache() or persist() in Spark inserts an InMemoryRelation / InMemoryTableScanExec node so that expensive intermediate results are stored in executor memory (or memory+disk). This allows future jobs to reuse cached blocks instead of re-scanning sources or re-computing shuffles. This shortens downstream logical plans, reduces repeated shuffles, and lowers load on systems like S3, HDFS, or JDBC.</p>
<p>Without caching, every action replays the full lineage and Spark recomputes the data unless another operator or AQE optimization has already materialized part of it. But caching should not become “cache everything”. Rather, you should avoid caching very large DataFrames used only once, wide raw inputs instead of filtered/aggregated subsets, or long-lived caches that are never unpersisted.</p>
<p>A good rule of thumb is to cache only when the DataFrame is expensive to recompute (joins, filters, windows, UDFs), is used at least twice, and is reasonably sized after filtering so it can fit in memory or work with MEMORY_AND_DISK. Otherwise, allow Spark to recompute.</p>
<p>Conceptually, caching converts a tall, repetitive DAG such as repeated “HashAggregate → Exchange → Filter → Scan” sequences into a hub-and-spoke design where one heavy cached hub feeds multiple lightweight downstream aggregates.</p>
<p>When multiple actions depend on the same expensive computation, cache or persist the shared base to flatten the DAG, eliminate repeated scans and shuffles, and improve end-to-end performance. Be intentional about it: cache only when reuse is real and the data size is safe, and always call <code>unpersist()</code> when done.</p>
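<p>One way to make the “always unpersist” habit hard to forget is a small context manager. This is a sketch under stated assumptions: <code>cached</code> is a hypothetical helper, it relies on the DataFrame’s default storage level, and <code>RecordingFrame</code> is a stand-in that only records calls so the snippet runs without Spark.</p>

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Persist df for the duration of a with-block, then always release it."""
    df.persist()        # default storage level for DataFrames
    try:
        df.count()      # action that materializes the cache
        yield df
    finally:
        df.unpersist()  # runs even if the body raises


class RecordingFrame:
    """Hypothetical stand-in that records calls (assumption, not a Spark class)."""

    def __init__(self):
        self.calls = []

    def persist(self):
        self.calls.append("persist")
        return self

    def count(self):
        self.calls.append("count")
        return 0

    def unpersist(self):
        self.calls.append("unpersist")
        return self


frame = RecordingFrame()
with cached(frame) as base:
    base.calls.append("aggregations")  # the three groupBy jobs would go here
print(frame.calls)
```

<p>With a real DataFrame, the body of the <code>with</code> block would hold the avg, max, and count aggregations, and the cache is released as soon as the block exits.</p>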
<p>Don’t make Spark re-solve the same puzzle three times. Let it solve it once, remember the answer, and move on.</p>
<h3 id="heading-scenario-11-reduce-shuffles">Scenario 11: Reduce Shuffles</h3>
<p>Shuffles are Spark’s invisible tax collectors. Every time your data crosses executors, you pay in CPU, disk I/O, and network bandwidth.</p>
<p>Two of the most common yet misunderstood transformations that trigger or avoid shuffles are coalesce() and repartition(). Both change partition counts, but they do it in fundamentally different ways.</p>
<h4 id="heading-the-problem">The Problem</h4>
<p>It’s tempting to write <code>df_result = df.repartition(10)</code> and think, “I’m just changing partitions, so Spark won’t move data unnecessarily.” That assumption is wrong. <code>repartition()</code> always performs a full shuffle, even when:</p>
<ul>
<li><p>You are reducing partitions (from 200 → 10), or</p>
</li>
<li><p>You are increasing partitions (from 10 → 200).</p>
</li>
</ul>
<p>In both cases, Spark redistributes every row across the cluster according to a new partitioning scheme (round-robin when you pass only a number, hash when you pass columns). So even if your data is already partitioned optimally, repartition() will still reshuffle it, adding a stage boundary.</p>
<p><strong>Logical Plan:</strong></p>
<pre><code class="lang-python">Exchange RoundRobinPartitioning(10)

└─ LogicalRDD [...]
</code></pre>
<p>That Exchange node signals a wide dependency: Spark spills intermediate data to disk, transfers it over the network, and reloads it before the next stage. In short: repartition() = "new shuffle, no matter what."</p>
<h4 id="heading-the-better-approach-coalesce">The Better Approach: coalesce()</h4>
<p>If your goal is to reduce the number of partitions (for example, before writing results to S3 or Snowflake), use coalesce() instead.</p>
<p><code>df_result = df.coalesce(10)</code></p>
<p>coalesce() merges existing partitions locally within each executor, avoiding the costly reshuffle step. It uses a narrow dependency, meaning each output partition depends on one or more existing partitions <em>from the same node</em>.</p>
<p><code>Coalesce</code></p>
<p><code>└─ LogicalRDD [...]</code></p>
<ul>
<li><p>No Exchange.</p>
</li>
<li><p>No network shuffle.</p>
</li>
<li><p>Just local merges – fast and cheap.</p>
</li>
</ul>
<h4 id="heading-real-world-benchmark-aws-glue-8">Real-World Benchmark: AWS Glue</h4>
<pre><code class="lang-python">df = spark.createDataFrame(multiplied_data,
[<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>])

start = time.time()
df_repart = df.repartition(<span class="hljs-number">10</span>)
df_repart.count()
print(<span class="hljs-string">"Repartition time:"</span>, round(time.time() - start, <span class="hljs-number">2</span>), <span class="hljs-string">"sec"</span>)

start = time.time()
df_coalesced = df.coalesce(<span class="hljs-number">10</span>)
df_coalesced.count()
print(<span class="hljs-string">"Coalesce time:"</span>, round(time.time() - start, <span class="hljs-number">2</span>), <span class="hljs-string">"sec"</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Operation</strong></td><td><strong>Plan Node</strong></td><td><strong>Shuffle Triggered</strong></td><td><strong>Glue Runtime</strong></td><td><strong>Observation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>repartition(10)</td><td>Exchange</td><td>Yes</td><td>18.2 s</td><td>Full cluster reshuffle</td></tr>
<tr>
<td>coalesce(10)</td><td>Coalesce</td><td>No</td><td>1.99 s</td><td>Local partition merge only</td></tr>
</tbody>
</table>
</div><p>Even though both ended with 10 partitions, repartition() took roughly 9× longer (18.2 s vs. 1.99 s), all because of the unnecessary shuffle.</p>
<h4 id="heading-why-this-matters">Why This Matters</h4>
<p>Each Exchange node in your logical plan creates a new stage in your DAG, meaning:</p>
<ul>
<li><p>Extra disk I/O</p>
</li>
<li><p>Extra serialization</p>
</li>
<li><p>Extra network transfer</p>
</li>
</ul>
<p>That’s why avoiding just one shuffle in a Glue ETL pipeline can save seconds to minutes per run, especially on wide datasets.</p>
<p><strong>When to use which:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Goal</strong></td><td><strong>Transformation</strong></td><td><strong>Reasoning</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Increase parallelism for heavy groupBy or join</td><td>repartition()</td><td>Distributes data evenly across executors</td></tr>
<tr>
<td>Reduce file count before writing</td><td>coalesce()</td><td>Avoids shuffle, merges partitions locally</td></tr>
<tr>
<td>Rebalance skewed data before a join</td><td>repartition("key")</td><td>Enables better key distribution</td></tr>
<tr>
<td>Optimize output after aggregation</td><td>coalesce()</td><td>Prevents too many small output files</td></tr>
</tbody>
</table>
</div><h4 id="heading-aqe-and-auto-coalescing">AQE and Auto Coalescing</h4>
<p>You can enable Adaptive Query Execution (AQE) in AWS Glue 3.0+ to let Spark merge small shuffle partitions automatically:</p>
<p><code>spark.conf.set("spark.sql.adaptive.enabled", "true")</code></p>
<p><code>spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")</code></p>
<p>With AQE, Spark dynamically combines small partitions <em>after</em> shuffle to balance performance and I/O.</p>
<p>repartition() always triggers a shuffle, while coalesce() avoids shuffles and is ideal for local merges before writes. You should always inspect Exchange nodes to identify shuffle points. In the Glue benchmark above, avoiding a single shuffle gave a ~9× runtime improvement at the 1M-row scale. Finally, use AQE to enable dynamic partition coalescing in larger workflows.</p>
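<p>The choice between the two calls can be sketched as a tiny helper. Treat this as an illustration only: <code>resize_partitions</code> is a hypothetical name, and <code>FakeFrame</code> is a stand-in that mimics just the three calls involved so the snippet runs without Spark. The one real-API fact it leans on is that coalesce() can only lower the partition count, so growing still requires repartition().</p>

```python
def resize_partitions(df, target):
    """Reach `target` partitions with the cheapest call available."""
    current = df.rdd.getNumPartitions()
    if target == current:
        return df                      # already there, add nothing to the plan
    if target > current:
        return df.repartition(target)  # growing needs a shuffle
    return df.coalesce(target)         # shrinking can merge partitions locally


class FakeFrame:
    """Hypothetical stand-in for a DataFrame (assumption, not a Spark class)."""

    def __init__(self, partitions, op="source"):
        self.partitions = partitions
        self.op = op
        self.rdd = self  # lets df.rdd.getNumPartitions() resolve

    def getNumPartitions(self):
        return self.partitions

    def repartition(self, n):
        return FakeFrame(n, "repartition")

    def coalesce(self, n):
        return FakeFrame(min(n, self.partitions), "coalesce")


df = FakeFrame(200)
print(resize_partitions(df, 10).op)   # coalesce
print(resize_partitions(df, 400).op)  # repartition
print(resize_partitions(df, 200).op)  # source
```

<p>One caveat worth keeping in mind: because coalesce() adds no stage boundary, a very aggressive reduction (such as down to a single partition) can also reduce the parallelism of the computation feeding it. In that situation a deliberate repartition() may still be the better trade.</p>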
<h3 id="heading-scenario-12-know-your-shuffle-triggers">Scenario 12: Know Your Shuffle Triggers</h3>
<p>Much of Spark's performance cost comes from invisible data movement. Every shuffle boundary adds a new stage, a new write–read cycle, and sometimes minutes of extra execution time.</p>
<p>In Spark, any operation that requires rearranging data between partitions introduces a wide dependency, represented in the logical plan as an Exchange node.</p>
<p>Common shuffle triggers:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Operation</strong></td><td><strong>Why It Shuffles</strong></td><td><strong>Plan Node</strong></td></tr>
</thead>
<tbody>
<tr>
<td>join()</td><td>Records with the same key must be co-located for matching</td><td>Exchange (on join keys)</td></tr>
<tr>
<td>groupBy() / agg()</td><td>Rows with the same key must land in the same partition for aggregation</td><td>Exchange</td></tr>
<tr>
<td>distinct()</td><td>Spark must compare all values across partitions</td><td>Exchange</td></tr>
<tr>
<td>orderBy()</td><td>Requires global ordering of data</td><td>Exchange</td></tr>
<tr>
<td>repartition()</td><td>Explicit reshuffle for partition balancing</td><td>Exchange</td></tr>
</tbody>
</table>
</div><p>Each Exchange means a shuffle stage: Spark writes partition data to disk, transfers it over the network, and reads it back into memory on the next stage. That’s your hidden performance cliff.</p>
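<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> sum <span class="hljs-keyword">as</span> sum_

df_result = (
    df.groupBy(<span class="hljs-string">"department"</span>)
      .agg(sum_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"total_salary"</span>))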
      .join(df.select(<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>)
            .distinct(), <span class="hljs-string">"department"</span>)
      .orderBy(<span class="hljs-string">"total_salary"</span>, ascending=<span class="hljs-literal">False</span>)
)

df_result.explain(<span class="hljs-string">"formatted"</span>)
</code></pre>
<p><strong>Logical Plan Simplified:</strong></p>
<pre><code class="lang-python">Sort [total_salary DESC]

└─ Exchange (<span class="hljs-keyword">global</span> sort)

   └─ SortMergeJoin [department]

      ├─ Exchange (groupBy shuffle)

      │   └─ HashAggregate (sum salary)

      └─ Exchange (distinct shuffle)

          └─ Aggregate (department, country)
</code></pre>
<p>We can see three Exchange nodes: one for the aggregation, one for the distinct, and one for the global sort. That’s three separate shuffles, each with its own write–read cycle.</p>
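<p>Counting Exchange nodes by eye gets tedious on larger plans, so here is a rough sketch of doing it in code. The helper names are hypothetical, and <code>plan_text</code> assumes that <code>df.explain()</code> prints the plan to standard output. The <code>demo</code> object is a stand-in that prints the simplified plan shown above, so the snippet runs without Spark.</p>

```python
import io
import re
from contextlib import redirect_stdout
from types import SimpleNamespace


def plan_text(df):
    """Capture what df.explain() prints (assumes it writes to stdout)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        df.explain()
    return buffer.getvalue()


def count_exchanges(text):
    """Count shuffle Exchange operators in a plan string."""
    # The word boundary skips BroadcastExchange and ReusedExchange.
    return len(re.findall(r"\bExchange\b", text))


SAMPLE_PLAN = """Sort [total_salary DESC]
+- Exchange (global sort)
   +- SortMergeJoin [department]
      :- Exchange (groupBy shuffle)
      :  +- HashAggregate (sum salary)
      +- Exchange (distinct shuffle)
         +- Aggregate (department, country)"""

# Hypothetical stand-in whose explain() prints the simplified plan above.
demo = SimpleNamespace(explain=lambda: print(SAMPLE_PLAN))
print(count_exchanges(plan_text(demo)))  # 3
```

<p>This works best with the default explain() output. The "formatted" mode lists each operator twice (once in the tree and once in the details section), so a plain text count would overstate the number of shuffles there.</p>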
<h4 id="heading-better-approach">Better Approach</h4>
<p>Whenever possible, combine wide transformations into a single stage before an action. For instance, you can compute aggregates and join results in one consistent shuffle domain:</p>
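<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> sum <span class="hljs-keyword">as</span> sum_  <span class="hljs-comment"># avoid shadowing Python's built-in sum</span>

agg_df = (
    df.groupBy(<span class="hljs-string">"department"</span>)
      .agg(sum_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"total_salary"</span>))
)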

country_df = df.select(<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>).distinct()

df_result = (
    agg_df.join(country_df, <span class="hljs-string">"department"</span>)
          .sortWithinPartitions(<span class="hljs-string">"total_salary"</span>, ascending=<span class="hljs-literal">False</span>)
)
</code></pre>
<p><strong>Simplified Physical Plan:</strong></p>
<pre><code class="lang-python">SortWithinPartitions [total_salary DESC]

└─ SortMergeJoin [department]

   ├─ Exchange (shared shuffle <span class="hljs-keyword">for</span> join)

   └─ Exchange (shared shuffle <span class="hljs-keyword">for</span> distinct)
</code></pre>
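<p>The Exchange for the global sort is gone: sortWithinPartitions() orders rows inside the partitions the join already produced, so sorting adds no new shuffle boundary. The Exchanges that feed the join remain, as the plan shows. Keep in mind that the result is now ordered only within each partition, so make this swap only when a global order is not required.</p>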
<h4 id="heading-real-world-benchmark-aws-glue-1m">Real-World Benchmark: AWS Glue (1M Rows)</h4>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time

<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> col, sum <span class="hljs-keyword">as</span> sum_

<span class="hljs-comment"># multiplied_data is assumed to be the 1M-row employee dataset built earlier</span>
df = spark.createDataFrame(multiplied_data,
[<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>, <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>]).repartition(<span class="hljs-number">20</span>)

start = time.time()

dept_salary = (
    df.groupBy(<span class="hljs-string">"department"</span>)
      .agg(sum_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"total_salary"</span>))
)

dept_country = (
    df.select(<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>)
      .distinct()
)

naive_result = (
    dept_salary.join(dept_country, <span class="hljs-string">"department"</span>, <span class="hljs-string">"inner"</span>)
               .orderBy(col(<span class="hljs-string">"total_salary"</span>).desc())
)

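naive_count = naive_result.count()
naive_time = round(time.time() - start, <span class="hljs-number">2</span>)

print(<span class="hljs-string">"Naive result count:"</span>, naive_count)
print(<span class="hljs-string">"Naive pipeline time:"</span>, naive_time, <span class="hljs-string">"sec"</span>)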


start = time.time()

dept_country_once = (
    df.select(<span class="hljs-string">"department"</span>, <span class="hljs-string">"country"</span>)
      .distinct()
)

optimized = (
    df.groupBy(<span class="hljs-string">"department"</span>)
      .agg(sum_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"total_salary"</span>))
      .join(dept_country_once, <span class="hljs-string">"department"</span>, <span class="hljs-string">"inner"</span>)
      .sortWithinPartitions(col(<span class="hljs-string">"total_salary"</span>).desc())
      <span class="hljs-comment"># local ordering, avoids extra global shuffle</span>
)

opt_count = optimized.count()
opt_time = round(time.time() - start, <span class="hljs-number">2</span>)

print(<span class="hljs-string">"Optimized result count:"</span>, opt_count)
print(<span class="hljs-string">"Optimized pipeline time:"</span>, opt_time, <span class="hljs-string">"sec"</span>)

print(<span class="hljs-string">"\nOptimized plan:"</span>)
optimized.explain(<span class="hljs-string">"formatted"</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Pipeline</strong></td><td><strong># of Shuffles</strong></td><td><strong>Glue Runtime (sec)</strong></td><td><strong>Observation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Naive: groupBy + distinct + orderBy</td><td>3</td><td>28.99 s</td><td>Multiple wide stages</td></tr>
<tr>
<td>Optimized: combined agg + join + sortWithinPartitions</td><td>1</td><td>3.52 s</td><td>Single wide stage</td></tr>
</tbody>
</table>
</div><p>By merging compatible stages and using sortWithinPartitions() instead of a global orderBy(), the job dropped from 28.99 s to 3.52 s on the same dataset (roughly 8× faster), with fewer Exchange nodes and a shorter lineage. To check your own jobs, run df.explain() and search for Exchange: each one signals a full shuffle. You can also open Spark UI → SQL tab → Exchange Read/Write Size to see exactly how much data moved.</p>
<p>Every Exchange represents a shuffle, adding serialization, network I/O, and stage overhead, so avoid chaining wide operations back-to-back; combine them under a consistent partition key instead. Prefer sortWithinPartitions() over a global orderBy() when ordering only needs to be local, and monitor plan depth to catch consecutive wide dependencies. In AWS Glue, eliminating even one shuffle in a 1M-row job can significantly reduce runtime.</p>
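<p>Searching a plan for Exchange can be automated. The sketch below is plain Python rather than a Spark API: it assumes you have already captured the plan as text (for example, by copying the output of explain()), and both the helper name <code>count_exchanges</code> and the sample plan are hypothetical.</p>

```python
import re


def count_exchanges(plan_text: str) -> int:
    """Count Exchange operators (shuffle boundaries) in a plan printed as text."""
    # Whole-word match, so operator names that merely end in "Exchange"
    # (such as BroadcastExchange) are not counted as shuffles.
    return len(re.findall(r"\bExchange\b", plan_text))


# Hypothetical plan text, shaped like the naive pipeline discussed above.
sample_plan = """
Sort [total_salary DESC], true
+- Exchange rangepartitioning(total_salary DESC, 200)
   +- SortMergeJoin [department], [department], Inner
      :- Exchange hashpartitioning(department, 200)
      :  +- HashAggregate(keys=[department], functions=[sum(salary)])
      +- Exchange hashpartitioning(department, 200)
         +- HashAggregate(keys=[department, country], functions=[])
"""

print(count_exchanges(sample_plan))  # 3
```

<p>A count like this is only a quick signal. The Spark UI remains the place to confirm how much data each shuffle actually moved.</p>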
<h3 id="heading-scenario-13-tune-parallelism-shuffle-partitions-amp-aqe">Scenario 13: Tune Parallelism: Shuffle Partitions &amp; AQE</h3>
<p>Most Spark jobs are either over-parallelized (thousands of tiny tasks doing almost nothing, flooding the driver and filesystem) or under-parallelized (a handful of huge tasks doing all the work, causing slow stages and skew-like behavior). Both waste resources. We can control this behavior using spark.sql.shuffle.partitions and Adaptive Query Execution (AQE).</p>
<p>In many environments, <code>spark.sql.shuffle.partitions</code> defaults to 200, so every shuffle (groupBy, join, distinct, and so on) produces about 200 shuffle partitions regardless of data size. You can check the current value with <code>spark.conf.get("spark.sql.shuffle.partitions")</code>. Whether this default is reasonable depends entirely on the workload:</p>
<ul>
<li><p>If you’re processing 2 GB, 200 partitions might be great.</p>
</li>
<li><p>If you’re processing 5 MB, 200 partitions is comedy – 200 tiny tasks, overhead &gt; work.</p>
</li>
<li><p>If you’re processing 2 TB, 200 partitions might be too few – tasks become huge and slow.</p>
</li>
</ul>
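<p>One way to reason about the three cases above is to divide the expected shuffle size by a target partition size. The sketch below is plain arithmetic, not a Spark API. The helper name and the 128 MB target are assumptions chosen for illustration, so adjust the target to your own cluster.</p>

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def suggest_shuffle_partitions(shuffle_bytes: int, target_bytes: int = 128 * MB) -> int:
    """Return ceil(shuffle_bytes / target_bytes), never less than 1."""
    return max(1, math.ceil(shuffle_bytes / target_bytes))


for label, size in [("5 MB", 5 * MB), ("2 GB", 2 * GB), ("2 TB", 2 * TB)]:
    print(label, "->", suggest_shuffle_partitions(size))
# 5 MB -> 1
# 2 GB -> 16
# 2 TB -> 16384
```

<p>The exact numbers matter less than the spread: one fixed setting cannot suit inputs that differ by several orders of magnitude.</p>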
<h4 id="heading-example-a-the-default-plan-too-many-tiny-tasks">Example A: The Default Plan (Too Many Tiny Tasks)</h4>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql <span class="hljs-keyword">import</span> SparkSession
<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> sum <span class="hljs-keyword">as</span> sum_

spark = SparkSession.builder.appName(<span class="hljs-string">"ParallelismExample"</span>).getOrCreate()

spark.conf.get(<span class="hljs-string">"spark.sql.shuffle.partitions"</span>)  <span class="hljs-comment"># '200'</span>

data = [
    (<span class="hljs-number">1</span>, <span class="hljs-string">"John"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">90000</span>),
    (<span class="hljs-number">2</span>, <span class="hljs-string">"Alice"</span>, <span class="hljs-string">"Engineering"</span>, <span class="hljs-number">85000</span>),
    (<span class="hljs-number">3</span>, <span class="hljs-string">"Bob"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">75000</span>),
    (<span class="hljs-number">4</span>, <span class="hljs-string">"Eve"</span>, <span class="hljs-string">"Sales"</span>, <span class="hljs-number">72000</span>),
    (<span class="hljs-number">5</span>, <span class="hljs-string">"Grace"</span>, <span class="hljs-string">"HR"</span>, <span class="hljs-number">65000</span>),
]

df = spark.createDataFrame(data, [<span class="hljs-string">"id"</span>, <span class="hljs-string">"name"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>])

agg_df = df.groupBy(<span class="hljs-string">"department"</span>).agg(sum_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"total_salary"</span>))
agg_df.explain(<span class="hljs-string">"formatted"</span>)
</code></pre>
<p>Even though there are only 3 departments, Spark still plans 200 shuffle partitions. Unless AQE coalesces them at runtime, that means 200 tasks for 3 groups of data.</p>
<p><strong>Effect:</strong> Each task has almost nothing to do. Spark spends more time planning and scheduling than actually computing.</p>
<h4 id="heading-example-b-tuned-plan-balanced-parallelism">Example B: Tuned Plan (Balanced Parallelism)</h4>
<pre><code class="lang-python">spark.conf.set(<span class="hljs-string">"spark.sql.shuffle.partitions"</span>, <span class="hljs-string">"8"</span>)
agg_df = df.groupBy(<span class="hljs-string">"department"</span>).agg(sum_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"total_salary"</span>))
agg_df.explain(<span class="hljs-string">"formatted"</span>)
</code></pre>
<p>Now Spark creates only <strong>8 shuffle partitions</strong>: still parallelized, but not wasteful. Even in this small example, the difference is visible in the plan: one configuration change produces a much leaner physical plan.</p>
<h4 id="heading-the-real-problem-static-tuning-doesnt-scale">The Real Problem: Static Tuning Doesn’t Scale</h4>
<p>In production, job sizes vary:</p>
<ul>
<li><p>Today: 10 GB</p>
</li>
<li><p>Tomorrow: 500 GB</p>
</li>
<li><p>Next week: 200 MB (sampling run)</p>
</li>
</ul>
<p>Manually changing shuffle partitions for each run is neither practical nor reliable. That’s where Adaptive Query Execution (AQE) steps in.</p>
<h4 id="heading-adaptive-query-execution-aqe-smarter-dynamic-parallelism">Adaptive Query Execution (AQE): Smarter, Dynamic Parallelism</h4>
<p>AQE doesn’t guess. It measures actual shuffle statistics at runtime and rewrites the plan <em>while the job is running.</em></p>
<pre><code class="lang-python">spark.conf.set(<span class="hljs-string">"spark.sql.adaptive.enabled"</span>, <span class="hljs-string">"true"</span>)
spark.conf.set(<span class="hljs-string">"spark.sql.adaptive.coalescePartitions.enabled"</span>, <span class="hljs-string">"true"</span>)
spark.conf.set(<span class="hljs-string">"spark.sql.adaptive.coalescePartitions.minPartitionSize"</span>, <span class="hljs-string">"64m"</span>)
spark.conf.set(<span class="hljs-string">"spark.sql.adaptive.advisoryPartitionSizeInBytes"</span>, <span class="hljs-string">"256m"</span>)  <span class="hljs-comment"># target size per coalesced partition</span>
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Configuration</strong></td><td><strong>Shuffle Partitions</strong></td><td><strong>Task Distribution</strong></td><td><strong>Observation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Default</td><td>200</td><td>200 tasks / 3 groups</td><td>Too granular, mostly idle</td></tr>
<tr>
<td>Tuned</td><td>8</td><td>8 tasks / 3 groups</td><td>Balanced execution</td></tr>
</tbody>
</table>
</div><p>AQE merges tiny shuffle partitions, or splits huge ones, based on <strong>real-time data metrics</strong>, not pre-set assumptions.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time

df = spark.createDataFrame(multiplied_data,
    [<span class="hljs-string">"id"</span>, <span class="hljs-string">"firstname"</span>, <span class="hljs-string">"lastname"</span>, <span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>, <span class="hljs-string">"age"</span>,
     <span class="hljs-string">"hire_date"</span>, <span class="hljs-string">"country"</span>])

start = time.time()
agg_df = df.groupBy(<span class="hljs-string">"department"</span>).agg(sum_(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"total_salary"</span>))
agg_df.count()
elapsed = round(time.time() - start, <span class="hljs-number">2</span>)  <span class="hljs-comment"># stop the clock before the partition lookups</span>

print(<span class="hljs-string">f'Num Partitions df: <span class="hljs-subst">{df.rdd.getNumPartitions()}</span>'</span>)
print(<span class="hljs-string">f'Num Partitions aggdf: <span class="hljs-subst">{agg_df.rdd.getNumPartitions()}</span>'</span>)
print(<span class="hljs-string">"Execution time:"</span>, elapsed, <span class="hljs-string">"sec"</span>)

spark.stop()
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Stage</strong></td><td><strong>Without AQE</strong></td><td><strong>With AQE</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Stage 3 (Aggregation)</td><td>200 shuffle partitions, each reading KBs</td><td>8–12 coalesced partitions</td></tr>
<tr>
<td>Stage 4 (Join Output)</td><td>200 shuffle files</td><td>Merged into balanced partitions</td></tr>
<tr>
<td><strong>Result</strong></td><td>Many small tasks, high overhead</td><td>Fewer, balanced tasks, faster runtime</td></tr>
</tbody>
</table>
</div><h4 id="heading-understanding-the-plan"><strong>Understanding the Plan</strong></h4>
<p>Before AQE (static):</p>
<p><code>Exchange hashpartitioning(department, 200)</code></p>
<p>With AQE: AdaptiveSparkPlan (coalesced)</p>
<p><code>HashAggregate(keys=[department], functions=[sum(salary)])</code></p>
<p><code>Exchange hashpartitioning(department, 200)</code>  <em># runtime coalesced to 12</em></p>
<p>The logical plan remains the same, but the physical execution plan is rewritten during runtime. Spark intelligently reduces or merges shuffle partitions based on data volume.</p>
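<p>To build intuition for what "runtime coalesced" means, here is a simplified model in plain Python. It is not Spark's implementation: it assumes a greedy pass that merges adjacent shuffle partitions until the next one would exceed a target size, and the function name and sizes are made up for illustration.</p>

```python
def coalesce_partitions(sizes, target_bytes):
    """Greedily group adjacent partitions so each group stays within target_bytes."""
    groups, current, current_size = [], [], 0
    for index, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


KB = 1024
MB = 1024 * KB

sizes = [40 * KB] * 200  # 200 tiny shuffle partitions, under 8 MB in total
groups = coalesce_partitions(sizes, target_bytes=1 * MB)
print(len(sizes), "partitions coalesced into", len(groups))
# 200 partitions coalesced into 8
```

<p>The planned partition count (200) does not change in the plan text. What changes is how many tasks actually read those partitions, which is why the Exchange line still shows 200.</p>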
<p>Spark’s default of 200 shuffle partitions often misfits real workloads. Static tuning may work for predictable pipelines, but it fails with variable data. AQE, by contrast, uses shuffle statistics to coalesce partitions dynamically at runtime. Use it with a sensible ceiling (for example, 400 partitions), and always verify in the Spark UI to catch over-partitioning (many tasks reading KBs) or under-partitioning (few tasks reading GBs).</p>
<h3 id="heading-scenario-14-handle-skew-smartly">Scenario 14: Handle Skew Smartly</h3>
<p>In an ideal Spark world, all partitions contain roughly equal amounts of data. But real datasets are rarely that kind. If one key (say "USA", "2024", or "customer_123") holds millions of rows while others have only a few, Spark ends up with one or two massive partitions. Those partitions take disproportionately longer to process, leaving other executors idle. That’s data skew: the silent killer of parallelism.</p>
<p>You’ll often spot it in Spark UI:</p>
<ul>
<li><p>198 tasks finish quickly.</p>
</li>
<li><p>2 tasks take 10× longer.</p>
</li>
<li><p>Stage stays stuck at 98% for minutes.</p>
</li>
</ul>
<h4 id="heading-example-a-the-skew-problem">Example A: The Skew Problem</h4>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql <span class="hljs-keyword">import</span> SparkSession, functions <span class="hljs-keyword">as</span> F

spark = SparkSession.builder.appName(<span class="hljs-string">"DataSkewDemo"</span>).getOrCreate()

<span class="hljs-comment"># Create skewed dataset</span>
df = spark.range(<span class="hljs-number">0</span>, <span class="hljs-number">10000</span>).toDF(<span class="hljs-string">"id"</span>) \
    .withColumn(<span class="hljs-string">"department"</span>,
        F.when(F.col(<span class="hljs-string">"id"</span>) &lt; <span class="hljs-number">8000</span>, <span class="hljs-string">"Engineering"</span>)  <span class="hljs-comment"># 80% of data</span>
         .when(F.col(<span class="hljs-string">"id"</span>) &lt; <span class="hljs-number">9000</span>, <span class="hljs-string">"Sales"</span>)
         .otherwise(<span class="hljs-string">"HR"</span>)) \
    .withColumn(<span class="hljs-string">"salary"</span>, (F.rand() * <span class="hljs-number">100000</span>).cast(<span class="hljs-string">"int"</span>))

df.groupBy(<span class="hljs-string">"department"</span>).count().show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769464257950/6963171b-92de-4721-9bb3-6951c68a2775.png" alt="6963171b-92de-4721-9bb3-6951c68a2775" class="image--center mx-auto" width="594" height="446" loading="lazy"></p>
<p>Spark will hash “Engineering” into just one reducer partition, making it far heavier than the others. That single task becomes the bottleneck: the other tasks finish quickly, but the stage cannot complete until the one lagging task does.</p>
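<p>The effect is easy to reproduce without a cluster. The sketch below is plain Python, with <code>zlib.crc32</code> standing in for Spark's hash function (an assumption for illustration only). It routes the same 80/10/10 key distribution into 8 partitions and prints the resulting partition sizes.</p>

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    # crc32 is a deterministic stand-in for Spark's hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Same distribution as the skewed dataset: 80% of rows share one key.
rows = ["Engineering"] * 8000 + ["Sales"] * 1000 + ["HR"] * 1000

load = Counter(partition_for(dept, 8) for dept in rows)

# Rows per non-empty partition, largest first.
print(sorted(load.values(), reverse=True))
```

<p>Whatever the hash function, every row with the same key lands in the same partition, so one partition always receives at least 8,000 of the 10,000 rows while most of the others stay empty.</p>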
<h4 id="heading-example-b-the-solution-salting-hot-keys">Example B: The Solution: Salting Hot Keys</h4>
<p>To handle skew, we split the hot key (Engineering) into multiple pseudo-keys using a random salt. This redistributes that large partition across multiple reducers.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql <span class="hljs-keyword">import</span> functions <span class="hljs-keyword">as</span> F

salt_buckets = <span class="hljs-number">10</span>  <span class="hljs-comment"># number of pseudo-keys for the hot key</span>

df_salted = (
    df.withColumn(
        <span class="hljs-string">"department_salted"</span>,
        F.when(F.col(<span class="hljs-string">"department"</span>) == <span class="hljs-string">"Engineering"</span>,
            F.concat(F.col(<span class="hljs-string">"department"</span>), F.lit(<span class="hljs-string">"_"</span>),
                     F.floor(F.rand() * salt_buckets).cast(<span class="hljs-string">"string"</span>)))
         .otherwise(F.col(<span class="hljs-string">"department"</span>))
    )
)

df_salted.groupBy(<span class="hljs-string">"department_salted"</span>).agg(F.avg(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"avg_salary"</span>)).show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769464242395/c4ec0bc6-67bf-488c-b619-7130ceef878e.png" alt="c4ec0bc6-67bf-488c-b619-7130ceef878e" class="image--center mx-auto" width="536" height="468" loading="lazy"></p>
<p>Now “Engineering” isn’t one hot key: it’s <strong>10 smaller keys</strong> like Engineering_0, Engineering_1, ..., Engineering_9. These keys hash independently, so they typically spread across several reducer partitions, enabling parallel processing.</p>
<h4 id="heading-example-c-post-aggregation-desalting">Example C: Post-Aggregation Desalting</h4>
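<p>After aggregating, recombine the salted keys to get the original department names. Carry sums and counts through the first aggregation rather than averages: averaging the per-bucket averages is only exact when every bucket holds the same number of rows.</p>
<pre><code class="lang-python">df_final = (
    df_salted.groupBy(<span class="hljs-string">"department_salted"</span>)
        <span class="hljs-comment"># Partial aggregates that can be merged exactly</span>
        .agg(F.sum(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"sum_salary"</span>),
             F.count(<span class="hljs-string">"salary"</span>).alias(<span class="hljs-string">"row_count"</span>))
        .withColumn(<span class="hljs-string">"department"</span>, F.split(F.col(<span class="hljs-string">"department_salted"</span>), <span class="hljs-string">"_"</span>)
            .getItem(<span class="hljs-number">0</span>))
        .groupBy(<span class="hljs-string">"department"</span>)
        .agg((F.sum(<span class="hljs-string">"sum_salary"</span>) / F.sum(<span class="hljs-string">"row_count"</span>)).alias(<span class="hljs-string">"final_avg_salary"</span>))
)
</code></pre>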
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769464321049/6349c2c3-a0e3-4f9e-be3e-c59639004128.png" alt="6349c2c3-a0e3-4f9e-be3e-c59639004128" class="image--center mx-auto" width="540" height="242" loading="lazy"></p>
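<p>When recombining salted partial results, sums and counts merge exactly, whereas an average of per-bucket averages matches the true average only when the buckets are the same size. The plain-Python check below uses two deliberately unequal, hypothetical buckets to make the gap visible. With random salting the buckets are close in size, so the error is usually small, but it is not zero.</p>

```python
# Hypothetical salted buckets for one hot key, deliberately unequal in size.
buckets = {
    "Engineering_0": [100, 100, 100, 100],
    "Engineering_1": [200],
}

# Ground truth: average over every row.
all_values = [v for values in buckets.values() for v in values]
true_avg = sum(all_values) / len(all_values)

# Unweighted average of the per-bucket averages.
avg_of_avgs = sum(sum(v) / len(v) for v in buckets.values()) / len(buckets)

# Recombining partial sums and counts.
total = sum(sum(v) for v in buckets.values())
count = sum(len(v) for v in buckets.values())
recombined = total / count

print(true_avg, avg_of_avgs, recombined)  # 120.0 150.0 120.0
```

<p>The same reasoning applies to any aggregate that is not directly mergeable: keep the mergeable parts (sum, count, min, max) per salted key and derive the final value in the second aggregation.</p>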
<h4 id="heading-when-to-use-salting">When to Use Salting</h4>
<p>Use salting when:</p>
<ul>
<li><p>You observe stage skew (one or few long tasks).</p>
</li>
<li><p>Shuffle read sizes vary drastically between tasks.</p>
</li>
<li><p>The skew originates from a few dominant key values.</p>
</li>
</ul>
<p>Avoid it when:</p>
<ul>
<li><p>The dataset is small (&lt; 1 GB).</p>
</li>
<li><p>You already use partitioning or bucketing keys with uniform distribution.</p>
</li>
</ul>
<p><strong>Alternative approaches:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Technique</strong></td><td><strong>Use Case</strong></td><td><strong>Pros</strong></td><td><strong>Cons</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Salting (manual)</td><td>Skewed joins/aggregations</td><td>Full control</td><td>Requires extra logic to merge</td></tr>
<tr>
<td>Skew join hints (for example, Databricks' /*+ SKEW('table') */)</td><td>Joins on platforms that support the hint</td><td>No extra columns needed</td><td>Works only on joins; not available in open-source Spark</td></tr>
<tr>
<td>Broadcast smaller side</td><td>One table ≪ other</td><td>Avoids shuffle on big side</td><td>Limited by broadcast size</td></tr>
<tr>
<td>AQE skew optimization</td><td>Spark 3.0+</td><td>Automatic handling</td><td>Needs AQE enabled</td></tr>
</tbody>
</table>
</div><h4 id="heading-glue-specific-tip">Glue-Specific Tip</h4>
<p>AWS Glue 3.0+ includes Spark 3.x, meaning you can also enable AQE’s built-in skew optimization:</p>
<p><code>spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")</code></p>
<p><code>spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "128m")</code></p>
<p>For sort-merge joins, Spark will then automatically detect oversized shuffle partitions and split them, effectively auto-salting hot keys at runtime.</p>
<p>Data skew causes uneven shuffle sizes across tasks and can be detected in the Spark UI or via shuffle read/write metrics. Mitigate heavy-key skew with manual salting (recombined later), or rely on AQE skew join optimization for mild cases. Always validate improvements in the Spark UI SQL tab by checking “Shuffle Read Size.”</p>
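<p>The detection rule can be sketched in a few lines of plain Python. This is a simplified model, not Spark's code: it assumes a partition counts as skewed when it is larger than a factor times the median partition size and also larger than an absolute threshold. The defaults used here (factor 5, threshold 256 MB) mirror Spark's documented defaults for <code>spark.sql.adaptive.skewJoin.skewedPartitionFactor</code> and <code>spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes</code>, while the function name and partition sizes are hypothetical.</p>

```python
from statistics import median

MB = 1024 ** 2


def skewed_partitions(sizes, factor=5, threshold_bytes=256 * MB):
    """Return indexes of partitions that exceed both the relative and the absolute limit."""
    med = median(sizes)
    return [
        i for i, size in enumerate(sizes)
        if size > factor * med and size > threshold_bytes
    ]


# 198 ordinary partitions plus two heavy ones, echoing the 198-vs-2 task pattern above.
sizes = [20 * MB] * 198 + [900 * MB, 1200 * MB]
print(skewed_partitions(sizes))  # [198, 199]
```

<p>Lowering the threshold, as the 128m setting above does, makes the rule fire on smaller partitions. Both conditions must still hold, so uniformly large partitions are not treated as skewed.</p>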
<h3 id="heading-scenario-15-sort-efficiently-orderby-vs-sortwithinpartitions">Scenario 15: Sort Efficiently (orderBy vs sortWithinPartitions)</h3>
<p>Most Spark jobs need sorted data at some point: for window functions, for writing ordered files, or for downstream processing. The instinct is to reach for orderBy(). But that instinct costs you a full shuffle every single time.</p>
<h4 id="heading-the-problem-global-sort-when-you-dont-need-it">The Problem: Global Sort When You Don't Need It</h4>
<p>Let's say you want to write employee data partitioned by department, sorted by salary within each department:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> col

<span class="hljs-comment"># Naive approach: global sort</span>
df_sorted = df.orderBy(col(<span class="hljs-string">"department"</span>), col(<span class="hljs-string">"salary"</span>).desc())

df_sorted.write.partitionBy(<span class="hljs-string">"department"</span>).parquet(<span class="hljs-string">"s3://output/employees/"</span>)
</code></pre>
<p>This looks reasonable. You're sorting by department and salary, then writing partitioned files. Clean and simple. But here's what Spark actually does:</p>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Sort [department ASC, salary DESC], true

└─ Exchange rangepartitioning(department ASC, salary DESC, <span class="hljs-number">200</span>)

   └─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>That Exchange <code>rangepartitioning</code> is a full shuffle. So Spark:</p>
<ul>
<li><p>Samples the data to determine range boundaries</p>
</li>
<li><p>Redistributes every row across 200 partitions based on sort keys</p>
</li>
<li><p>Sorts each partition locally</p>
</li>
<li><p>Produces globally ordered output</p>
</li>
</ul>
<p>You just shuffled 1 million rows across the cluster to achieve global ordering – even though you're immediately partitioning by department on write, which destroys that global order anyway.</p>
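<p>The four steps above can be mimicked in plain Python to show why the shuffle is unavoidable for a global sort. This is an illustrative model rather than Spark's algorithm: the function name, sample size, and boundary selection are assumptions.</p>

```python
import bisect
import random


def range_partition_sort(values, num_partitions, sample_size=100, seed=0):
    """Sample, pick range boundaries, route every value, then sort each partition locally."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    # Choose num_partitions - 1 boundaries at evenly spaced positions in the sample.
    step = len(sample) / num_partitions
    bounds = [sample[int(step * i)] for i in range(1, num_partitions)]
    partitions = [[] for _ in range(num_partitions)]
    for v in values:
        # The "shuffle": each row is routed to the partition that owns its range.
        partitions[bisect.bisect_right(bounds, v)].append(v)
    return [sorted(p) for p in partitions]


data = random.Random(42).sample(range(100_000), 1_000)
parts = range_partition_sort(data, num_partitions=4)

# Concatenating the partitions in order yields a globally sorted sequence.
flattened = [v for p in parts for v in p]
print(flattened == sorted(data))  # True
```

<p>Global order falls out only because every row was first moved to the partition that owns its range. That routing step is the part you pay for in network and disk I/O.</p>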
<h4 id="heading-why-this-hurts">Why This Hurts</h4>
<p>Range partitioning for global sort is one of the most expensive shuffles Spark performs:</p>
<ul>
<li><p>Sampling overhead: Spark must scan data twice (once to sample, once to process)</p>
</li>
<li><p>Network transfer: Every row moves to a new executor based on range boundaries</p>
</li>
<li><p>Disk I/O: Shuffle files written and read from disk</p>
</li>
<li><p>Wasted work: Global ordering across departments is meaningless when you partition by department</p>
</li>
</ul>
<p>For 1M rows, this adds 8-12 seconds of pure shuffle overhead.</p>
<h4 id="heading-the-better-approach-sort-locally-within-partitions">The Better Approach: Sort Locally Within Partitions</h4>
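<p>If you only need ordering <em>within</em> each output partition rather than across the whole dataset, use sortWithinPartitions(). It sorts rows inside each existing Spark partition, so a department's rows may still be spread across several sorted files unless the data is already partitioned by department:</p>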
<pre><code class="lang-python"><span class="hljs-comment"># Optimized approach: local sort only</span>
df_sorted = df.sortWithinPartitions(col(<span class="hljs-string">"department"</span>), col(<span class="hljs-string">"salary"</span>).desc())
df_sorted.write.partitionBy(<span class="hljs-string">"department"</span>).parquet(<span class="hljs-string">"s3://output/employees/"</span>)
</code></pre>
<p><strong>Simplified Logical Plan:</strong></p>
<pre><code class="lang-python">Sort [department ASC, salary DESC], false

└─ LogicalRDD [id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<ul>
<li><p>No Exchange.</p>
</li>
<li><p>No shuffle.</p>
</li>
<li><p>Just local sorting within existing partitions.</p>
</li>
</ul>
<p>Spark sorts each partition in-place, without moving data across the network. The false flag in the Sort node indicates this is a local sort, not a global one.</p>
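<p>A tiny plain-Python model makes the trade-off concrete. The two lists below are hypothetical stand-ins for Spark partitions. Each is sorted on its own (department ascending, salary descending), yet the combined output is not globally ordered.</p>

```python
# Two hypothetical partitions of (department, salary) rows.
partitions = [
    [("Sales", 75000), ("Engineering", 90000), ("HR", 65000)],
    [("Engineering", 85000), ("Sales", 72000)],
]


def sort_key(row):
    # Department ascending, salary descending.
    return (row[0], -row[1])


# Local sort: each partition is sorted independently, no rows move between them.
locally_sorted = [sorted(p, key=sort_key) for p in partitions]

flattened = [row for p in locally_sorted for row in p]
globally_sorted = sorted(flattened, key=sort_key)

print(locally_sorted[0])
# [('Engineering', 90000), ('HR', 65000), ('Sales', 75000)]
print(flattened == globally_sorted)  # False
```

<p>That is exactly the guarantee sortWithinPartitions() gives: order inside each partition and nothing more. If a consumer reads across partitions and expects a total order, a global sort is still required.</p>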
<h4 id="heading-real-world-benchmark-aws-glue-9">Real-World Benchmark: AWS Glue</h4>
<p>Let's measure the difference on 1 million employee records. First, we'll start with a global sort using orderBy():</p>
<pre><code class="lang-python">print(<span class="hljs-string">"\n--- Testing orderBy() (global sort) ---"</span>)

start = time.time()

df_global = df.orderBy(col(<span class="hljs-string">"department"</span>), col(<span class="hljs-string">"salary"</span>).desc())
df_global.write.mode(<span class="hljs-string">"overwrite"</span>).parquet(<span class="hljs-string">"/tmp/global_sort_output"</span>)

global_time = round(time.time() - start, <span class="hljs-number">2</span>)
print(<span class="hljs-string">f"orderBy() time: <span class="hljs-subst">{global_time}</span>s"</span>)
</code></pre>
<p>Local Sort:</p>
<pre><code class="lang-python">print(<span class="hljs-string">"\n--- Testing sortWithinPartitions() (local sort) ---"</span>)

start = time.time()

df_local = df.sortWithinPartitions(col(<span class="hljs-string">"department"</span>), col(<span class="hljs-string">"salary"</span>).desc())
df_local.write.mode(<span class="hljs-string">"overwrite"</span>).parquet(<span class="hljs-string">"/tmp/local_sort_output"</span>)

local_time = round(time.time() - start, <span class="hljs-number">2</span>)
print(<span class="hljs-string">f"sortWithinPartitions() time: <span class="hljs-subst">{local_time}</span>s"</span>)
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Approach</strong></td><td><strong>Plan Type</strong></td><td><strong>Execution Time (1M rows)</strong></td><td><strong>Observation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>orderBy()</td><td>Exchange rangepartitioning</td><td>10.34 s</td><td>Full shuffle for global sort</td></tr>
<tr>
<td>sortWithinPartitions()</td><td>Local Sort (no Exchange)</td><td>2.18 s</td><td>In-place sorting, no network transfer</td></tr>
</tbody>
</table>
</div><p><strong>Physical Plan Differences:</strong></p>
<p><strong>orderBy() Physical Plan:</strong></p>
<pre><code class="lang-python">*(<span class="hljs-number">2</span>) Sort [department ASC NULLS FIRST, salary DESC NULLS LAST], true, <span class="hljs-number">0</span>

+- Exchange rangepartitioning(department ASC NULLS FIRST, salary DESC NULLS LAST, <span class="hljs-number">200</span>)

   +- *(<span class="hljs-number">1</span>) Project [id, firstname, lastname, department, salary, age, hire_date, country]

      +- *(<span class="hljs-number">1</span>) Scan ExistingRDD[id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>The Exchange rangepartitioning node marks the shuffle boundary. Spark must:</p>
<ul>
<li><p>Sample data to determine range splits</p>
</li>
<li><p>Redistribute all rows across executors</p>
</li>
<li><p>Sort within each range partition</p>
</li>
</ul>
<p><strong>sortWithinPartitions() Physical Plan:</strong></p>
<pre><code class="lang-python">*(<span class="hljs-number">1</span>) Sort [department ASC NULLS FIRST, salary DESC NULLS LAST], false, <span class="hljs-number">0</span>

+- *(<span class="hljs-number">1</span>) Project [id, firstname, lastname, department, salary, age, hire_date, country]

   +- *(<span class="hljs-number">1</span>) Scan ExistingRDD[id, firstname, lastname, department, salary, age, hire_date, country]
</code></pre>
<p>No Exchange. The false flag in Sort indicates local sorting only. Each partition is sorted independently, in parallel, without any data movement.</p>
<p><strong>When to Use Which:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Use Case</strong></td><td><strong>Method</strong></td><td><strong>Why</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Writing partitioned files (Parquet, Delta)</td><td>sortWithinPartitions()</td><td>Partition-level order is sufficient; global order wasted</td></tr>
<tr>
<td>Window functions with ROWS BETWEEN</td><td>sortWithinPartitions()</td><td>Only need order within each window partition</td></tr>
<tr>
<td>Top-N per group (rank, dense_rank)</td><td>sortWithinPartitions()</td><td>Ranking is local to each partition key</td></tr>
<tr>
<td>Final output must be globally ordered</td><td>orderBy()</td><td>Need total order across all partitions</td></tr>
<tr>
<td>Downstream system requires strict ordering</td><td>orderBy()</td><td>For example, time-series data for sequential processing</td></tr>
<tr>
<td>Fewer output files via coalesce()</td><td>coalesce(), then sortWithinPartitions()</td><td>Sorting after the merge keeps each output partition ordered</td></tr>
</tbody>
</table>
</div><h4 id="heading-common-anti-pattern">Common Anti-Pattern</h4>
<pre><code class="lang-python">df.orderBy(<span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>) \
  .write.partitionBy(<span class="hljs-string">"department"</span>) \
  .parquet(<span class="hljs-string">"output/"</span>)
</code></pre>
<p><strong>Problem:</strong> You're globally sorting by department, then immediately partitioning by department. The global order is destroyed during partitioning.</p>
<p>Here’s the fix:</p>
<pre><code class="lang-python">df.sortWithinPartitions(<span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>) \
  .write.partitionBy(<span class="hljs-string">"department"</span>) \
  .parquet(<span class="hljs-string">"output/"</span>)
</code></pre>
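<p>If you also want each department's rows grouped together before the write, repartition by department first. This adds a hash shuffle, but it typically costs less than the range-partitioned global sort because no sampling pass is needed. Note that DataFrameWriter.sortBy() is only supported together with bucketBy() and saveAsTable(), so it cannot be used with a plain parquet() write:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Co-locate each department, then sort locally before the partitioned write</span>
df.repartition(<span class="hljs-string">"department"</span>) \
    .sortWithinPartitions(<span class="hljs-string">"department"</span>, <span class="hljs-string">"salary"</span>) \
    .write.partitionBy(<span class="hljs-string">"department"</span>) \
    .parquet(<span class="hljs-string">"output/"</span>)
</code></pre>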
<p>orderBy() triggers an expensive full shuffle using range partitioning, while sortWithinPartitions() sorts data locally without a shuffle and is often 4–5× faster. Use sortWithinPartitions() when writing partitioned files, computing window functions with partitionBy(), or when order is needed only within groups. Reserve orderBy() strictly for true global ordering: in most production ETL, the best sort is the one that doesn’t shuffle.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You began this handbook likely wondering why your Spark application was slow, and now you see that the answer was both clear and not so clear: your problem was never your Spark application, your configuration, or your version of Spark. It was your plan all along.</p>
<p>You now understand that Spark runs plans, not code, that transformation order affects logical plans, that shuffles generate stages and are key to runtime performance, and that examining your physical plans allows you to directly link your application performance issues back to your problematic line of code.</p>
<p>And you’ve seen this pattern repeat across many scenarios: problem, plan, solution, improved plan, and so forth, until optimization feels less like a dark art and more like a certainty.</p>
<p>This is the Spark optimization mindset: read plans before you write code, and challenge every single Exchange. Engineers who write high-performance Spark jobs minimize shuffles, filter early, project narrowly, deal with skew carefully, and validate everything via explain() and the Spark UI. Once you learn to read the plan, Spark performance becomes mechanical.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use Vibe Coding Effectively as a Dev ]]>
                </title>
                <description>
                    <![CDATA[ It may seem like everyone is a vibe coder these days, and prompting seemed like it would become the new coding. But is this AI-generated code really deployable? Bragging on social media about a clever script is one thing, but pushing a vibe coded app... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-vibe-coding-effectively-as-a-dev/</link>
                <guid isPermaLink="false">6925deb0b459e862808eb04c</guid>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Programming Blogs ]]>
                    </category>
                
                    <category>
                        <![CDATA[ software development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ vibe coding ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ankur Tyagi ]]>
                </dc:creator>
                <pubDate>Tue, 25 Nov 2025 16:52:00 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764089459731/0122c0b7-08e2-434a-b5eb-518025401951.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>It may seem like everyone is a vibe coder these days, and prompting seemed like it would become the new coding. But is this AI-generated code really deployable?</p>
<p>Bragging on social media about a clever script is one thing, but pushing a vibe coded app to prod comes with many security risks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758881769141/9bedc585-5608-4660-a304-bbb10f10b8f2.png" alt="Vibe-debug, vibe-refactor, and vibe-check" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>With so many AI dev tools out there now, <a target="_blank" href="https://www.freecodecamp.org/news/how-to-perform-code-reviews-in-tech-the-painless-way/">code reviews</a> have become more critical than ever.</p>
<p>This article will explore what <strong>vibe coding</strong> means and how code reviews should adapt in the era of AI.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents:</strong></h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-vibe-coding">What is Vibe Coding?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-implement-vibe-coding-in-practice">How to Implement Vibe Coding in Practice</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-isnt-vibe-coded-output-production-ready">Why Isn’t Vibe Coded Output Production Ready?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-context-gaps-are-the-first-crack">Context gaps are the first crack.</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-those-gaps-lead-directly-to-integration-blind-spots">Those gaps lead directly to integration blind spots.</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-most-serious-risk-is-security-by-omission">The most serious risk is security by omission.</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-testing-and-correctness-evidence-are-thin">Testing and correctness evidence are thin.</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-operability-lags-behind">Operability lags behind.</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-guidelines-for-ai-code-reviews">Guidelines for AI Code Reviews</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-code-review-process-in-vibe-coding">Code Review Process in Vibe Coding</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-checklist-for-reviewing-ai-generated-code">Checklist for Reviewing AI Generated Code</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-work-effectively-with-ai-tools">How to Work Effectively with AI Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-what-is-vibe-coding">What is Vibe Coding?</h2>
<p>In early 2025, AI researcher <a target="_blank" href="https://x.com/karpathy">Andrej Karpathy</a> popularized the term vibe coding to describe a new style of development in which you “fully give in to the vibes” and let AI write code while you focus on high-level intent.</p>
<p>A developer expresses their desired functionality in plain language, and an AI system (like an <a target="_blank" href="https://en.wikipedia.org/wiki/Large_language_model">LLM</a>) generates the source code to implement it.</p>
<p>This code-by-prompt approach allows even beginners to produce working code without deep knowledge of programming languages. Karpathy joked that with advanced IDE agents (like <a target="_blank" href="https://www.devtoolsacademy.com/blog/cursor-vs-windsurf/">Cursor’s</a> Composer mode), “I barely even touch the keyboard... I ‘Accept All’ always, I don’t read the diffs anymore... and it mostly works”.</p>
<p>So, vibe coding is coding by vibe and trusting AI to handle the heavy lifting.</p>
<h2 id="heading-how-to-implement-vibe-coding-in-practice">How to Implement Vibe Coding in Practice</h2>
<p>In practice, vibe coding usually involves using AI assistants and adapting your workflow to a more interactive, prompt-driven style.</p>
<p>Here’s an overview of how you can “vibe code” a project:</p>
<h3 id="heading-step-1-choose-an-ai-assistant">Step 1: Choose an AI assistant</h3>
<p>Select a development environment that supports AI code generation. Popular choices include <a target="_blank" href="https://cursor.com/">Cursor</a> and <a target="_blank" href="https://github.com/features/copilot">GitHub Copilot</a>.</p>
<h3 id="heading-step-2-define-your-requirements">Step 2: Define your requirements</h3>
<p>Instead of writing boilerplate code, describe what you want to build. Give the AI a specific prompt that details the functionality you need. The more <a target="_blank" href="https://www.philschmid.de/context-engineering">context</a> and detail you give, the better the AI can fulfill your intent.</p>
<p>For example, when I ran an SEO inspection for my website, DevTools Academy, I used this prompt in Cursor:</p>
<blockquote>
<p>“Now, act as a senior product engineer and UX strategist. Evaluate and improve <a target="_blank" href="https://www.devtoolsacademy.com">https://www.devtoolsacademy.com</a> with a practical, no-fluff lens.</p>
<p>Scope:</p>
<ul>
<li><p>UX</p>
</li>
<li><p>SEO and technical SEO</p>
</li>
<li><p>Positioning and messaging</p>
</li>
<li><p>Copywriting and information architecture</p>
</li>
<li><p>What to add to stand out in the developer tools space.”</p>
</li>
</ul>
</blockquote>
<p>This prompt works well because it gives the AI a clear role, a defined scope, and a specific intent. AI knows it’s not just fixing SEO but also reviewing how the site communicates value to devs. That combination of clarity and context produces actionable insights instead of surface-level suggestions.</p>
<p>Below is a screenshot of that audit in progress, showing how I reviewed code, metadata, and UX recommendations side by side.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761258218099/91e93726-1a7d-4d1a-9839-531355037dfc.png" alt="cursor screenshot showing CodeRabbit reviewing a pull request with comments and summary." class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You can check out the full code, including the closed PRs, in the repository for my open source <a target="_blank" href="https://github.com/tyaga001/devtoolsacademy">blog</a>. This will help you see how I use these coding agents on a production-ready app.</p>
<h3 id="heading-step-3-review-the-code">Step 3: Review the code</h3>
<p>AI will produce initial code based on your prompt. Think of this as a prototype – it’s not perfect. Run the code and see how it behaves.</p>
<p>Let’s look at an example: here, CodeRabbit is reviewing one of my <a target="_blank" href="https://github.com/tyaga001/devtoolsacademy/pull/145">pull requests</a> on GitHub. I had pushed a small fix to sort blog posts correctly and make sure the RSS feed reflects the latest publish date. Within seconds, CodeRabbit analyzed the diff, understood the intent behind my change, and explained exactly what the new code does.</p>
<p>It pointed out that the fix now sorts posts before mapping them, uses the sorted data for both items and the lastBuildDate, and ensures proper chronological order throughout the feed.</p>
<p>It’s like having a senior reviewer who not only checks syntax but also validates logic and confirms that your reasoning holds up.</p>
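<p>The fix described above (sort first, then derive both the feed items and lastBuildDate from the same sorted list) can be sketched roughly as follows. This is an illustration in Python, not the code from the PR: the <code>Post</code> type, its fields, and <code>build_feed</code> are hypothetical names.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime


@dataclass
class Post:
    # Hypothetical post record; the real blog's fields may differ.
    title: str
    published: datetime


def build_feed(posts):
    """Return (items, last_build_date) derived from one sorted list."""
    # Sort once, newest first, before mapping to feed items.
    ordered = sorted(posts, key=lambda p: p.published, reverse=True)
    items = [
        {"title": p.title, "pubDate": format_datetime(p.published)}
        for p in ordered
    ]
    # lastBuildDate comes from the same sorted data, so it always
    # matches the newest item (None when there are no posts).
    last_build = format_datetime(ordered[0].published) if ordered else None
    return items, last_build


posts = [
    Post("Older post", datetime(2025, 1, 5, tzinfo=timezone.utc)),
    Post("Newer post", datetime(2025, 3, 9, tzinfo=timezone.utc)),
]
items, last_build = build_feed(posts)
print([item["title"] for item in items])
print(last_build)
```

<p>Because both values come from <code>ordered</code>, the feed's build date cannot drift away from the newest item, which is the kind of invariant a reviewer should look for.</p>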
<p><a target="_blank" href="https://github.com/tyaga001/devtoolsacademy/pull/145"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758879621613/95bee1e7-3953-4416-b48b-e844332be950.png" alt="GitHub pull request showing CodeRabbit review comments on code changes with highlighted fixes." class="image--center mx-auto" width="600" height="400" loading="lazy"></a></p>
<p>This is just a reminder to expect imperfections. Vibe coding embraces a <em>“code first, refine later”</em> mindset. This means you get a working version quickly, then iteratively improve it. You might go through a few cycles of prompt -&gt; code -&gt; test -&gt; tweak.</p>
<h3 id="heading-step-4-validate-debug-polish">Step 4: Validate, debug, polish</h3>
<p>Once the AI-generated code meets your expectations, do a final review.</p>
<p>Throughout the process, the core idea is that you collaborate with the AI. The AI agent serves as a coding assistant, making real-time suggestions, automating tedious boilerplate, and even generating entire modules on your behalf.</p>
<h2 id="heading-why-isnt-vibe-coded-output-production-ready">Why Isn’t Vibe Coded Output Production Ready?</h2>
<p>Vibe coding moves fast: you describe intent, the AI produces something that runs, and you’re off to the next prompt. What’s missing is the slow, unglamorous work that usually turns a draft into shippable software, like shared context, architectural alignment, verification, and documentation.</p>
<p>AI generates plausible code based on patterns it has seen. But it doesn’t understand your team’s history, your system’s constraints, or the implicit rules that keep everything coherent over time.</p>
<p>That mismatch shows up the moment a “works on my machine” demo meets a real codebase.</p>
<p>Let’s explore the common pitfalls of vibe-coded code, so you’ll know what to watch for. Then, in the checklist section below, I’ll outline practical strategies to address or prevent each issue.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758271815928/5f763a0f-2dda-4318-8c19-0c9e58447abe.png" alt="AI is Limited by Context" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-context-gaps-are-the-first-crack">Context gaps are the first crack.</h3>
<p>AI only sees what you show it, so it’s easy for it to make the right local choice and the wrong global one: duplicating logic that already exists, choosing defaults that conflict with prior decisions, or introducing functions that don’t respect domain boundaries.</p>
<p>The result is code that looks reasonable in isolation but collides with existing assumptions and conventions once integrated.</p>
<h3 id="heading-those-gaps-lead-directly-to-integration-blind-spots">Those gaps lead directly to integration blind spots.</h3>
<p>Drafts often ignore the lived details of your environment – shared utilities, cross-cutting concerns, configuration, deployment hooks, and operational policies. Interfaces may line up at a glance and still fail at runtime because the draft doesn’t fit how your system composes modules, handles errors, or manages state across services.</p>
<h3 id="heading-the-most-serious-risk-is-security-by-omission">The most serious risk is security by omission.</h3>
<p>AI rarely includes robust input validation, clear authentication and authorization paths, or rate limiting unless you spell it out. Secrets handling and logging tend to be superficial or missing. That leaves common exposure points like request handlers, job processors, and webhook endpoints without the checks that prevent injection, SSRF, mass assignment, or data exfiltration.</p>
<p>Even when the surface looks tidy, the absence of explicit security controls means you’re trusting defaults you didn’t choose.</p>
<h3 id="heading-testing-and-correctness-evidence-are-thin">Testing and correctness evidence are thin.</h3>
<p>Quality suffers in quieter ways, too. Beyond “it runs,” there’s little to demonstrate behavior across edge cases or to guard against regressions.</p>
<p>Performance and scalability remain unknowns: extra network calls, N+1 patterns, and quadratic loops sneak in because nobody measured them. Dependencies and environments drift as versions aren’t pinned, infrastructure isn’t declared, and configuration lives only in the author’s head, making behavior differ across machines and CI.</p>
<h3 id="heading-operability-lags-behind">Operability lags behind.</h3>
<p>A lack of metrics, missing health/readiness probes, and no runbook make failures harder to detect and slower to recover from. Add in data quality and compliance concerns (PII handling, encoding assumptions, transitive license obligations), and you have code that demos well but isn’t ready for production’s reliability, security, and audit demands.</p>
<p>In short, vibe-coded output accelerates drafting but skips the shared understanding and evidence that make software safe to ship.</p>
<p>Until those gaps are closed, it’s a prototype, not a release.</p>
<h2 id="heading-guidelines-for-ai-code-reviews">Guidelines for AI Code Reviews</h2>
<p>Your team should keep pre-AI engineering standards as the bar, including security, tests, readability, maintainability, performance, and docs. AI should change how fast you gather the evidence for those standards, not how much evidence you require. In other words, use AI to accelerate the path to your existing bar, never to lower it.</p>
<p>Using AI, you can generate code at speed. But if reviews take the same amount of time (or more time), you lose some of the benefit. The goal isn’t to relax standards, it’s to shorten the time to prove you met them. That means layering in automation (tests, static analysis, secret scans, SCA) and AI-assisted review to catch obvious issues quickly so human reviewers can focus on intent, architecture, and risk.</p>
<p>Well-used assistants can help here. For example, tools like CodeRabbit, GitHub Copilot PR Reviewer, Claude Code, Cursor’s Bugbot, Graphite’s AI Review, and Greptile can highlight potential bugs, security gaps, style deviations, and mismatched intent, and summarize diffs for faster context. Treat these as accelerators for your existing process, not as replacements for judgment.</p>
<h3 id="heading-code-review-process-in-vibe-coding">Code Review Process in Vibe Coding</h3>
<p>The fundamentals of good code reviews haven’t changed – and in fact, they’re more critical now.</p>
<p>Below are some key principles to maintain speed without sacrificing quality.</p>
<h4 id="heading-1-trust-but-verify">1. Trust, but verify.</h4>
<p>A reviewer usually assumes the author understands the system. With vibe-coded output, the “author” may be an AI with limited context. If something looks odd or unnecessary, question it. Run the code, add/execute tests, or ask the developer/AI for clarification on intent and constraints.</p>
<h4 id="heading-2-dont-let-reviews-become-a-bottleneck">2. Don’t let reviews become a bottleneck.</h4>
<p>Vibe coding generates code quickly. If human review takes as long as hand-writing the change, you’ve erased the gain.</p>
<p>Combat this by front-loading automation: run unit/integration tests, static analysis (lint/SAST), secret scans, SCA, and basic perf checks in CI to clear the noise. Then reviewers spend their time on design trade-offs, boundary cases, and risk. The balance is: high standards, faster evidence.</p>
<h4 id="heading-3-use-ai-code-reviews-wisely">3. Use AI code reviews wisely.</h4>
<p>AI can help review code just as it helps generate it. Modern “pair reviewer” tools scan a PR and surface likely bugs, security issues, missing tests, or style violations in minutes, and they give natural-language summaries of the change.</p>
<p>Tools you can consider include CodeRabbit, GitHub Copilot PR Reviewer, Claude Code, Cursor Bugbot, Graphite, and Greptile. Many integrate with the CLI/IDE and GitHub/GitLab to leave actionable comments.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758272500586/a9cc891f-ab1a-47d8-a607-a772cbaef2e0.png" alt="coderabbit CLI" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Think of them as fast first-pass reviewers that increase coverage and consistency across PRs.</p>
<h4 id="heading-4-human-judgment-is-still-irreplaceable">4. Human judgment is still irreplaceable.</h4>
<p>Even the best AI reviewer is an assistant. Keep humans accountable for correctness, security posture, architectural fit, and user impact. A healthy pattern is AI first-pass &gt; human second-pass that inspects invariants, failure modes, and long-term maintainability.</p>
<h4 id="heading-5-maintain-a-high-bar-for-quality">5. Maintain a high bar for quality.</h4>
<p>It’s tempting to accept “it runs” when an AI wrote it. Don’t. Stakeholders still expect software to be robust, secure, and maintainable. Keep DRY, readability, and testability standards. Insist on input validation, authZ checks where relevant, and sensible logging/metrics. If you can’t provide evidence that you met the bar, you haven’t met it.</p>
<h4 id="heading-6-educate-and-document">6. Educate and document.</h4>
<p>When reviewers find bugs or security flaws in AI-generated code, capture the lesson.</p>
<p>Update internal guides with patterns like “When generating handlers, validate and bound inputs, add rate limits, log request IDs, avoid N+1 queries, and sanitize user-visible output.” Over time, bake these into prompts, templates, repo scaffolds, and CI checks so the next AI draft starts closer to done.</p>
<h2 id="heading-checklist-for-reviewing-ai-generated-code">Checklist for Reviewing AI Generated Code</h2>
<p>Before approving any vibe-coded change, make the standards explicit and verifiable. Use this checklist to confirm behavior, security, performance, integration, and documentation so the draft you got from AI becomes code you can safely ship.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762510535966/85ea547a-f955-446b-9e22-965dc18f9e49.png" alt="Checklist for Reviewing AI Generated Code" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Here’s a checklist a human reviewer should go through before approving vibe-coded output:</p>
<h3 id="heading-1-define-the-codes-purpose-scope-amp-non-goals">1. Define the code’s purpose (scope &amp; non-goals).</h3>
<p>Be explicit about what this change does and does not do. Tie it to a user story/ticket and call out non-goals so “helpful” AI changes don’t creep in.</p>
<h3 id="heading-2-verify-x-and-y-behavior-and-edge-cases">2. Verify X and Y (behavior and edge cases).</h3>
<p>Be clear about what you’re verifying. For example, verify input parsing and pagination boundaries, verify that error paths return the correct status and body, and verify that database writes are idempotent. Run existing tests, add missing unit/integration tests, and reproduce edge inputs (empty, null, huge, unicode).</p>
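<p>As a rough illustration of what "reproduce edge inputs" can look like, here is a small sketch that checks pagination boundaries. The <code>paginate</code> helper and its limits are hypothetical and stand in for whatever the AI generated in your change.</p>

```python
def paginate(items, page, page_size=10, max_page_size=100):
    """Hypothetical helper: return one page of items, 1-indexed."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= max_page_size:
        raise ValueError("page_size out of range")
    start = (page - 1) * page_size
    return items[start:start + page_size]


def check_edge_cases():
    names = ["ada", "émile", "李雷"]            # includes non-ASCII input
    assert paginate([], 1) == []                 # empty input
    assert paginate(names, 1, 2) == ["ada", "émile"]
    assert paginate(names, 2, 2) == ["李雷"]     # last, partial page
    assert paginate(names, 3, 2) == []           # past the end
    # Out-of-range arguments must be rejected, not silently accepted.
    for bad_page, bad_size in [(0, 10), (1, 0), (1, 10_000)]:
        try:
            paginate(names, bad_page, bad_size)
        except ValueError:
            continue
        raise AssertionError("expected ValueError")
    return True


print(check_edge_cases())
```

<p>In a real project these checks would live in your test suite rather than in a script, but the cases (empty, boundary, oversized, unicode) are the same.</p>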
<h3 id="heading-3-perform-code-quality-checks-readability-dry-refactor-needs">3. Perform code-quality checks (readability, DRY, refactor needs).</h3>
<p>AI often produces verbose or duplicated logic. Ensure names are meaningful, side effects are clearly stated, and duplication is removed or minimized. Run linters/formatters, collapse repetition, and extract helpers where they aid clarity.</p>
<h3 id="heading-4-analyze-organization-and-structure-make-sure-it-fits-the-architecture">4. Analyze organization and structure (make sure it fits the architecture).</h3>
<p>AI writes code in isolation. Confirm the change uses existing utilities, layers, and boundaries (domain/services/controllers/jobs). Check imports and module placement, avoid reinventing existing helpers, and align with repository conventions.</p>
<h3 id="heading-5-validate-inputs-and-assumptions-make-the-implicit-explicit">5. Validate inputs and assumptions (make the implicit explicit).</h3>
<p>List the assumptions the AI made (default locale/timezone, allowed ranges, required fields). Add schema validation (DTO/class validators/JSON Schema). Test with inputs that are empty, null, over the maximum, non-ASCII, unexpected enum values, or malicious strings. Finally, enforce limits and timeouts.</p>
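<p>A minimal sketch of making assumptions explicit, using only the Python standard library, might look like this. The field names, the allowed roles, and the length limit are all hypothetical. In practice you would likely use your framework's validators or a JSON Schema instead.</p>

```python
from dataclasses import dataclass

ALLOWED_ROLES = {"viewer", "editor"}   # hypothetical enum of valid roles
MAX_NAME_LENGTH = 64                   # hypothetical upper bound


@dataclass(frozen=True)
class NewUser:
    name: str
    role: str


def parse_new_user(payload):
    """Validate an untrusted dict and return a NewUser, or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    name = payload.get("name")
    role = payload.get("role", "viewer")   # the default is explicit, not implied
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name is required")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("name is too long")
    if not isinstance(role, str) or role not in ALLOWED_ROLES:
        raise ValueError("unknown role")
    return NewUser(name=name.strip(), role=role)


def is_rejected(payload):
    """Return True when validation refuses the payload."""
    try:
        parse_new_user(payload)
    except ValueError:
        return True
    return False


print(parse_new_user({"name": " Zoë ", "role": "editor"}))
print([is_rejected(p) for p in (None, {}, {"name": ""}, {"name": "x" * 65},
                                {"name": "a", "role": "admin"})])
```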
<h3 id="heading-6-perform-security-audits-minimum-pass">6. Perform security audits (minimum pass).</h3>
<ul>
<li><p><strong>AuthN/AuthZ:</strong> Confirm that each endpoint checks identity and authorization, and that access is denied by default.</p>
</li>
<li><p><strong>Inputs:</strong> Sanitize/validate inputs, prevent injection (SQL/NoSQL/command), and escape user-visible output.</p>
</li>
<li><p><strong>Secrets:</strong> Keep secrets out of code, diffs, and logs, use environment variables or a secret manager, and rotate any test keys.</p>
</li>
<li><p><strong>Abuse controls:</strong> Add rate limits, size limits, and timeouts on network and disk operations. Run SAST/secret scan/SCA, and fix or justify findings.</p>
</li>
</ul>
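<p>To show what a rate limit from the list above involves, here is a small token-bucket sketch. It is one common approach among several, and the class name and parameters are illustrative. A production service would more likely rely on its gateway or framework for this.</p>

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock            # injectable so tests can control time
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(capacity=2, rate=1.0, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(3)]   # third call exceeds the burst
fake_now[0] += 1.0                           # one second later: one token back
after_wait = bucket.allow()
print(burst, after_wait)
```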
<h3 id="heading-7-do-a-performance-evaluation-right-now-at-a-small-scale">7. Do a performance evaluation (right now, at a small scale).</h3>
<p>Look for N+1s, needless network calls, unbounded loops, quadratic sorts. Add a micro-benchmark or run a quick load test for hot paths. Set sensible cache/timeout/retry with jitter where applicable.</p>
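<p>"Retry with jitter" is easy to ask for and easy to get wrong, so here is a minimal sketch of capped exponential backoff with random jitter. The function name, defaults, and the <code>flaky</code> operation are hypothetical. Real code should catch only the exceptions that are safe to retry.</p>

```python
import random
import time


def retry_with_jitter(func, attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call func(), retrying on exception with capped, jittered backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            # Wait a random time in [0, capped exponential delay).
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * ceiling)


calls = {"count": 0}


def flaky():
    # Hypothetical operation that fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []
result = retry_with_jitter(flaky, sleep=delays.append)
print(result, calls["count"], len(delays))
```

<p>Randomizing the delay keeps many clients from retrying at the same instant after a shared failure, and the cap keeps a single caller from waiting indefinitely.</p>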
<h3 id="heading-8-manage-dependencies-pin-justify-minimize">8. Manage dependencies (pin, justify, minimize).</h3>
<p>Review any new libraries. Are they necessary? Maintained? License compatible? Pin versions, add lockfiles, and remove unused transitive additions.</p>
<h3 id="heading-9-review-documentation-what-to-add-and-where">9. Review documentation (what to add and where).</h3>
<p>Ensure the docs match the code. While resolving issues, AI often changes existing code or adds new blocks in several places, and those changes might not make it into the docs.</p>
<h3 id="heading-10-observability-see-problems-early">10. Observability (see problems early).</h3>
<p>Use structured logs with request/trace IDs, key counters/timers (success/error/latency), health/readiness probes, and a basic dashboard or alert stub.</p>
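<p>As a sketch of what structured logs with request IDs can look like, the example below uses only Python's standard <code>logging</code> module. The logger name, the field names, and the <code>request_id</code> value are assumptions for illustration. Many teams use a dedicated JSON logging library instead.</p>

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # request_id is attached per call via `extra=`; None if absent.
            "request_id": getattr(record, "request_id", None),
        })


stream = io.StringIO()                  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False                # keep the demo output in `stream` only

logger.info("payment accepted", extra={"request_id": "req-123"})
entry = json.loads(stream.getvalue())
print(entry)
```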
<h3 id="heading-11-compliance-and-data-handling-when-applicable">11. Compliance and data handling (when applicable).</h3>
<p>Identify any personally identifiable information (PII), document collection/retention, ensure masking/redaction in logs, verify dependency licenses and data-residency constraints.</p>
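<p>Masking in logs can start very simply. The sketch below redacts anything that looks like an email address before a message is written. The pattern is deliberately loose and is not a full email validator, and the function name is hypothetical.</p>

```python
import re

# Loose pattern: good enough to catch typical addresses in log text.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text):
    """Replace anything that looks like an email address with a marker."""
    return EMAIL.sub("[redacted-email]", text)


print(redact_emails("password reset sent to jane.doe@example.com"))
```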
<h2 id="heading-how-to-work-effectively-with-ai-tools">How to Work Effectively with AI Tools</h2>
<p>At this point, you can probably see why it’s very important to understand the actual skills involved in AI-assisted development.</p>
<p>There’s a pretty big difference between an experienced developer who uses AI tools to help them get more done, and a newbie who thinks AI can build the next Facebook or Google just with a simple prompt.</p>
<p>An inexperienced dev will ask AI something like "Hey, build me Twitter and make no mistakes."</p>
<p>But an experienced developer who has solid fundamentals might say something like:</p>
<ul>
<li><p>"AI, we're building a Twitter replica. Use $SQL_Database, Use $Language, Avoid $Common_Pitfalls, Follow $Standard_Practices."</p>
</li>
<li><p>"The generated code is prone to X problem, implement this fix."</p>
</li>
<li><p>"Implementation of $X is flawed because of $Y, do $Z instead."</p>
</li>
</ul>
<p>So as you can see, you still need to know the hows, the whys, and what depends on what. Often you’ll just need to make the changes manually, because it will be faster. And you don’t want to outsource the critical thinking part, which is the part that AI can't actually do.</p>
<p>LLMs are good at information retrieval. If you know nothing about what you’re looking for, then asking an AI isn’t going to be that helpful (or that reliable). But if you have an idea, some background knowledge/context, and the skills to verify AI’s responses, then it can be really helpful.</p>
<p>Last month, I shared in my <a target="_blank" href="https://bytesizedbets.com/">newsletter</a> how my current coding loop looks in practice.</p>
<p>I draft with Claude Code (or Copilot/Cursor), open a PR, and let an AI reviewer like CodeRabbit (or Copilot PR Reviewer / Cursor Bugbot or Greptile) do the first pass. CI runs tests and scans.</p>
<p>I repeat until everything’s green and the PR is ready to merge. It’s fast, but it’s still disciplined.</p>
<p>If you want to understand why this kind of workflow is becoming essential, read this article: <a target="_blank" href="https://bytesizedbets.com/p/era-of-ai-slop-cleanup-has-begun">Era of AI Slop Cleanup Has Begun</a>. I talk about what’s happening in AI-assisted engineering, where generating code is easy, but keeping it clean and production ready takes experience – and you must have good programming skills.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>AI-generated code can boost productivity – but production value still comes from software that is robust, secure, and maintainable.</p>
<p>Mindless code generation creates technical debt. But when you integrate AI thoughtfully, with guardrails, verification, tests, security checks, and documentation, you can go faster without lowering your standards.</p>
<p>That's it for this article. I hope you learned something new today.</p>
<p>If you have any questions about code reviews, engineering, startups, or business in general, please find me on Twitter: <a target="_blank" href="https://x.com/TheAnkurTyagi">@TheAnkurTyagi</a>. I’d be more than happy to discuss them.</p>
<h3 id="heading-want-to-read-more-interesting-articles-like-this">Want to read more interesting articles like this?</h3>
<p>You can read more articles like this one about the latest dev tools on my <a target="_blank" href="https://www.devtoolsacademy.com/">website</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Improve Your Programming Skills by Building Games ]]>
                </title>
                <description>
                    <![CDATA[ When most people think about learning to code, they imagine building websites or automating small tasks. Few think of building games as a serious way to improve programming skills.  But creating even a simple game can teach lessons that no tutorial e... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/improve-your-programming-skills-by-building-games/</link>
                <guid isPermaLink="false">690364c01022fd77927ddcbd</guid>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Game Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Programming Tips ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Thu, 30 Oct 2025 13:14:40 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1761829724019/2dc484e9-e0d2-4632-85ff-8ed39233fb51.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When most people think about learning to code, they imagine building websites or automating small tasks. Few think of building games as a serious way to improve programming skills. </p>
<p>But creating even a simple game can teach lessons that no tutorial ever could. Games force you to think about performance, user input, structure, and creative problem-solving all at once.</p>
<p>When I started building small <a target="_blank" href="https://www.freecodecamp.org/news/how-to-build-a-snake-game-using-phaserjs/">2D games</a> as weekend projects, I didn’t realize how much they would sharpen my overall coding skills. From learning how to organize complex systems to handling real-time input, every part of game development stretched my thinking. </p>
<p>Whether you’re a web developer, mobile engineer, or hobby coder, building games will make you a stronger problem solver.</p>
<p>Here are ten programming skills you’ll learn along the way.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-1-thinking-in-systems">1. Thinking in Systems</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-2-writing-event-driven-code">2. Writing Event-Driven Code</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-3-optimizing-for-performance">3. Optimizing for Performance</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-4-debugging-complex-states">4. Debugging Complex States</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-5-handling-user-input-responsively">5. Handling User Input Responsively</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-6-building-reusable-game-loops-and-engines">6. Building Reusable Game Loops and Engines</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-7-managing-complexity-through-components">7. Managing Complexity Through Components</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-8-learning-the-math-that-actually-matters">8. Learning the Math That Actually Matters</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-9-sharpening-your-design-and-ux-instincts">9. Sharpening Your Design and UX Instincts</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-10-embracing-creative-problem-solving">10. Embracing Creative Problem Solving</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-1-thinking-in-systems"><strong>1. Thinking in Systems</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761568833173/febffe00-1c5d-47cf-8c0a-a172a7d273f1.png" alt="Systems thinking" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Every game is a set of systems working together. You might have a physics system that controls movement, a rendering system that draws the visuals, and an AI system that decides how enemies react. </p>
<p>Each one depends on the others, but they must remain separate enough to be managed and improved without breaking the rest of the game.</p>
<p>This is exactly what developers deal with in larger software projects. Building a game helps you understand modular design and why separating logic into smaller, independent parts makes everything easier to scale and debug. </p>
<p>You stop writing long scripts that try to do everything and instead start thinking in terms of systems that talk to each other through clear rules.</p>
<h2 id="heading-2-writing-event-driven-code"><strong>2. Writing Event-Driven Code</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761568856531/4e18c861-8cd8-45cf-9f4b-4b876f8e41a3.png" alt="Event-Driven Programming" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Games live and breathe on events. A button press, a collision, or a timer hitting zero are all events that trigger actions. </p>
<p>When you code a game, you quickly learn to think in event loops. This helps you understand how asynchronous code works in real life.</p>
<p>If you’ve struggled with JavaScript event listeners or backend message queues, building a small game is the perfect way to get comfortable with them. </p>
<p>Every time a player jumps, attacks, or collects an item, you’re writing code that listens for an event and reacts in real time. That experience makes you a better developer, even outside of gaming.</p>
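<p>The listen-and-react pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any particular engine: the <code>EventBus</code> class, its <code>on</code> and <code>emit</code> methods, and the <code>coin_collected</code> event name are all hypothetical.</p>
<pre><code class="lang-python">from collections import defaultdict


class EventBus:
    """Hypothetical minimal event dispatcher (not from any specific engine)."""

    def __init__(self):
        # Map each event name to the list of callbacks listening for it.
        self._listeners = defaultdict(list)

    def on(self, event_name, callback):
        # Register a callback to run whenever event_name is emitted.
        self._listeners[event_name].append(callback)

    def emit(self, event_name, **payload):
        # Call every listener registered for this event, in registration order.
        for callback in self._listeners[event_name]:
            callback(**payload)


bus = EventBus()
score = {"coins": 0}


def collect_coin(value):
    score["coins"] += value


bus.on("coin_collected", collect_coin)
bus.emit("coin_collected", value=5)
bus.emit("coin_collected", value=10)
print(score["coins"])  # 15
</code></pre>
<p>The code that emits the event doesn’t need to know who is listening, which is the same decoupling you get from browser event listeners or message queues.</p>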
<h2 id="heading-3-optimizing-for-performance"><strong>3. Optimizing for Performance</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761568898214/e086f416-0b25-489f-86fd-8dbdaba200b4.png" alt="Performance Optimisation" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Unlike most websites, games can’t afford to lag. At 60 frames per second, each frame has a budget of about 16.7 milliseconds, so even a few extra milliseconds of work can cause visible stutter. </p>
<p>When you write games, you learn to measure performance constantly. You start thinking about memory usage, CPU load, and rendering time.</p>
<p>You might experiment with how often to update physics calculations or how to reuse textures instead of loading them every frame. </p>
<p>Those small optimizations become second nature, and later, when you’re building a web app or a backend service, you’ll know exactly where to look when something feels slow.</p>
<h2 id="heading-4-debugging-complex-states"><strong>4. Debugging Complex States</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761568916767/4a084536-2076-4065-bf67-674e53f5b28e.png" alt="Debugging" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Games are full of moving parts that interact in unpredictable ways. Maybe a character disappears after jumping twice, or a power-up triggers twice because of overlapping timers. These problems force you to learn structured debugging.</p>
<p>You’ll get used to adding logs, reproducing edge cases, and isolating bugs by breaking large systems into smaller ones. The patience and process you develop while debugging a tricky game bug translate perfectly to real-world software. </p>
<p>You become the kind of developer who doesn’t panic when something goes wrong because you’ve already handled far more chaotic code in your side projects.</p>
<h2 id="heading-5-handling-user-input-responsively"><strong>5. Handling User Input Responsively</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761568965695/990963ce-4474-4609-aca3-28f27901bee4.jpeg" alt="Handling user input" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>When you build a game, user input becomes one of your main concerns. You want the player’s actions to feel instant. </p>
<p>That means learning how to manage input devices like keyboards, mice, and <a target="_blank" href="https://www.eneba.com/hub/gaming-gear/best-pc-controller/">PC controllers</a>. You’ll discover how to debounce actions, prevent lag, and detect simultaneous keypresses. You might even test your code with a physical controller to make sure it feels smooth and accurate. </p>
<p>This focus on responsiveness changes how you approach every future project. You begin to see every button click or touch gesture as part of a feedback loop that should feel immediate and natural.</p>
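<p>As a rough sketch of debouncing, here is a cooldown-style filter (a variant that is sometimes called throttling) that ignores repeated presses arriving too soon after an accepted one. The <code>Debouncer</code> class, the 0.25 second cooldown, and the list of press timestamps are all assumptions made for illustration.</p>
<pre><code class="lang-python">class Debouncer:
    """Hypothetical helper: accepts an action only if enough time has passed."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self._last_accepted = None

    def try_fire(self, now):
        # `now` is a timestamp in seconds supplied by the caller.
        if self._last_accepted is None or now - self._last_accepted >= self.cooldown_seconds:
            self._last_accepted = now
            return True
        return False


jump = Debouncer(cooldown_seconds=0.25)
presses = [0.00, 0.05, 0.10, 0.30, 0.40, 0.60]  # simulated key press times
accepted = [t for t in presses if jump.try_fire(t)]
print(accepted)  # [0.0, 0.3, 0.6]
</code></pre>
<p>Passing the timestamp in, instead of reading the clock inside the class, keeps the logic easy to test with simulated input.</p>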
<h2 id="heading-6-building-reusable-game-loops-and-engines"><strong>6. Building Reusable Game Loops and Engines</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761568992859/a5a94ae0-5899-476a-816d-74883b5ac259.png" alt="Reusable Loops" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>After writing a few games, you’ll realize that many parts of your code repeat. The main loop that updates the world, the input handlers, and the collision checks all follow patterns. This realization leads to a powerful skill: abstraction.</p>
<p>You start building small frameworks or reusable components that handle these repetitive tasks. In doing so, you learn the same lessons that professional developers learn when they design APIs or internal tools. </p>
<p>The discipline of turning messy scripts into organized, reusable code teaches you about structure and design in a way that theory never can.</p>
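<p>One common shape for such a reusable loop is a fixed-timestep update, sketched below. The function name <code>run_frames</code>, the 0.01 second step, and the simulated frame times are assumptions chosen for illustration. A real loop would measure elapsed time from a clock instead of reading it from a list.</p>
<pre><code class="lang-python">FIXED_DT = 0.01  # seconds of simulated time per physics step (assumed value)


def run_frames(frame_times, update, render):
    """Hypothetical loop: advance physics in fixed steps, render once per frame."""
    accumulator = 0.0
    for frame_time in frame_times:
        accumulator += frame_time
        # Run as many fixed physics steps as the elapsed time allows.
        while accumulator >= FIXED_DT:
            update(FIXED_DT)
            accumulator -= FIXED_DT
        render()


state = {"x": 0.0, "steps": 0, "frames": 0}


def update(dt):
    state["x"] += 120.0 * dt  # move at 120 units per second
    state["steps"] += 1


def render():
    state["frames"] += 1


# Two simulated frames that took 25 ms and 20 ms to draw.
run_frames([0.025, 0.02], update, render)
print(state["steps"], state["frames"])  # 4 2
</code></pre>
<p>Because <code>update</code> and <code>render</code> are passed in, the same loop can drive different games, which is the kind of abstraction this section describes.</p>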
<h2 id="heading-7-managing-complexity-through-components"><strong>7. Managing Complexity Through Components</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761569009038/cf7d045e-a1d2-4dde-94e5-90f3b84f41b5.jpeg" alt="Managing Complexity" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Game developers often use something called an <a target="_blank" href="https://en.wikipedia.org/wiki/Entity_component_system">Entity-Component-System (ECS) architecture</a>. It’s a way of organizing objects in a game so they can share behavior without heavy inheritance trees. For example, a player and an enemy might both have movement and health components, but different AI logic.</p>
<p>This pattern is very similar to how modern front-end frameworks work. If you use React, you already think in components. Building games strengthens that habit. </p>
<p>You start to see every system (UI, physics, AI) as a component that can be composed and reused. It’s one of the most powerful ways to manage complexity in any large codebase.</p>
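<p>To make the ECS idea concrete, here is a deliberately small sketch. The component names (<code>Position</code>, <code>Velocity</code>, <code>Health</code>), the entity IDs, and the <code>movement_system</code> function are hypothetical, and real ECS libraries store components far more efficiently than a dictionary per entity.</p>
<pre><code class="lang-python">from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Velocity:
    dx: float
    dy: float


@dataclass
class Health:
    points: int


# Each entity is just an ID mapped to a bag of components keyed by type.
entities = {
    "player": {Position: Position(0.0, 0.0), Velocity: Velocity(1.0, 0.0), Health: Health(100)},
    "enemy": {Position: Position(5.0, 5.0), Velocity: Velocity(0.0, -1.0), Health: Health(30)},
    "rock": {Position: Position(2.0, 2.0)},  # no Velocity, so it never moves
}


def movement_system(entities, dt):
    # Operates on every entity that has both Position and Velocity.
    for components in entities.values():
        if Position in components and Velocity in components:
            pos, vel = components[Position], components[Velocity]
            pos.x += vel.dx * dt
            pos.y += vel.dy * dt


movement_system(entities, dt=2.0)
print(entities["player"][Position])  # Position(x=2.0, y=0.0)
print(entities["enemy"][Position])   # Position(x=5.0, y=3.0)
</code></pre>
<p>The player and the enemy share movement behavior without sharing a base class. Adding or removing a component changes what an entity can do.</p>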
<h2 id="heading-8-learning-the-math-that-actually-matters"><strong>8. Learning the Math That Actually Matters</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761569027607/b9c2453b-a8dc-401b-b824-b200b6d0555f.jpeg" alt="Learning Math" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Many developers shy away from math, but games make it practical. When you need to move a character along a curve, calculate projectile motion, or detect collisions, you’re forced to use geometry, trigonometry, and vectors.</p>
<p>The best part is that you learn it through doing, not memorizing formulas. You begin to understand how angles, distances, and forces interact in a way that feels visual and intuitive. Later, when you face algorithmic problems or data visualizations, that math background helps you approach them with confidence.</p>
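<p>As one small example of this kind of math, the sketch below uses trigonometry to split a launch speed into horizontal and vertical components and then computes how far a projectile travels. It assumes flat ground, no air resistance, and a gravity of 9.81 m/s². The function names are made up for illustration.</p>
<pre><code class="lang-python">import math


def launch_velocity(speed, angle_degrees):
    """Split a speed and launch angle into horizontal and vertical components."""
    angle = math.radians(angle_degrees)
    return speed * math.cos(angle), speed * math.sin(angle)


def projectile_range(speed, angle_degrees, gravity=9.81):
    """Horizontal distance travelled before landing at launch height (no air drag)."""
    vx, vy = launch_velocity(speed, angle_degrees)
    time_of_flight = 2 * vy / gravity  # time to go up and come back down
    return vx * time_of_flight


print(round(projectile_range(20.0, 45.0), 2))  # 40.77
</code></pre>
<p>Trying launch angles of 30 and 60 degrees gives the same range, a symmetry that is easy to discover by experimenting in a game.</p>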
<h2 id="heading-9-sharpening-your-design-and-ux-instincts"><strong>9. Sharpening Your Design and UX Instincts</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761569045237/f59d54e0-bf26-49ae-8c4c-838ac624c9e7.jpeg" alt="Design Thinking" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Good games feel right. The jump height, the delay between actions, the feedback when you collect a coin: every small detail affects how enjoyable the game feels. </p>
<p>When you design these experiences, you’re learning about user experience design without even realizing it.</p>
<p>You begin to think about things like timing, feedback, and accessibility. You learn how to make interactions satisfying and clear. </p>
<p>The same mindset applies when you build apps or websites. You start designing not just for functionality but for how it feels to use.</p>
<h2 id="heading-10-embracing-creative-problem-solving"><strong>10. Embracing Creative Problem Solving</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761569066107/7a9aab1e-1814-4a50-b837-ba5129f49e49.jpeg" alt="Creative Problem Solving" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Games are rarely built in a straight line. You’ll face problems that don’t have clear answers. </p>
<p>Maybe you need a way to fake physics without heavy computation or make AI feel smarter than it is. These challenges train you to think creatively.</p>
<p>You’ll often come up with unconventional but clever solutions. That kind of flexible problem-solving becomes one of your most valuable programming skills. </p>
<p>When something breaks in production or a feature seems impossible under current constraints, you’ll know how to find a creative way around it because you’ve done it before in your own projects.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Building games is more than a hobby. It’s an accelerated crash course in becoming a better developer. You’ll write cleaner code, understand systems thinking, and develop a sharp sense for performance and design. You’ll also have fun in the process, which keeps your motivation alive longer than any tutorial series can.</p>
<p>Each project you build will teach you something new about programming. The lessons won’t come from books but from the moments you struggle, test, and finally see your creation come to life. Build something that teaches you back, and you’ll grow as both a coder and a creator.</p>
<p>Hope you enjoyed this article. Connect with me <a target="_blank" href="https://www.linkedin.com/in/manishmshiva/?originalSubdomain=in">on LinkedIn</a> or <a target="_blank" href="https://manishshivanandhan.com/">visit my website</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Architecture of Mathematics – And How Developers Can Use it in Code ]]>
                </title>
                <description>
                    <![CDATA[ "To understand is to perceive patterns." - Isaiah Berlin Math is not just numbers. It is the science of finding complex patterns that shape our world. This means that to truly understand it, we need to see beyond numbers, formulas, and theorems and ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-architecture-of-mathematics-and-how-developers-can-use-it-in-code/</link>
                <guid isPermaLink="false">68308ee8ccde6bc325c82393</guid>
                
                    <category>
                        <![CDATA[ Mathematics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Math ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ history ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MathJax ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Fri, 23 May 2025 15:06:16 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748012748947/1df613bf-93e7-4f03-b0f0-47ff49f38504.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <blockquote>
<p>"To understand is to perceive patterns." - Isaiah Berlin</p>
</blockquote>
<p>Math is not just numbers. It is the science of finding complex patterns that shape our world. This means that to truly understand it, we need to see beyond numbers, formulas, and theorems and understand its structures.</p>
<p>The main goal of this article is to show how math is just like a growing tree of ideas. I want to show that math is a living system of logic, not just formulas to memorize. With analogies, history, and code examples, I want to help you understand math more deeply and how you can apply it to programming.</p>
<p>I’ve also included some code examples here to help you connect theory and practice. I show them to demonstrate how math ideas are applied to real problems. Whether you are new to more advanced math or are more experienced, these code examples will help you understand how to apply math in programming.</p>
<p>This link between theory and application reflects my own studies. I am a final-year undergraduate student in Electrical and Computer Engineering at NOVA FCT, one of the best engineering faculties in Portugal.</p>
<p>My engineering degree is heavy on math and physics. This is because a solid grasp of math is key to understanding electronics, telecommunications, control theory, and other areas of engineering.</p>
<p>Here’s a brief overview of some of the math and physics subjects I’ve learned:</p>
<ul>
<li><p><strong>Partial Differential Equations (PDEs):</strong> These equations model real-world phenomena, from heat diffusion to the economy of a country.</p>
</li>
<li><p><strong>Harmonic Analysis (Fourier &amp; Laplace):</strong> Integral transforms like the Fourier and Laplace transform allow us to understand problems in new domains.</p>
</li>
<li><p><strong>Complex Analysis:</strong> Extending calculus into the complex plane gives rise to powerful tools used in physics and engineering.</p>
</li>
<li><p><strong>Numerical Analysis:</strong> When analytical solutions are impossible or inefficient, numerical methods provide computer-based approximations. This is crucial for real-world applications.</p>
</li>
<li><p><strong>Control and Signal Theory:</strong> These areas show us how to design stable systems like rockets, trains, and robots.</p>
</li>
<li><p><strong>Physics:</strong> Courses in Classical Mechanics and Electromagnetism helped bridge theoretical math to physical laws.</p>
</li>
</ul>
<p>During my years of study, besides technical skills, I’ve developed a deeper understanding of how the world works and the structure of the field of mathematics. And I’ve started to find patterns in how math is a framework of interconnected logic.</p>
<h3 id="heading-in-this-article-well-explore">In this article, we’ll explore:</h3>
<ul>
<li><p><a class="post-section-overview" href="#heading-simple-analogy-the-tree-of-mathematics">Simple Analogy: The Tree of Mathematics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-structure-and-history-of-mathematics">The Structure and History of Mathematics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-an-tree-example-foundations-of-relativity-by-albert-einstein">A Tree Example: Foundations of Relativity by Albert Einstein</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-biggest-paradox-of-math-discovered-by-kurt-godel">The Biggest Paradox of Math, Discovered by Kurt Gödel</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-about-applied-math-and-engineering">What About Applied Math and Engineering?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-code-examples-analytical-and-numerical-approaches">Code Examples – Analytical and Numerical Approaches</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-impact-of-a-grand-unified-theory-of-mathematics">The Impact of a Grand Unified Theory of Mathematics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-final-lesson-from-history">A Final Lesson From History</a></p>
</li>
</ul>
<h2 id="heading-simple-analogy-the-tree-of-mathematics">Simple Analogy: The Tree of Mathematics</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518175609/78838825-d872-42df-9dc8-736fa012a630.jpeg" alt="Photo of two trees by Johannes Plenio: https://www.pexels.com/photo/two-brown-trees-1632790/" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Imagine math as a vast tree growing forever.</p>
<p>The roots of the tree are the foundations of mathematics: logic and set theory. From this foundation emerge the main basic fields of math: arithmetic, algebra, geometry, and analysis.</p>
<p>As the tree divides further and further into more branches, new, more complex subfields start to appear, like topology, abstract algebra, and complex analysis. Sometimes the branches are connected to each other.</p>
<p>And remember: this tree is always growing in many directions. From branches creating new branches to branches connecting to other branches. Little by little, it grows.</p>
<p>Throughout history, there have been times when, due to some big scientific discoveries, parts of the math tree started to grow very fast. Other times, decades and even centuries passed without many new branches. This is the case for imaginary numbers, for example.</p>
<p>And you might wonder: How many more branches and connections between them will keep appearing?</p>
<h2 id="heading-the-structure-and-history-of-mathematics">The Structure and History of Mathematics</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518363058/9911acd4-ad4f-4da2-a62b-9fa87e219c35.jpeg" alt="Photo of a writing desk and notebook on Pixabay: https://www.pexels.com/photo/brown-wooden-desk-159618/" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The first mathematical ideas appeared independently across ancient and medieval civilizations. For example:</p>
<ul>
<li><p>India’s invention of zero</p>
</li>
<li><p>Islamic algebraic advances</p>
</li>
<li><p>Greek geometric rigor</p>
</li>
</ul>
<p>Over time, many great mathematicians developed these ideas and shared them through writing and lectures.</p>
<p>Eventually, these new ideas were shared widely with new generations, and these new generations created new math based on old math.</p>
<p>This is how new branches are continuously born from previous branches of the tree of mathematics.</p>
<p>And this is why Isaac Newton wrote, in a letter to Robert Hooke in 1675:</p>
<blockquote>
<p>If I have seen further, it is by standing on the shoulders of giants</p>
</blockquote>
<p>He meant that by working from previous knowledge, he was able to create and (re)discover new ideas.</p>
<p>Yet, the real power of math lies in practicing it over and over and understanding it more and more deeply. As one of my professors once explained:</p>
<blockquote>
<p><em>More important than knowing the theorems is knowing the ideas behind them and the history of how they were created.</em></p>
</blockquote>
<p>Very often, to solve problems, it is necessary to think in terms of first principles and build from there. Math teaches exactly that. In this way, math is not just an academic subject. It is a language spoken by scientists and engineers around the globe.</p>
<p>By having it well preserved and shared, it is still possible to create new math from previous ideas. And it’s possible for the big tree to continue growing based on previous branches or nodes.</p>
<h2 id="heading-an-tree-example-foundations-of-relativity-by-albert-einstein">A Tree Example: Foundations of Relativity by Albert Einstein</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518865627/e84ff108-b383-405b-8bb0-73ffb50b4dcf.jpeg" alt="Albert Einstein, one of the greatest physics giants in history" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Albert Einstein created the general and special theories of relativity. These have big consequences nowadays:</p>
<ul>
<li><p>GPS and Global Communication</p>
</li>
<li><p>Advancements in Satellite Telecommunications</p>
</li>
<li><p>Space Exploration and Satellite Launches</p>
</li>
</ul>
<p>But this was only possible through the unification of geometry with calculus, called <strong>differential geometry.</strong> The evolution of differential geometry happened over the centuries, thanks to many great mathematicians. Below are some of them, but this is not a complete list:</p>
<ul>
<li><p><strong>Euclid (circa 300 BCE):</strong> Contributed to geometry, laying the groundwork for later mathematical systems.</p>
</li>
<li><p><strong>Archimedes (circa 287–212 BCE):</strong> Pioneered the understanding of volume, surface area, and the principles of mechanics.</p>
</li>
<li><p><strong>René Descartes (1596–1650):</strong> Developed Cartesian coordinates and analytical geometry.</p>
</li>
<li><p><strong>Isaac Newton (1642–1727) &amp; Gottfried Wilhelm Leibniz (1646–1716):</strong> Newton’s laws of motion and gravitation, alongside Leibniz’s development of calculus, formed the basis of classical mechanics that Einstein sought to extend and modify in his theory of relativity.</p>
</li>
<li><p><strong>Leonhard Euler (1707–1783):</strong> Contributed to the development of differential equations, which are essential in the mathematical foundations of physics.</p>
</li>
<li><p><strong>Gaspard Monge (1746–1818):</strong> The father of differential geometry and pioneer in descriptive geometry.</p>
</li>
<li><p><strong>Carl Friedrich Gauss (1777–1855):</strong> Made groundbreaking advances in geometry, including the concept of curved surfaces.</p>
</li>
<li><p><strong>Bernhard Riemann (1826–1866):</strong> Introduced Riemannian geometry, a branch of differential geometry.</p>
</li>
</ul>
<p>Once again, as Isaac Newton wrote, in a letter to Robert Hooke in 1675:</p>
<blockquote>
<p>If I have seen further, it is by standing on the shoulders of giants.</p>
</blockquote>
<p>Albert Einstein saw what no one else in his time saw, thanks to these great math giants and countless others.</p>
<h2 id="heading-the-biggest-paradox-of-math-discovered-by-kurt-godel">The Biggest Paradox of Math, Discovered by Kurt Gödel</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518411126/df53f84c-f920-4b42-9081-5aeb1017f543.jpeg" alt="Kurt Gödel, one of the greatest math giants in history" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The biggest paradox in math, in my opinion, is what Kurt Gödel discovered. His research, published in 1931, revealed a limitation within this cycle of building new math on top of old math.</p>
<p>This paradox – that is, <a target="_blank" href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems">his incompleteness theorems</a> – shows that in any consistent formal system capable of expressing simple arithmetic, there will always be true mathematical statements that cannot be proven within the system itself.</p>
<p>This means that in all such systems, there are limits to what you can actually prove to be true or false. For mathematicians, this means that the tree will never be completed. There are true statements that a given system can never prove, and in practice some of them are still assumed to be true (albeit unproven).</p>
<p>In this way, it proves that no matter how many mathematicians work in the field or how much AI is used to find new mathematics, there will always be limitations. Some true statements are impossible to prove within a given system, and the only support for them comes from less rigorous methods such as numerical estimates, or from a proof in a stronger system.</p>
<h2 id="heading-what-about-applied-math-and-engineering">What About Applied Math and Engineering?</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518581076/606f3bce-d7db-4ac3-9322-833673a734b0.jpeg" alt="Photo by JESHOOTS.com: https://www.pexels.com/photo/person-holding-a-chalk-in-front-of-the-chalk-board-714699/" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Applied math and engineering involve interpreting the same pure math ideas in real-world scenarios. Actually, in many cases, they combine many math ideas. Let’s consider some examples:</p>
<p>Principal component analysis (PCA) is a widely used tool in data science. It combines linear algebra (the eigenvalues and eigenvectors of the data’s covariance matrix) with optimization (keeping the directions that capture the most variance) in order to describe a dataset with fewer dimensions.</p>
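<p>Here is a minimal sketch of PCA written directly with NumPy, so the linear algebra stays visible. The toy dataset is randomly generated and is purely an assumption for illustration. In practice you would more likely reach for a library implementation.</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)
# Toy data (an assumption for illustration): 200 points stretched along one direction.
t = rng.normal(size=200)
data = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

centered = data - data.mean(axis=0)
covariance = np.cov(centered, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(covariance)
order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()  # share of variance per component
reduced = centered @ eigenvectors[:, :1]     # keep only the first principal component

print(explained)
print(reduced.shape)  # (200, 1)
</code></pre>
<p>Sorting the eigenvalues and keeping the largest one is the step that lets a single column stand in for the original two.</p>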
<p>In machine learning, logistic regression is a mixture of calculus with statistics and probability.</p>
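<p>The sketch below shows that mixture in a few lines: the sigmoid function turns a number into a probability, and the derivative of the log loss tells gradient descent how to adjust the parameters. The study-hours dataset, the learning rate, and the number of iterations are all made up for illustration.</p>
<pre><code class="lang-python">import numpy as np


def sigmoid(z):
    # Squashes any real number into a probability between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))


# Toy data (assumed for illustration): hours studied vs. passed (1) or failed (0).
hours = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

weight, bias, learning_rate = 0.0, 0.0, 0.1
for _ in range(5000):
    predicted = sigmoid(weight * hours + bias)  # probability model
    error = predicted - passed
    # Gradients of the average log loss with respect to weight and bias.
    weight -= learning_rate * np.mean(error * hours)
    bias -= learning_rate * np.mean(error)

print(sigmoid(weight * 1.0 + bias))  # probability of passing after 1 hour
print(sigmoid(weight * 4.0 + bias))  # probability of passing after 4 hours
</code></pre>
<p>The first probability should come out low and the second high, because the fitted curve crosses 50% somewhere between the failing and passing examples.</p>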
<p>In harmonic analysis, Laplace, Fourier, and Z-transforms are a way to see the same thing in a new domain to get new insights. In this case, integrals (or sums, for the discrete Z-transform) are used to make this mapping.</p>
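<p>As a small illustration of seeing the same thing in a new domain, the sketch below builds a signal from two sine waves and uses NumPy’s discrete Fourier transform to recover their frequencies. The sample rate and the two frequencies (5 Hz and 20 Hz) are assumed values chosen for the example.</p>
<pre><code class="lang-python">import numpy as np

sample_rate = 100  # samples per second (assumed value)
t = np.arange(sample_rate) / sample_rate  # one second of time stamps
# A signal made of a 5 Hz wave plus a weaker 20 Hz wave.
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

spectrum = np.abs(np.fft.rfft(signal))
frequencies = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# The two largest peaks in the frequency domain recover the hidden frequencies.
top_two = sorted(frequencies[np.argsort(spectrum)[-2:]].tolist())
print(top_two)  # [5.0, 20.0]
</code></pre>
<p>In the time domain the two waves are mixed together. In the frequency domain they show up as two separate peaks.</p>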
<p>In deep learning, neural networks are essentially many matrices that are multiplied together and updated so that they adapt to model a dataset representing a system. Activation functions add non-linearity between the matrix multiplications. The optimization of the matrix values happens with backpropagation (which computes how much each value contributed to the error) and a gradient descent-based optimization method (which uses those gradients to decide how much each value needs to change).</p>
<p>I have actually written an article where I teach <a target="_blank" href="https://www.freecodecamp.org/news/activation-functions-in-neural-networks/">why activation functions are important</a> if you want to check it out.</p>
<p>But the best example of this fusion of math with engineering is in <a target="_blank" href="https://www.freecodecamp.org/news/basic-control-theory-with-python/">control theory</a>.</p>
<p>Control theory is the study of how to make dynamic systems behave the way we want, usually by using feedback. From trains to cars to airplanes, they all rely on control theory. It is present in nearly all modern electronic devices. In electric circuits, control theory is also used heavily to guarantee circuit stability in the face of electric disturbances.</p>
<p>So as you can probably start to see, many of the tools we now have are combinations of pure math ideas. In essence, applied math uses pure math ideas as "ingredients" in "recipes" to solve problems.</p>
<p>So, we’ve explored the structure and evolution of mathematics. Yet, it is important to see how these ideas can be applied in real life. Pure math makes the framework, and applied math applies that framework to solve problems. To understand this, we’ll examine two code examples that show how you can use math ideas as programming tools.</p>
<h2 id="heading-code-examples-analytical-and-numerical-approaches">Code Examples – Analytical and Numerical Approaches</h2>
<p>These code examples demonstrate a couple of ways you can use Python to solve math equations.</p>
<p>In the first code example, we’ll solve the problem in the same way that kids in school solve math exercises: essentially, by hand with a pencil, moving variables from one side of the equation to the other to find their values. In the second example, we’ll solve the problem using numerical analysis.</p>
<h3 id="heading-example-1-solve-a-problem-analytically">Example 1: Solve a Problem Analytically</h3>
<p>When we solve math problems analytically, like we did in school, we are manipulating symbols to get exact values. Often these symbols are x, y, and z. In Python, we can do this using the SymPy library:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sympy <span class="hljs-keyword">import</span> symbols, Eq, solve

x, y = symbols(<span class="hljs-string">'x y'</span>)
eq1 = Eq(<span class="hljs-number">2</span>*x + <span class="hljs-number">3</span>*y, <span class="hljs-number">6</span>)
eq2 = Eq(-x + y, <span class="hljs-number">1</span>)

solution = solve((eq1, eq2), (x, y))
print(solution)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747160359386/7a21cddc-f4ba-4f9f-afa0-d1cc11fb27d6.png" alt="7a21cddc-f4ba-4f9f-afa0-d1cc11fb27d6" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Essentially, we are finding x and y based on this system of equations:</p>
<p>$$\begin{align*} 2x + 3y &amp;= 6 \\ -x + y &amp;= 1 \end{align*}$$</p><p>Which gives us the following result:</p>
<pre><code class="lang-python">{x: <span class="hljs-number">3</span>/<span class="hljs-number">5</span>, y: <span class="hljs-number">8</span>/<span class="hljs-number">5</span>}
</code></pre>
<p>Or:</p>
<ul>
<li><p>x = 0.6</p>
</li>
<li><p>y = 1.6</p>
</li>
</ul>
<p>When we say that we’re solving this analytically, it means that we’re finding an exact mathematical solution using formulas or equations.</p>
<p>But many times, problems are harder and can’t be solved just by moving symbols from one side of the equation to the other.</p>
<p>Sometimes, there can be so many symbols and transformed versions of them, with things like derivatives and integrals, that the problem becomes very hard to manage and takes a lot of time to solve.</p>
<p>For this reason, there is an area of mathematics called numerical analysis, devoted to finding approximate solutions to problems that are already stated as mathematical formulas. It often makes these problems faster to solve. And this is the method we will explore next.</p>
<h3 id="heading-example-2-solve-numerically-approximation">Example 2: Solve Numerically (Approximation)</h3>
<p>We’ll now use SciPy to solve a larger system, with five equations and five unknowns, using numerical methods:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> scipy.linalg <span class="hljs-keyword">import</span> solve

A = np.array([[<span class="hljs-number">3</span>, <span class="hljs-number">2</span>, <span class="hljs-number">-1</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>],
              [<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">3</span>, <span class="hljs-number">2</span>, <span class="hljs-number">-2</span>],
              [<span class="hljs-number">4</span>, <span class="hljs-number">-1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>],
              [<span class="hljs-number">5</span>, <span class="hljs-number">3</span>, <span class="hljs-number">-2</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>],
              [<span class="hljs-number">2</span>, <span class="hljs-number">-3</span>, <span class="hljs-number">1</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>]])

b = np.array([<span class="hljs-number">12</span>, <span class="hljs-number">5</span>, <span class="hljs-number">7</span>, <span class="hljs-number">9</span>, <span class="hljs-number">10</span>])

solution = solve(A, b)

print(solution)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747160347486/d1f17aa6-b288-4e41-9be7-0810c45e778c.png" alt="d1f17aa6-b288-4e41-9be7-0810c45e778c" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>In this code example, this line of code:</p>
<pre><code class="lang-python">solution = solve(A, b)
</code></pre>
<p>Uses the <a target="_blank" href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve.html">solve</a> function from the <a target="_blank" href="https://scipy.org/">SciPy</a> Python library:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> scipy.linalg <span class="hljs-keyword">import</span> solve
</code></pre>
<p>It’s a function that finds the values of x in an equation A⋅x=b, where A is a square matrix of numbers and b is a list of numbers. This gives us the following:</p>
<pre><code class="lang-python">[ <span class="hljs-number">1.35022026</span> <span class="hljs-number">-0.79955947</span> <span class="hljs-number">-1.17180617</span>  <span class="hljs-number">3.14317181</span> <span class="hljs-number">-0.83920705</span>]
</code></pre>
<p>Now imagine, in this simple case, that a matrix like A could represent the <strong>traffic flow</strong> between cities or intersections, and b could represent the <strong>traffic entering or leaving</strong> each city.</p>
<p>By solving the system, it could help us determine the distribution of traffic between cities to meet desired traffic conditions.</p>
<p>Of course, these types of problems are far more complex in real life. But to understand and solve the big problems, you need to first understand the smaller problems.</p>
<p>And by the way, a system of linear equations and its matrix form are two ways of writing the same thing. We represent systems of equations as matrices because it makes their properties easier to find and understand.</p>
<p>With matrices, it is easier to perform calculations and to use linear algebra to check the characteristics of the system and understand it better.</p>
<p>In essence, a matrix represents a system of equations. Also, systems of equations can represent real life phenomena like the economy of a country or the weather.</p>
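<p>To make the link between a system of equations and its matrix concrete, here is a minimal sketch using a small two-equation system. The system itself, the <code>solve_2x2</code> helper, and the use of Cramer’s rule are illustrative assumptions for this example. They are not what <code>scipy.linalg.solve</code> does internally.</p>

```python
from fractions import Fraction

# The system of equations
#   2x + 1y = 5
#   1x - 1y = 1
# written as a matrix A (the coefficients) and a list b (the right-hand sides).
A = [[2, 1],
     [1, -1]]
b = [5, 1]


def solve_2x2(A, b):
    """Solve a 2x2 linear system with Cramer's rule (illustration only)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("matrix is singular, so there is no unique solution")
    x = Fraction(b[0] * a22 - a12 * b[1], det)
    y = Fraction(a11 * b[1] - b[0] * a21, det)
    return x, y


x, y = solve_2x2(A, b)
print(x, y)
```

<p>Each row of <code>A</code> holds the coefficients of one equation, and the matching entry of <code>b</code> holds its right-hand side. For larger systems, a library routine such as <code>solve</code> is the practical choice.</p>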
<p>If you want to know more, I wrote an <a target="_blank" href="https://www.freecodecamp.org/news/numerical-analysis-explained-how-to-apply-math-with-python/">entire article on numerical analysis</a> that you can check out.</p>
<h2 id="heading-the-impact-of-a-grand-unified-theory-of-mathematics">The Impact of a Grand Unified Theory of Mathematics</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518681068/54a9556c-2a79-441c-a6d6-27ff38e1f4ff.jpeg" alt="Photo by Porapak Apichodilok: https://www.pexels.com/photo/person-holding-world-globe-facing-mountain-346885/" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Despite the biggest paradox in mathematics, what would happen with a <a target="_blank" href="https://www.scientificamerican.com/article/the-evolving-quest-for-a-grand-unified-theory-of-mathematics/">Grand Unified Theory of Mathematics</a>?</p>
<p>Remember that this paradox tells us that some true statements are impossible to formally prove, and we just need to accept that. But even with this limitation, it may still be possible to unify all of math.</p>
<p>This is what <a target="_blank" href="https://en.wikipedia.org/wiki/Langlands_program">the Langlands program</a> is trying to achieve: it is an attempt to interconnect the largest parts of the big tree of math and uncover new patterns.</p>
<p>With a Grand Unified Theory of Mathematics, we would be able to understand how every branch of the tree connects with the others and all the relationships between them.</p>
<h3 id="heading-what-is-the-value-of-this-big-unification-for-society">What is the value of this big unification for society?</h3>
<p>By studying history, we can find patterns. The unification of various fields has created many massive impacts on society, such as:</p>
<ul>
<li><p>In the 19th century, James Clerk Maxwell united the fields of <em>electricity</em> and <em>magnetism</em> with his famous equations, now known as Maxwell’s equations. This allowed the creation of radios and electric grids around the globe. In turn, it served as a foundation for much of the technological progress of the 20th and 21st centuries.</p>
</li>
<li><p>In the 20th century, the unification of <em>algebra</em> with <em>logic</em> led to the rise of digital systems. In turn, digital systems gave rise to processors and the evolution of computers into the modern laptop.</p>
</li>
<li><p>Also in the 20th century, the unification of <em>probability</em> and <em>communication</em> led to information theory. This became the foundation for the internet. This unification was carried out by the great mathematician Claude Shannon.</p>
</li>
</ul>
<p>In the end, a Grand Unified Theory of Mathematics could be one of the biggest achievements in modern society.</p>
<p>It could lead to new discoveries in physics, such as in string theory or quantum gravity, where deep mathematical structures are needed to create new physics. In AI, it could help unify all machine learning models in a common architecture. This would help accelerate the development of new AI models. It could also open the door to new cryptographic methods and material science advances, revealing, with math, the deep patterns still not found in these fields.</p>
<p>Just as uniting electricity and magnetism led to modern technology, a unified math framework would lead to a wave of innovation.</p>
<h2 id="heading-a-final-lesson-from-history">A Final Lesson From History</h2>
<p>From Greek geometry to AI, math has grown like a tree over centuries. By understanding its structure, it is possible to see its role in finding the patterns of our universe. I hope I was able to make you see math in this way.</p>
<p>In addition, we can conclude that unifying scientific fields lays the foundation for innovations that help society move forward. Many profound societal transformations only came to be thanks to abstract math ideas. When these are shared and refined, they become the hidden architecture of progress in society. Innovation begins when disconnected ideas are united, well-linked, and widely shared.</p>
<p>Find the full code <a target="_blank" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">here</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Refactor Complex Codebases – A Practical Guide for Devs ]]>
                </title>
                <description>
                    <![CDATA[ Developers often see refactoring as a secondary concern that they can delay indefinitely because it doesn’t immediately contribute to revenue or feature development. And managers frequently view refactoring as "not a business need" until it boils ove... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-refactor-complex-codebases/</link>
                <guid isPermaLink="false">682df5a0f2057ab279952dbe</guid>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ webdev ]]>
                    </category>
                
                    <category>
                        <![CDATA[ software development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ code review ]]>
                    </category>
                
                    <category>
                        <![CDATA[ refactoring ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ankur Tyagi ]]>
                </dc:creator>
                <pubDate>Wed, 21 May 2025 15:47:44 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747835131515/f6ea465a-9b14-4918-8943-87ec225b19b3.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Developers often see refactoring as a secondary concern that they can delay indefinitely because it doesn’t immediately contribute to revenue or feature development.</p>
<p>And managers frequently view refactoring as "not a business need" until it boils over and becomes the most significant business need possible.</p>
<blockquote>
<p><em>"Oh, our software somehow works. We can't implement any new changes. And oh, everyone is quitting because work is miserable."</em></p>
</blockquote>
<p>In this article, I’ll walk you through the steps I use to refactor a complex codebase. We’ll talk about setting goals, writing tests, breaking up monoliths into smaller modules, verifying changes, making sure existing features still work, and keeping tabs on performance. I’ll also show you how to speed up reviews using AI tools.</p>
<p>By following these steps, you can turn complex, fragile code into a clean, reliable codebase your team can own.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXccvZ3sehF8oGifjnapnY9AUcPde9aKy9t_YEUeL8M2s3dcwxFq_bJLCSp_S02fIvfbwzpZfkz7e-2JQpXpzcdqELqs80EjkLLRpz0Uat6q9_RcRM5VQbjLoUxA2GlaqyeolsKGeA?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="code-refactoring" width="600" height="400" loading="lazy"></p>
<h2 id="heading-the-issue-of-technical-debt">The Issue of Technical Debt</h2>
<p>As projects grow and evolve, <a target="_blank" href="https://en.wikipedia.org/wiki/Technical_debt">technical debt</a> increases. Code that was once functional and manageable turns into an unmaintainable mess, where even small changes become risky and time-consuming.</p>
<p>Despite the obvious need for cleanup, refactoring rarely gets prioritized because there's always something more urgent: new features, bug fixes, and client demands.</p>
<p>I’ve had conversations with engineers, many of whom are working on enterprise software and are fully aware of their codebase's code smells and inconsistencies. They dislike the situation but feel powerless to change it.</p>
<p>So how do we shift from a culture of writing for pure functionality to a culture that values maintainability, especially for complex codebases?</p>
<p>It’s usually a mistake to completely halt new feature development for a long refactoring period (except perhaps in emergencies). Business needs still exist, and putting everything on hold can create tension and lost opportunities. It’s better to find a balance so you’re still delivering value to users even as you clean under the hood.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeZx-XKCA2DC6kQQe2-4NU07wKEm0_VZ4kqEjbF6u2vy2paRigdNRUGjr-_AoE6ueNjCxNjnB-mI7uroXFhJ0nFfvWzwYq2VUMsdsPhXu4KvGYSZcUN0nFmKg8U8WzgGJQAgKtUaw?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="Uncle-bob-take-on-refactoring" width="600" height="400" loading="lazy"></p>
<p>While there is no one-size-fits-all solution, a structured approach can help teams introduce sustainable refactoring practices, even in environments where management is resistant. Let’s explore how this works.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents:</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-refactoring">What is Refactoring?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-preparing-for-refactoring">Preparing for Refactoring</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-secure-management-buy-in">Secure Management Buy-in</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ensure-a-safety-net-with-automated-testing">Ensure a Safety Net with Automated Testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-identify-high-risk-areas">Identify High-Risk Areas</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-set-clear-refactoring-goals">Set Clear Refactoring Goals</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-techniques-for-refactoring-complex-codebases">Techniques for Refactoring Complex Codebases</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-1-identifying-and-isolating-problem-areas">1. Identifying and Isolating Problem Areas</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-2-incremental-vs-big-bang-refactoring">2. Incremental vs. Big Bang Refactoring</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-3-breaking-down-monolithic-code">3. Breaking Down Monolithic Code</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-4-ensuring-backward-compatibility">4. Ensuring Backward Compatibility</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-5-handling-dependencies-and-tight-coupling">5. Handling Dependencies and Tight Coupling</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-6-testing-strategiessafely-refactoring-with-confidence">6. Testing Strategies (Safely Refactoring with Confidence)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-7-refactoring-without-breaking-performance">7. Refactoring Without Breaking Performance</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-8-automate-code-reviews-with-ai-tools">8. Automate Code Reviews with AI Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-summary">Summary</a></p>
</li>
</ul>
</li>
</ul>
<h2 id="heading-what-is-refactoring"><strong>What is Refactoring?</strong></h2>
<p>People all too often say "refactor" when they mean a targeted rewrite.</p>
<p>As Martin Fowler famously said,</p>
<blockquote>
<p><em>“Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations... However, the cumulative effect... is quite significant.”</em></p>
</blockquote>
<p>In practice, this means continuously polishing code to reduce complexity and technical debt.</p>
<p>While traditional software development follows a linear approach of designing first and coding second, real-world projects often evolve in ways that lead to structural decay. Refactoring counteracts this by continuously refining the codebase, transforming disorganized or inefficient implementations into well-structured, maintainable solutions.</p>
<p>A targeted rewrite is a focused overhaul of a specific aspect of an application, often affecting multiple parts of the codebase. It carries more risk than refactoring but is still controlled and contained.</p>
<h2 id="heading-preparing-for-refactoring">Preparing for Refactoring</h2>
<p>Even the most skilled refactoring effort can stall without proper preparation. Before you start moving code around, laying a foundation that will keep your work organized and your team on the same page is crucial.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcr3hNpzC9XPUVnG6d7uHuC977aYrG2VVOH-8E4WhzM5Rfz3vzPDUPTwJChrK0l7WUK8BLTzYr5-295_27ARWQvcmjufXOk68Bg8szUjEq3IFVCDO0XfTSRFy1LaxqyjvjVDNddsw?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="martin-fowler-on-refactoring" width="600" height="400" loading="lazy"></p>
<p>Here are some steps you can take to ensure your refactoring efforts are successful.</p>
<h3 id="heading-secure-management-buy-in">Secure Management Buy-in</h3>
<p>As I’ve already discussed, getting time for refactoring can be difficult in feature-driven organizations. Often, management will accept refactoring investment if you can tie it to business outcomes: faster time to market, fewer outages (which translate to happier customers), and the ability to take on new initiatives.</p>
<p>Make those connections explicit. For example, you could say:</p>
<blockquote>
<p><em>“If we refactor our reporting engine now, it will make it feasible to add the analytics module next quarter, which unlocks a new revenue stream.”</em></p>
</blockquote>
<p>Or use data:</p>
<blockquote>
<p><em>“We spent 30% of our last sprint fixing bugs in module Y. After refactoring Y, we expect that to drop significantly, freeing time for new features.”</em></p>
</blockquote>
<p>Business-minded arguments help justify the balance.</p>
<h3 id="heading-ensure-a-safety-net-with-automated-testing">Ensure a Safety Net with Automated Testing</h3>
<p>As you refactor, tests are your safety net. Before modifying a component, write characterization tests around it if they don’t exist.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example: characterization test for a legacy function</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">legacy_calculate_discount</span>(<span class="hljs-params">price, rate</span>):</span>
    <span class="hljs-comment"># ... complex logic you don't fully understand yet ...</span>
    <span class="hljs-keyword">return</span> price * (<span class="hljs-number">1</span> - rate/<span class="hljs-number">100</span>) <span class="hljs-keyword">if</span> rate &lt; <span class="hljs-number">100</span> <span class="hljs-keyword">else</span> <span class="hljs-number">0</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_legacy_calculate_discount</span>():</span>
    <span class="hljs-comment"># capture existing behavior</span>
    <span class="hljs-keyword">assert</span> legacy_calculate_discount(<span class="hljs-number">100</span>, <span class="hljs-number">10</span>) == <span class="hljs-number">90</span>
    <span class="hljs-keyword">assert</span> legacy_calculate_discount(<span class="hljs-number">50</span>, <span class="hljs-number">200</span>) == <span class="hljs-number">0</span>
</code></pre>
<p>These tests capture the current behavior, so you’ll know if you accidentally change it. Unit tests, integration tests, and e2e tests all validate that refactoring hasn’t broken anything.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfWfke-9FxoQIPFwRWVoIWrYN7L40mEmhpdAUkcBm34mwzXJ0R8jXKH8rZ0HjAghAtQ-v6dTUYYvK0T8_QBgyfeab-7R50pnB6BgdDm9L4PkFwvwGlUYTHNo21f37fxMZYt3xeY?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="automated-testing-is-imp-for-refactoring" width="600" height="400" loading="lazy"></p>
<p>It’s often worth investing time in setting up a continuous integration pipeline so that every change triggers automated tests. This gives rapid feedback and confidence that you’re not introducing regressions. Robust testing and CI/CD enable you to move faster and refactor with peace of mind.</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># .github/workflows/ci.yml</span>
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - run: pip install -r requirements.txt
      - run: pytest --maxfail=1 --disable-warnings -q
</code></pre>
<h3 id="heading-identify-high-risk-areas">Identify High-Risk Areas</h3>
<p>The first step is to figure out what to refactor. High-risk areas are parts of the code likely to cause bugs or slow development. Common signs include long methods, large classes, duplicate code, and complex conditional logic.</p>
<p>Such code “smells” often hint at deeper design problems. Static analysis tools can flag these issues automatically.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfS4aFy2hyRSq3UmgB2gQ8NN_-yUksNXcSavTtpnL8KIiWpGGidCSCstLKANZGOjJLqEF69wp-xjMGH6jrjurSaFtUIMS09vUaDgJ6vGtyabP-4QC5ISmT_cMvaaw6c2KlyVa1CKQ?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="SonarQube-dashboard" width="600" height="400" loading="lazy"></p>
<p>For example, SonarQube will mark code smells (like high complexity or long methods) that increase technical debt. Using SonarQube or similar tools, you can generate reports on code complexity (for example, cyclomatic complexity metrics) and find hotspots in the codebase that need more attention.</p>
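<p>To get a feel for what a complexity metric measures, here is a rough sketch that approximates cyclomatic complexity with Python’s built-in <code>ast</code> module. It counts branching constructs per function and adds one. This is a simplified approximation, not the algorithm SonarQube uses, and the <code>approximate_complexity</code> helper and the sample function are hypothetical names for illustration.</p>

```python
import ast

# Node types treated as decision points in this simplified count
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.BoolOp)


def approximate_complexity(source: str) -> dict:
    """Return {function_name: 1 + number of branching nodes} for each function."""
    tree = ast.parse(source)
    scores = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))
            scores[node.name] = 1 + branches
    return scores


# Hypothetical sample code to analyze
sample = '''
def shipping_cost(weight, express):
    if weight > 20:
        return 50
    if express and weight > 5:
        return 30
    return 10
'''

scores = approximate_complexity(sample)
print(scores)
```

<p>Running a check like this across a codebase and sorting by score gives a quick first list of functions worth a closer look. A dedicated tool will be more accurate.</p>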
<h3 id="heading-set-clear-refactoring-goals">Set Clear Refactoring Goals</h3>
<p>Before refactoring code, define the goal.</p>
<p>Goals must be specific and measurable. For example, you might aim to reduce a class’s size or a function’s <a target="_blank" href="https://www.ibm.com/docs/en/raa/6.1.0?topic=metrics-cyclomatic-complexity">cyclomatic complexity</a> by a certain amount or to increase unit test coverage from 60% to 90%.</p>
<p>Tie each goal to a measurable outcome: shorter methods, fewer <code>if</code> statements, classes with a single responsibility, faster order processing, higher test coverage, or no unused code. These targets guide the refactoring plan and let you verify when you’ve succeeded.</p>
<p><strong>Tip:</strong> Write down your refactoring goals and share them with your team. This sets expectations that you’re not adding new features in this effort, just making the code cleaner and more robust. It also helps justify the time spent by showing the benefits (like more straightforward future additions and fewer bugs).</p>
<h2 id="heading-techniques-for-refactoring-complex-codebases">Techniques for Refactoring Complex Codebases</h2>
<h3 id="heading-1-identifying-and-isolating-problem-areas">1. Identifying and Isolating Problem Areas</h3>
<p>It can be overwhelming to decide where to start refactoring a large codebase. Not every part of the code needs refactoring – some areas are stable or rarely touched.</p>
<p>The most impactful refactoring efforts typically target the “problem areas”: parts of the codebase that are overly complex, error-prone, or act as bottlenecks for development and performance. Identifying these areas is a crucial first step.</p>
<h3 id="heading-techniques-for-finding-hotspots">Techniques for Finding Hotspots</h3>
<h4 id="heading-team-knowledge-amp-developer-frustration">Team knowledge &amp; developer frustration</h4>
<p>Don’t underestimate the value of anecdotal information from the team. Which parts of the code do developers dread working in? Often, the team’s instincts point to areas that are hard to understand or modify (for example, “the accounting module is a black box, we hate touching it”). These could be areas to improve.</p>
<p>In my experience, simply asking, “If you had a magic wand, which part of the code would you rewrite?” yields very insightful answers.</p>
<h4 id="heading-code-complexity-metrics">Code complexity metrics</h4>
<p>Use static analysis tools to measure cyclomatic complexity, code duplication, large functions/classes, and so on. Files or modules with extremely high complexity numbers or thousands of lines are good candidates for scrutiny. But static complexity alone doesn’t tell the whole story – a file might be ugly but rarely touched.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXc07SWwlu4GxU6AwoXQEHyyEcQY-6YMOEPr7b7Quhk5UvLD7qx9XyZla2SzP32eGFoYY_Xy-SYZQ9mOMX7Mxeq1YCnFXQxudsMNbvak9CLZfSOeRIvdll_pLW56sAmvRcPZMk36Rg?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="SonarQube" width="600" height="400" loading="lazy"></p>
<h4 id="heading-change-frequency-churn">Change frequency (Churn)</h4>
<p>Look at version control history to see which files are often changed, especially those associated with bug fixes or incidents.</p>
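<p>As a minimal sketch of how churn could be measured, the snippet below counts how many commits touched each file, based on the output of <code>git log --name-only --pretty=format:</code>. The helper names, the 12-month window, and the sample log are assumptions for illustration.</p>

```python
import subprocess
from collections import Counter


def count_churn(log_output: str) -> Counter:
    """Count how many commits touched each file.

    Expects the output of `git log --name-only --pretty=format:`,
    which lists one file path per line with blank lines between commits.
    """
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def git_churn(repo_path: str = ".", since: str = "12 months ago") -> Counter:
    """Run git in repo_path and return per-file commit counts (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return count_churn(result.stdout)


# Hypothetical log output covering three commits
sample_log = """
billing/invoice.py
billing/tax.py

billing/invoice.py

api/routes.py
billing/invoice.py
"""

churn = count_churn(sample_log)
print(churn.most_common(3))
```

<p>In a real repository you would call <code>git_churn()</code> instead of parsing a sample string. To focus on bug fixes, you could narrow the log further, for example by filtering on commit messages.</p>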
<h4 id="heading-hotspot-analysis">Hotspot analysis</h4>
<p>A robust approach combines complexity and change frequency to find “hotspots.” For example, plotting modules by their complexity against how often they change highlights the problematic areas. CodeScene (a code analysis tool) popularized this idea: <em>hotspots</em> are parts of the code that are both highly complex and frequently modified, indicating areas where “paying down debt has a real impact”.</p>
<p>If a module is a mess and developers are in it every week, improving that module will likely yield outsized benefits (fewer bugs, faster feature additions).</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdJkGfbDK6UFDN9hqzeyCMBWmajADhAMJwzSouyMNz_63o9SRNfOly9AP_XiY2jqfi02fHSIFkMBCfstkjJfkxVB-NaHCSit0xssTYfztZ2BRQZmqYr_lTc3R750-1-lrJi7eeViQ?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="code-health-dashboard" width="600" height="400" loading="lazy"></p>
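<p>The idea of combining complexity with change frequency can be sketched in a few lines. Multiplying the two numbers is one simple heuristic chosen for this example; it is not CodeScene’s actual scoring method. The file names and values are made up.</p>

```python
def rank_hotspots(complexity: dict, churn: dict, top_n: int = 3) -> list:
    """Rank files by complexity multiplied by change frequency (a simple heuristic)."""
    scores = {path: complexity[path] * churn.get(path, 0) for path in complexity}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical metrics: complexity score and number of commits per file
complexity = {"billing/invoice.py": 48, "api/routes.py": 12, "legacy/report.py": 95}
churn = {"billing/invoice.py": 30, "api/routes.py": 25, "legacy/report.py": 1}

hotspots = rank_hotspots(complexity, churn)
for path, score in hotspots:
    print(path, score)
```

<p>Note how <code>legacy/report.py</code> has the highest complexity but ranks last because it almost never changes. That is the “ugly but rarely touched” case described above.</p>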
<h4 id="heading-performance-bottlenecks-and-crashes">Performance bottlenecks and crashes</h4>
<p>Some parts of the codebase become targets for refactoring because they cause frequent performance problems or outages. For instance, if a specific service or job crashes often or can’t keep up with the load, you might need to refactor it for stability.</p>
<h3 id="heading-how-to-isolate-problem-areas">How to Isolate Problem Areas</h3>
<p>Once you’ve identified a hotspot or problem area, the next challenge is isolating it so you can refactor safely. In a complex system, nothing lives in complete isolation. That problematic module likely interacts with many others.</p>
<p>Here are strategies to isolate and tackle it:</p>
<h4 id="heading-break-dependencies-create-seams">Break dependencies (Create seams)</h4>
<p>Michael Feathers (in <em>Working Effectively with Legacy Code</em>) introduced the concept of “seams” – places where you can cut into a codebase to isolate a part for testing or refactoring. This might mean introducing an interface or abstraction between components so you can work on one side independently.</p>
<p>For example, suppose <code>PaymentService</code> is tightly coupled to <code>StripeGateway</code>, with direct calls scattered throughout the code.</p>
<pre><code class="lang-python"><span class="hljs-comment"># payment_service.py</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">charge_customer</span>(<span class="hljs-params">order_id, amount</span>):</span>
    <span class="hljs-comment"># Hardcoded dependency to Stripe</span>
    stripe = StripeGateway()
    stripe.charge(order_id, amount)
</code></pre>
<p>To isolate and refactor the payment logic safely, you can introduce a <code>PaymentProcessor</code> interface and have <code>PaymentService</code> depend on that interface instead. Then, create an adapter like <code>StripeAdapter</code> that implements <code>PaymentProcessor</code> and delegates to the existing Stripe logic.</p>
<p>This way, you can safely refactor or even replace the Stripe integration behind the <code>StripeAdapter</code> without impacting <code>PaymentService</code> or any other module that uses it. As long as the <code>PaymentProcessor</code> interface is honored, the rest of the system remains unaffected.</p>
<pre><code class="lang-python"><span class="hljs-comment"># interfaces.py</span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PaymentProcessor</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">charge</span>(<span class="hljs-params">self, order_id, amount</span>):</span>
        <span class="hljs-keyword">raise</span> NotImplementedError


<span class="hljs-comment"># stripe_adapter.py</span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">StripeAdapter</span>(<span class="hljs-params">PaymentProcessor</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">charge</span>(<span class="hljs-params">self, order_id, amount</span>):</span>
        <span class="hljs-comment"># Internally still uses Stripe</span>
        stripe = StripeGateway()
        stripe.charge(order_id, amount)


<span class="hljs-comment"># payment_service.py (Refactored)</span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PaymentService</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, processor: PaymentProcessor</span>):</span>
        self.processor = processor

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">charge_customer</span>(<span class="hljs-params">self, order_id, amount</span>):</span>
        self.processor.charge(order_id, amount)
</code></pre>
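<p>One practical payoff of this seam is testability. As a sketch, the snippet below repeats the <code>PaymentProcessor</code> and <code>PaymentService</code> definitions so it runs on its own, then swaps in a <code>FakeProcessor</code>. The fake class and the sample order values are hypothetical and only serve to show that no real Stripe call is needed in a test.</p>

```python
class PaymentProcessor:
    def charge(self, order_id, amount):
        raise NotImplementedError


class PaymentService:
    def __init__(self, processor: PaymentProcessor):
        self.processor = processor

    def charge_customer(self, order_id, amount):
        self.processor.charge(order_id, amount)


class FakeProcessor(PaymentProcessor):
    """Hypothetical test double that records charges instead of calling Stripe."""

    def __init__(self):
        self.charges = []

    def charge(self, order_id, amount):
        self.charges.append((order_id, amount))


# The service never knows it is talking to a fake
fake = FakeProcessor()
service = PaymentService(fake)
service.charge_customer("order-42", 1999)
print(fake.charges)
```

<p>Because <code>PaymentService</code> only depends on the interface, the same test keeps passing while you rework the real adapter behind it.</p>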
<h4 id="heading-branch-by-abstraction">“Branch-by-abstraction”</h4>
<p>This technique is related to the above and is often used in continuous delivery. The idea is to add a layer of abstraction (like an interface or proxy) in front of the old code, have both old and new code implementations behind it, and then gradually shift usage from the old to the new implementation. For a while, you might have a temporary state where both versions exist (perhaps toggled by a config or feature flag).</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcaFoXSHVYTBz_1DOsucPkvwGQwfo9qrvhPYvvjYOXQsLIh2MCTfseB1g9SOfijpdKMwcwmK4lfPWcyhn4vf5gaFwdliKUZUGDOcQVJ0qupRLjvnhFrSm5LZfe8OoqZtZkHkj9IXw?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="Branch-by-abstraction" width="600" height="400" loading="lazy"></p>
<p>This is similar to how the strangler fig pattern works at an architectural level. It’s a bit of extra work (since you maintain two paths for a while), but it allows you to migrate functionality incrementally and fall back if needed.</p>
<p>Aim to identify the 20% of the code causing 80% of the problems. Focus your refactoring energy there for maximum impact. When you do, create a plan to isolate that area via abstractions, interfaces, modules, or other means so that you can work on it with minimal risk of side effects. The more you can contain the blast radius of a refactoring, the more confidently you can move forward.</p>
<h3 id="heading-2-incremental-vs-big-bang-refactoring">2. Incremental vs. Big Bang Refactoring</h3>
<p>One of the first strategic decisions is whether to approach the refactor <strong>incrementally</strong> or go for a <strong>“big bang”</strong> overhaul. In most cases, an incremental approach is preferable, but there are scenarios where larger, coordinated refactoring steps are warranted.</p>
<p><strong>Let’s break down what these mean. Here is what a single incremental step can look like:</strong></p>
<pre><code class="lang-python"><span class="hljs-comment"># before: one large function with multiple responsibilities</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_order</span>(<span class="hljs-params">order</span>):</span>
    validate(order)
    apply_discount(order)
    save_to_db(order)
    send_confirmation(order)
    log_metrics(order)
    update_loyalty_points(order)
    <span class="hljs-comment"># potentially more steps </span>

<span class="hljs-comment"># after: refactored incrementally into clearer, smaller units</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_order</span>(<span class="hljs-params">order</span>):</span>
    validate(order)
    apply_discount(order)
    persist_and_notify(order)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">persist_and_notify</span>(<span class="hljs-params">order</span>):</span>
    save_to_db(order)
    send_confirmation(order)
    log_metrics(order)
    update_loyalty_points(order)
</code></pre>
<h4 id="heading-incremental-refactoring">Incremental refactoring</h4>
<p>This means making small, manageable changes over time rather than attempting a massive overhaul in one shot. The system should remain functional at each step (even if it is internally in transition). The advantage is risk mitigation: each small change is less likely to go wrong, and it’s easier to pinpoint and fix if it does.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdaSmnIWRE9FKNmmABBzc6Tk6KFwsj29FQ2YwyQ_kWqryheb0yUdpec51lQHg5XahoxKgCm4vv9twD849H3Yo5dn0678tuGih9Z-HfBBCfhBngs4YhpH6x2pjzqnAeDVYGohXHvDQ?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="Incremental refactoring" width="600" height="400" loading="lazy"></p>
<p>Incremental delivery lets you confirm changes in production and makes diagnosing issues easier since you’re only changing one small thing at a time. It also means the system keeps running during the refactor, so there’s less pressure to rush to “get the system back to working condition”. If priorities shift, you can pause after some increments and still have a working product.</p>
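<p>One way to confirm that a small step preserved behavior is to run the old and new versions side by side and compare what they do. The sketch below applies this to the <code>process_order</code> example from earlier. The recording stand-ins for each step, the <code>_before</code>/<code>_after</code> suffixes, and the <code>trace</code> helper are assumptions for illustration.</p>

```python
calls = []  # records the order in which the steps run


def make_step(name):
    """Build a stand-in step that only records that it was called."""
    def step(order):
        calls.append(name)
    return step


# Hypothetical stand-ins for the real steps in the process_order example
validate = make_step("validate")
apply_discount = make_step("apply_discount")
save_to_db = make_step("save_to_db")
send_confirmation = make_step("send_confirmation")
log_metrics = make_step("log_metrics")
update_loyalty_points = make_step("update_loyalty_points")


def process_order_before(order):
    validate(order)
    apply_discount(order)
    save_to_db(order)
    send_confirmation(order)
    log_metrics(order)
    update_loyalty_points(order)


def persist_and_notify(order):
    save_to_db(order)
    send_confirmation(order)
    log_metrics(order)
    update_loyalty_points(order)


def process_order_after(order):
    validate(order)
    apply_discount(order)
    persist_and_notify(order)


def trace(process, order):
    """Run one version and return the sequence of steps it executed."""
    calls.clear()
    process(order)
    return list(calls)


order = {"id": 1}
before = trace(process_order_before, order)
after = trace(process_order_after, order)
assert before == after, "the refactored version changed the observable steps"
```

<p>If the assertion holds, the extraction changed the structure of the code without changing what it does, which is the definition of a refactoring step.</p>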
<h4 id="heading-big-bang-refactoring-rewrite">Big bang refactoring (Rewrite)</h4>
<p>This is the “tear it down and rebuild” approach. You stop adding new features, possibly freeze the code for a period, and devote a considerable effort to redesigning or rewriting a significant portion (or the entirety) of the system. The idea is to emerge on the other side with a <em>brand new, clean</em> system.</p>
<p>So when (if ever) is a big bang justified? Perhaps when the existing system is truly untenable – for example, an outdated technology that <strong>must</strong> be replaced (such as a platform that can’t meet new performance or security requirements or code written in a language no longer supported). Even then, wise teams often simulate a big bang by breaking it into stages or developing the new system in parallel.</p>
<p>Whenever possible, favor an incremental refactoring strategy. Teams successfully pull off massive transformations by treating the big refactor as a series of mini-refactors under a shared vision.</p>
<h3 id="heading-3-breaking-down-monolithic-code">3. Breaking Down Monolithic Code</h3>
<p>Many complex codebases start life as a single monolithic application: one deployable, a single code project, or a tightly coupled set of modules all maintained and released together.</p>
<p>Over time, monoliths can become unwieldy: builds take forever, a change in one area can unintentionally affect another, and teams are hard to scale because everyone is stepping on each other’s toes in the same code. A common refactoring challenge for senior engineers is modularising or splitting a monolith into more manageable pieces. A low-risk starting point is to put the code you want to carve out behind an interface and switch implementations with a feature flag:</p>
<pre><code class="lang-python"><span class="hljs-comment"># define the interface</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PaymentProcessor</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">charge</span>(<span class="hljs-params">self, amount</span>):</span> ...

<span class="hljs-comment"># old implementation</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">LegacyProcessor</span>(<span class="hljs-params">PaymentProcessor</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">charge</span>(<span class="hljs-params">self, amount</span>):</span>
        ...  <span class="hljs-comment"># original code</span>

<span class="hljs-comment"># new implementation behind a feature flag</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">NewProcessor</span>(<span class="hljs-params">PaymentProcessor</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">charge</span>(<span class="hljs-params">self, amount</span>):</span>
        ...  <span class="hljs-comment"># cleaner code</span>

<span class="hljs-comment"># `config` is assumed to be your application's settings object</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_processor</span>():</span>
    <span class="hljs-keyword">if</span> config.feature_new_payment:
        <span class="hljs-keyword">return</span> NewProcessor()
    <span class="hljs-keyword">return</span> LegacyProcessor()

<span class="hljs-comment"># usage remains the same</span>
processor = get_processor()
processor.charge(<span class="hljs-number">100</span>)
</code></pre>
<h4 id="heading-strategies-for-modularization">Strategies for modularization</h4>
<ul>
<li><p><strong>Layer separation:</strong> Start by enforcing logical layer boundaries. For example, separate the user interface code from business logic and separate business logic from data access. In a messy monolith, these concerns often get mixed together. By organizing the code into layers (even within the same repository), you can limit the ripple effect of changes.</p>
</li>
<li><p><strong>Domain-based modularization:</strong> If your system spans multiple business domains or functional areas, consider splitting along those lines. For example, an e-commerce monolith might be separated into modules like Accounts, Orders, Products, Shipping, and so on.<br>  Each could become a subsystem or a package. The goal is to minimize the information these modules need to know about each other’s internals (high cohesion within modules and clear APIs between them).</p>
</li>
<li><p><strong>Microservices or services extraction:</strong> In recent years, the trend has been to break monoliths into microservices, independent services that communicate over APIs. This form of architectural refactoring can significantly improve independent deployability and scalability. But it’s a significant undertaking with complexities (distributed systems, network calls, and so on). If you decide to go this route, do it gradually.<br>  A proven method is the <strong>strangler fig pattern</strong> mentioned earlier: you pick one piece of functionality, rewrite or extract it as a separate service, and redirect traffic or calls to the new service while the rest of the monolith remains intact. You then repeat this iteratively for other pieces.</p>
</li>
<li><p><strong>Modular monolith:</strong> Not every system needs to go full microservices. There’s an approach called a modular monolith, essentially structuring your single application into well-defined modules that communicate via explicit interfaces (almost like internal microservices but without the overhead of separate deployments).</p>
</li>
</ul>
<p>This can give you many of the advantages of microservices (clear boundaries, separate development responsibility) while avoiding their operational complexity.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfLiNAEDyOsR4G_q1oQS3jpSenci3XDJRm10Gy3picTpaO9uHwme2H3YkbJF-Jrvqq3Q-QMxGjJJwy04mqUf1a7D8IRsCDER5pHBT6GTMPRkao5EXXIFGtj4Iki15mOHmRKRLTiWw?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="microservices' advantages" width="600" height="400" loading="lazy"></p>
<ul>
<li><strong>Identify shared utilities vs. truly independent components:</strong> In breaking down a monolith, some code is widely shared (like utility functions or cross-cutting concerns such as authentication). It might make sense to factor those into libraries or services <em>first</em>, as they will be needed by whatever other pieces you split out.</li>
</ul>
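<p>To make the strangler fig idea from the list above more concrete, here is a minimal sketch of the routing step. Everything in it is hypothetical: <code>legacy_monolith_handler</code>, <code>new_orders_service</code>, and <code>MIGRATED_PREFIXES</code> are stand-ins, and in a real system this decision usually lives in a proxy or API gateway rather than in application code.</p>
<pre><code class="lang-python"># Hypothetical names throughout; not part of any real framework.

def legacy_monolith_handler(path):
    """Stands in for the existing monolith's request handling."""
    return f"monolith handled {path}"

def new_orders_service(path):
    """Stands in for the newly extracted service."""
    return f"orders service handled {path}"

# Routes that have already been moved out of the monolith.
MIGRATED_PREFIXES = {"/orders": new_orders_service}

def route(path):
    """Send migrated paths to the new service and everything else to the monolith."""
    for prefix, handler in MIGRATED_PREFIXES.items():
        if path.startswith(prefix):
            return handler(path)
    return legacy_monolith_handler(path)

print(route("/orders/42"))
print(route("/accounts/7"))
</code></pre>
<p>Migrating another piece then means adding one more entry to the mapping, and rolling back means removing it.</p>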
<p>While breaking down a monolith, maintaining functionality during the transition is essential. Techniques like backward compatibility (discussed next) and thorough testing will be your safety net.</p>
<p>Finally, be prepared for the team workflow to change. If you move to microservices, teams might take ownership of different services, requiring more DevOps and communication across teams. If you keep a modular monolith, enforce code ownership or review rules to keep the modules from tangling up again (for example, you might restrict direct database access from one module to another’s tables, and so on).</p>
<h3 id="heading-4-ensuring-backward-compatibility">4. Ensuring Backward Compatibility</h3>
<p>A critical concern during large refactoring is: <em>Will our changes break existing contracts</em>?</p>
<p>In other words, can other systems, modules, or clients that rely on our code still work as expected after we refactor? Backward compatibility is especially important if your codebase exposes public APIs (to external customers or other teams), persists data in a certain format, or reads configuration files that users have written.</p>
<p>Here are some strategies and considerations to maintain backward compatibility:</p>
<p>Suppose you have a widely-used function like <code>send_email(to, subject, body)</code>. You want to refactor the internal logic to support additional features like HTML formatting, but you don’t want to break existing callers.</p>
<p>Instead of changing the function signature, you keep the public API unchanged and delegate to a new internal function:</p>
<pre><code class="lang-python"><span class="hljs-comment"># original API</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">send_email</span>(<span class="hljs-params">to, subject, body</span>):</span>
    ...  <span class="hljs-comment"># send mail</span>

<span class="hljs-comment"># refactored internals, same signature</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">send_email</span>(<span class="hljs-params">to, subject, body</span>):</span>
    <span class="hljs-comment"># html=False preserves the plain-text behavior existing callers expect</span>
    <span class="hljs-keyword">return</span> send_email_v2(to=to, subject=subject, body=body, html=<span class="hljs-literal">False</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">send_email_v2</span>(<span class="hljs-params">to, subject, body, html=True</span>):</span>
    ...  <span class="hljs-comment"># new implementation with HTML support</span>
</code></pre>
<p>The internal <code>send_email_v2()</code> function adds new capabilities like HTML formatting, but older code using <code>send_email()</code> still works without any modifications.</p>
<p>If you're introducing a new, improved version like <code>send_email_v2(to, subject, body, html=True)</code>, it's good practice to:</p>
<ul>
<li><p>Mark the old version (<code>send_email</code>) as deprecated in documentation.</p>
</li>
<li><p>Ensure the old version internally calls the new one.</p>
</li>
<li><p>Give other teams time to migrate at their own pace.</p>
</li>
</ul>
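<p>Documentation is easy to miss, so you may also want the old function to announce its own deprecation at runtime. The sketch below uses Python’s standard <code>warnings</code> module. The <code>deprecated</code> decorator is a hypothetical helper written for this example, and both function bodies are placeholders that return a string instead of sending mail.</p>
<pre><code class="lang-python">import functools
import warnings

def deprecated(replacement):
    """Decorator that emits a DeprecationWarning naming the replacement."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

def send_email_v2(to, subject, body, html=True):
    # Placeholder body: returns a description instead of sending real mail.
    kind = "html" if html else "plain"
    return f"{kind} email to {to}: {subject}"

@deprecated("send_email_v2")
def send_email(to, subject, body):
    # Old entry point delegates to the new one, keeping plain-text behavior.
    return send_email_v2(to, subject, body, html=False)
</code></pre>
<p>Callers keep working, but anyone running with warnings enabled sees exactly which function to move to.</p>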
<h4 id="heading-use-versioning-for-external-apis">Use versioning for external APIs</h4>
<p>If your system provides an HTTP API or similar to external clients, the safest route for major changes is to version the API. Introduce a v2 API endpoint for the refactored logic, keep v1 running (maybe internally calling v2 or using a translation layer). Clients can move to v2 at their own pace.</p>
<p>It’s extra work to maintain two APIs temporarily, but it prevents a breaking change from angering users or causing outages. Always communicate changes clearly and provide migration guides if applicable.</p>
<h4 id="heading-have-a-clear-deprecation-policy">Have a clear deprecation policy</h4>
<p>Make sure there’s a policy (and communication) around how long deprecated features will be supported. For internal APIs, maybe it’s one release cycle. For external ones, maybe it’s multiple cycles, or no removal at all without a major version bump. A good practice is to announce deprecation early.</p>
<p>If you’re exposing an HTTP API, consider introducing a new versioned endpoint (for example, <strong>/api/v2/send_email</strong>) while maintaining the older <strong>/api/v1/send_email</strong> temporarily. Internally, v1 might call v2 with default parameters, ensuring behavior stays consistent for existing clients.</p>
<p>In summary, maintain backward compatibility whenever possible, and implement a clear deprecation policy for anything you do change.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe3xM4som_GQrtHXI3NNR0G-4KJ-1D2YO-JbNdT75IxZ5_upcBRDnOVp7krEESiqwwtXg18pDypLq3VxDr44Hof76cs8HajOZy2w0FZ50kWmPk6Y7EwNByNLNrqAokmhmmL5sP3AA?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="Clear deprecation policy" width="600" height="400" loading="lazy"></p>
<h4 id="heading-write-adapter-or-compatibility-layers">Write adapter or compatibility layers</h4>
<p>In some cases, you can write an adapter to bridge old and new systems. For instance, suppose you refactor the underlying data model of your application, but you still have old configuration files in the old format. Rather than forcing all those files to be rewritten immediately, you could write a small adapter that translates the old format to the new one at runtime (or during startup). This way, old data continues to work. </p>
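<p>As a rough illustration of such an adapter, suppose the old configuration used flat keys and the new data model nests them. The key names here (<code>db_host</code>, <code>db_port</code>, <code>database</code>) and the default values are assumptions made up for the example.</p>
<pre><code class="lang-python"># Hypothetical formats: the old config used flat keys such as "db_host",
# while the new model nests them under "database".

def adapt_legacy_config(config):
    """Return config in the new nested format, translating old flat keys if present."""
    if "database" in config:
        return config  # already in the new format
    adapted = {k: v for k, v in config.items() if k not in ("db_host", "db_port")}
    adapted["database"] = {
        "host": config.get("db_host", "localhost"),
        "port": config.get("db_port", 5432),
    }
    return adapted

old_style = {"db_host": "db.internal", "db_port": 5433, "debug": True}
print(adapt_legacy_config(old_style))
</code></pre>
<p>The rest of the application only ever sees the new shape, so the translation logic stays in one place and can be deleted once the old files are gone.</p>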
<h4 id="heading-test-for-compatibility">Test for compatibility</h4>
<p>Include tests that specifically ensure backward compatibility. For instance, if you have a public API, keep a suite of tests that use the old API contracts and run them against the refactored code. They should still pass.</p>
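<p>Here is what a small contract suite might look like for the <code>send_email</code> example, using the standard <code>unittest</code> module. The function bodies are placeholders that return a dictionary so the tests have something to inspect. A real suite would assert against whatever your mail layer actually produces.</p>
<pre><code class="lang-python">import unittest

def send_email_v2(to, subject, body, html=True):
    # Placeholder: a real implementation would hand the message to a mail server.
    return {"to": to, "subject": subject, "body": body, "html": html}

def send_email(to, subject, body):
    return send_email_v2(to, subject, body, html=False)

class SendEmailContractTest(unittest.TestCase):
    """Pins down how existing callers use the old API."""

    def test_positional_call_still_works(self):
        message = send_email("a@example.com", "Hi", "Body")
        self.assertEqual(message["to"], "a@example.com")

    def test_keyword_call_still_works(self):
        message = send_email(to="a@example.com", subject="Hi", body="Body")
        self.assertEqual(message["subject"], "Hi")

    def test_old_callers_still_get_plain_text(self):
        self.assertFalse(send_email("a@example.com", "Hi", "Body")["html"])
</code></pre>
<p>You can run a file like this with <code>python -m unittest</code>. The point is that these tests are written against the old contract and are not edited during the refactor.</p>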
<p>In summary, ensure that as you refactor, the external behavior and contracts remain consistent. This careful approach protects your users and downstream systems, allowing you to reap the internal benefits of refactoring without causing external chaos.</p>
<h3 id="heading-5-handling-dependencies-and-tight-coupling">5. Handling Dependencies and Tight Coupling</h3>
<p>One of the hairiest aspects of refactoring a large codebase is dealing with deeply interdependent code. Complex systems often suffer from tight coupling: Module A assumes details about Module B and vice versa, global variables or singletons are used all over, or a change in one place ripples through half the codebase.</p>
<p>Reducing coupling is a significant aim of refactoring because it makes the code more modular, meaning each piece can be understood, tested, and changed independently. So, how do we gradually loosen the coupling in a legacy system?</p>
<p>Let’s go over some strategies to reduce coupling.</p>
<h4 id="heading-introduce-interfaces-or-abstraction-layers">Introduce interfaces or abstraction layers</h4>
<p>A very effective way to decouple is to put an interface between components. For example, if you have a class that directly queries a database, introduce an interface and have the class use that instead. The underlying database code implements the interface.</p>
<pre><code class="lang-python"><span class="hljs-comment"># before: direct instantiation</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">OrderService</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        self.repo = OrderRepository()

<span class="hljs-comment"># after: inject dependency</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">OrderService</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, repo</span>):</span>
        self.repo = repo

<span class="hljs-comment"># wiring up in application startup</span>
repo = OrderRepository(db_conn)
service = OrderService(repo)
</code></pre>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfuMNvzC4x3X0EOgoRXzflfOv4C-Dxzc2Tm16KA0NdZcOH0nK300LUwcNzXCL6iqu0rhknHiVhnQN4csDCYUupQLc4Kt6Q4c7d1Pi47NfrXKoF9rhXCUMAhtozsDpFMVT2lo2OX5Q?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="Introduce interfaces or abstraction layers" width="600" height="400" loading="lazy"></p>
<p>Now that class no longer depends on how the data is fetched. This applies the dependency inversion principle: depend on abstractions, not concretions.</p>
<h4 id="heading-use-dependency-injection">Use dependency injection</h4>
<p>Once you have interfaces, use dependency injection to supply concrete implementations. Many frameworks support DI containers, or you can do it manually (passing in dependencies via constructors). Dependency injection means code A doesn’t instantiate code B itself – instead, B is passed into A.  </p>
<p>This approach also makes unit testing easier (you can inject mock dependencies).</p>
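<p>To show why injection helps with testing, here is a small sketch using <code>unittest.mock.Mock</code> from the standard library. The <code>find_by_id</code> and <code>order_total</code> methods are hypothetical names invented for the example.</p>
<pre><code class="lang-python">from unittest.mock import Mock

class OrderService:
    def __init__(self, repo):
        self.repo = repo  # any object with a find_by_id method

    def order_total(self, order_id):
        order = self.repo.find_by_id(order_id)
        return sum(item["price"] * item["qty"] for item in order["items"])

# In a test, inject a mock instead of a repository backed by a real database.
fake_repo = Mock()
fake_repo.find_by_id.return_value = {
    "items": [{"price": 10, "qty": 2}, {"price": 5, "qty": 1}]
}

service = OrderService(fake_repo)
print(service.order_total(42))  # 25
fake_repo.find_by_id.assert_called_once_with(42)
</code></pre>
<p>No database is needed to exercise the pricing logic, which is exactly what constructor injection buys you.</p>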
<h4 id="heading-facades-or-wrapper-services">Facades or wrapper services</h4>
<p>If a particular subsystem is heavily entangled with others, consider creating a Facade, an object that provides a simplified interface to a larger body of code. Other parts of the system then call the Facade, not the many internal methods of the subsystem. Internally, the subsystem can be refactored (even split into smaller pieces) as long as the Facade’s outward interface remains consistent.</p>
<p>This is similar to how microservices work (other services don’t care how one service is implemented internally – they just call its API), but you can do it in-process, too.</p>
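<p>A minimal in-process Facade might look like the sketch below. The three subsystem classes and their methods are hypothetical placeholders. In a real codebase they would be the tangled internals you want to hide.</p>
<pre><code class="lang-python"># Hypothetical subsystem pieces that callers used to invoke directly.
class InventoryChecker:
    def reserve(self, sku, qty):
        return True  # placeholder: always succeeds

class PaymentGateway:
    def charge(self, customer_id, amount):
        return f"charge-{customer_id}-{amount}"  # placeholder receipt id

class ShippingScheduler:
    def schedule(self, sku, qty):
        return f"shipment-{sku}-{qty}"  # placeholder tracking id

class OrderFacade:
    """Single entry point; callers never touch the three classes above."""

    def __init__(self):
        self._inventory = InventoryChecker()
        self._payments = PaymentGateway()
        self._shipping = ShippingScheduler()

    def place_order(self, customer_id, sku, qty, amount):
        if not self._inventory.reserve(sku, qty):
            raise RuntimeError("item out of stock")
        receipt = self._payments.charge(customer_id, amount)
        tracking = self._shipping.schedule(sku, qty)
        return {"receipt": receipt, "tracking": tracking}

print(OrderFacade().place_order("c1", "sku-9", 2, 40))
</code></pre>
<p>Once callers depend only on <code>place_order</code>, you are free to rework or replace the classes behind it.</p>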
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe_X2G_VNTR-I2EIp86SgPD3Zlks70Q4iG3BsqIs94PMgh-_qNfRk7ogT4mqONP7qXzg8PpN92k342-2nH6ertfy32Ga6SFH3PdSLwxP4US9PPjMi6Rqc9hy-gHbSKVzvTvYmTzOQ?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="Facades or wrapper services" width="600" height="400" loading="lazy"></p>
<h4 id="heading-gradual-replacement-parallel-run">Gradual replacement (Parallel Run)</h4>
<p>If a specific component is to be replaced with a new implementation, it can help to run them in parallel for a while. For instance, if you have a spaghetti module that you want to redo correctly, you could leave the spaghetti code in place for legacy calls but start routing new calls to the new module.</p>
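<p>One variant of this idea, sketched below, is to call both implementations for the same input, keep serving the legacy result, and log any disagreement. This is a slightly different setup from routing only new calls to the new module, and the pricing functions are hypothetical stand-ins.</p>
<pre><code class="lang-python">import logging

logger = logging.getLogger("parallel_run")

def legacy_price(order):
    # Stands in for the tangled original implementation.
    total = 0
    for item in order["items"]:
        total = total + item["price"] * item["qty"]
    return total

def new_price(order):
    # Stands in for the rewritten implementation.
    return sum(item["price"] * item["qty"] for item in order["items"])

def price_with_parallel_run(order):
    """Serve the legacy result while checking the new code against it."""
    expected = legacy_price(order)
    try:
        candidate = new_price(order)
        if candidate != expected:
            logger.warning("mismatch: legacy=%s new=%s", expected, candidate)
    except Exception:
        # A failure in the new code must never affect the caller.
        logger.exception("new implementation failed")
    return expected
</code></pre>
<p>When the mismatch log stays quiet for long enough on real traffic, you can switch callers to the new implementation with much more confidence. Only do this for code without side effects, or you will perform the side effect twice.</p>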
<p>The result is a codebase where changes in one area (hopefully) won’t unpredictably break another, a key property of a maintainable system.</p>
<h3 id="heading-6-testing-strategies-safely-refactoring-with-confidence">6. Testing Strategies (Safely Refactoring with Confidence)</h3>
<p>A robust testing strategy will give you the confidence to make sweeping changes because you’ll know quickly if something important breaks. Here’s how to approach testing in the context of a large refactoring:</p>
<h4 id="heading-establish-a-baseline-with-regression-tests">Establish a baseline with regression tests</h4>
<p>Before you even begin refactoring a particular component, make sure you have tests that cover its current behavior. You're lucky if the codebase already has a good test suite, but many legacy systems have inadequate tests.</p>
<p>One of the first tasks in those cases is often writing <strong>characterization tests</strong>. A characterization test is a test that documents what the system <em>currently does</em>, not what we think it should do.</p>
<p>As Michael Feathers says, “a characterization test is a test that characterizes the actual behavior of a piece of code.” This allows you to take a snapshot of what it does and ensure that it doesn’t change.</p>
<p>This gives you a safety net so you can refactor with confidence that you’re not introducing regressions. Automate these tests at every level (unit, integration, end-to-end) so they are cheap to run.</p>
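<p>Here is a small sketch of a characterization test. The <code>legacy_shipping_fee</code> function is a made-up stand-in for untested legacy code. The important detail is that the expected values come from running the code, not from a specification.</p>
<pre><code class="lang-python">import unittest

def legacy_shipping_fee(weight_kg, express):
    # Stands in for untested legacy code whose rules nobody fully remembers.
    fee = 5 + 2 * weight_kg
    if express:
        fee = fee * 2
    if weight_kg > 20:
        fee = fee + 15
    return fee

class ShippingFeeCharacterization(unittest.TestCase):
    """Expected values were recorded by running the code, not taken from a spec."""

    def test_standard_parcel(self):
        self.assertEqual(legacy_shipping_fee(3, express=False), 11)

    def test_express_parcel(self):
        self.assertEqual(legacy_shipping_fee(3, express=True), 22)

    def test_heavy_express_parcel(self):
        # Surcharge is added after doubling; possibly a bug, but it is current behavior.
        self.assertEqual(legacy_shipping_fee(25, express=True), 125)
</code></pre>
<p>If one of these tests fails after a refactor, you have changed behavior. Whether that change is acceptable is then a deliberate decision instead of an accident.</p>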
<h4 id="heading-continuous-integration-ci">Continuous integration (CI)</h4>
<p>It is highly recommended that testing be integrated into a CI pipeline that runs on every commit or merge. This way, you catch a bug during refactoring as soon as you introduce it, tightening the feedback loop.</p>
<h4 id="heading-canary-releases-and-feature-flags">Canary releases and feature flags</h4>
<p>Beyond pre-release testing, consider strategies for safely deploying refactored code. A canary release involves rolling out the change to a small subset of users or servers first, observing it, and then gradually expanding.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfAif0ftiqEhiRPDygrmhtzSsfrctq6ZPfJnMg04GwKmxKk-NFiP9GjEGE9rfz7U_WKhRcBYSBYlirjKwzr-PvfZz2FJpEWS6U0UqNh-WayiVM5BGIyz3sabSX-zdKKA0j_ojvhIA?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="Canary releases and feature flags" width="600" height="400" loading="lazy"></p>
<p>This is great for catching issues that tests might miss (for example, performance issues or edge cases in production data). If the canary looks good (no errors, metrics are healthy), you proceed to full rollout. If not, you roll back quickly, and only a small group of users was affected.</p>
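<p>If you don’t have a feature-flag service, a percentage rollout can be approximated in a few lines. The sketch below is one possible approach, not a standard API: <code>CANARY_PERCENT</code>, <code>in_canary</code>, and the two checkout functions are hypothetical. It hashes the user ID with <code>hashlib</code> rather than the built-in <code>hash()</code>, because string hashes from <code>hash()</code> are randomized per process by default and would move users between groups on every restart.</p>
<pre><code class="lang-python">import hashlib

CANARY_PERCENT = 5  # hypothetical setting: share of users on the refactored path

def in_canary(user_id, percent=CANARY_PERCENT):
    """Deterministically place a user in one of 100 buckets."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return percent > bucket

def legacy_checkout(user_id):
    return "legacy"

def refactored_checkout(user_id):
    return "refactored"

def checkout(user_id):
    # The same user always lands on the same path for a given percentage.
    if in_canary(user_id):
        return refactored_checkout(user_id)
    return legacy_checkout(user_id)
</code></pre>
<p>Raising <code>CANARY_PERCENT</code> step by step expands the rollout, and setting it to 0 is your rollback.</p>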
<h4 id="heading-performance-and-load-testing">Performance and load testing</h4>
<p>If performance is a concern, incorporate performance tests into your strategy. This can be done in a staging environment. If you see a significant regression, reconsider your approach or optimize the new code.</p>
<h4 id="heading-testing-legacy-code-lacking-tests">Testing legacy code lacking tests</h4>
<p>If you’re dealing with a part of the system with zero tests (not uncommon in older code), prioritize getting at least some coverage there. There are also techniques like <strong>approval testing</strong> (where you generate output and have a human approve it as correct, then use that as a baseline for future tests). The key is not to refactor entirely in the dark; give yourself at least a flashlight in the form of tests!</p>
<p>In sum, a strong testing strategy is non-negotiable for refactoring complex systems. It’s your safety net, early warning system, and guide to know that your “cleanup” hasn’t broken anything vital.</p>
<h3 id="heading-7-refactoring-without-breaking-performance">7. Refactoring Without Breaking Performance</h3>
<p>A common concern when refactoring is whether the cleaner code will make the system slower or more resource-hungry. Ideally, refactoring is about the internal structure and shouldn’t change external behavior, and performance is part of that behavior.</p>
<p>In theory, performance should remain the same if you don’t change algorithms or data structures in a way that affects complexity.</p>
<p>In practice, though, performance can be inadvertently affected by refactoring. The new code may be more readable but uses more memory, or perhaps a critical caching mechanism was removed in the spirit of simplicity.</p>
<p><strong>Senior engineers need to be mindful of performance-sensitive parts of the system when refactoring and take steps to avoid regressions (or even improve performance where possible).</strong></p>
<p>Here’s how to refactor with performance in mind:</p>
<h4 id="heading-identify-performance-critical-code-paths">Identify performance-critical code paths</h4>
<p>Not all code is equal in its performance impact. Start by identifying the hot paths, such as code that runs on every request or inside tight loops. If you refactor them, treat it almost like a functional change: you must re-measure performance afterwards. You have more leeway for parts of the code that run rarely or are not bottlenecks.</p>
<h4 id="heading-use-profiling-before-and-after">Use profiling before and after</h4>
<p>A profiler is a tool that measures where time is spent in your code or how memory is allocated. It’s beneficial to run a profiler on the code before refactoring a module to see how it behaves, and then run it after to compare. If you see, for example, that after refactoring, a function now shows up as taking 30% of execution time (when it was negligible before), that’s a red flag. Maybe the new code calls it more times than before.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> cProfile, pstats
<span class="hljs-keyword">from</span> mymodule <span class="hljs-keyword">import</span> slow_function

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">profile</span>(<span class="hljs-params">fn</span>):</span>
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    stats = pstats.Stats(profiler).strip_dirs().sort_stats(<span class="hljs-string">'cumtime'</span>)
    stats.print_stats(<span class="hljs-number">10</span>)

<span class="hljs-comment"># run before refactor</span>
profile(<span class="hljs-keyword">lambda</span>: slow_function())

<span class="hljs-comment"># after you refactor slow_function(), re-run and compare stats</span>
</code></pre>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXd1xNcjypguN9JbN7JtBhAtBkfDrtCV6IwOORRUVT5rOAha_I2GQx3vgKRAjlxpeeUIGLTETRR6J3EnS2y95DY6ypiH95DQJT0vRfcyxv2KIz99hPXa0O8JjTzxpi5eSsk3spN6EQ?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="profiler-dashboard" width="600" height="400" loading="lazy"></p>
<h4 id="heading-when-possible-improve-performance-through-refactoring">When possible, improve performance through refactoring</h4>
<p>On the flip side, refactoring can help performance.</p>
<p>For example, by refactoring duplicated code into one place, you can use better caching in that one place. So, watch for performance improvement opportunities that arise naturally as you refactor.</p>
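<p>As a small illustration, imagine two functions that each used to contain their own copy of a slow tax lookup. After the duplication is pulled into one function, a single <code>functools.lru_cache</code> decorator covers both callers. The function names and tax rates below are invented for the example, and the counter exists only to show how often the lookup really runs.</p>
<pre><code class="lang-python">from functools import lru_cache

CALLS = {"count": 0}  # counts how often the expensive lookup really runs

@lru_cache(maxsize=1024)
def tax_rate_for(region):
    """Single shared lookup that replaced several copy-pasted versions."""
    CALLS["count"] += 1
    # Placeholder for a slow database or network lookup.
    return {"EU": 0.2, "US": 0.07}.get(region, 0.0)

def invoice_total(amount, region):
    return amount * (1 + tax_rate_for(region))

def quote_total(amount, region):
    return amount * (1 + tax_rate_for(region))

invoice_total(100, "EU")
quote_total(50, "EU")
print(CALLS["count"])  # 1: the second call was served from the cache
</code></pre>
<p>Caching like this only makes sense when the underlying value changes rarely, so treat it as something to measure, not a default.</p>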
<p>A good mindset is to treat performance as part of the “external behavior” that needs to be preserved. Refactoring should ideally not make things slower for users. To ensure that, incorporate performance checks into your plan, especially for critical sections. Measure, don’t guess. The end goal is a codebase that is both clean <strong>and</strong> fast enough.</p>
<h3 id="heading-8-automate-code-reviews-with-ai-tools">8. Automate Code Reviews with AI Tools</h3>
<p>Refactoring code is an ongoing process, not a one-time event – AI code review tools help enforce clean-code standards, catch smells early, and reduce the repetitive tasks that can bog down human reviewers. This frees your engineers to focus on deeper architectural or domain-specific issues.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfWs-ZM80TK_JcjwyPEnywdJl6Tf4G6gYFa1cN_J2ugTlniaGr4a397JuUj721m7kUw0EKMnzYHykpHJdG_aW7w3_B2J91bLL1UoaabdNsmH1uckMJHcFVpAhqZM2r855AsVYwDJg?key=nBTgfzmVkL2-N7DBMJ6e6gyk" alt="CodeRabbit-ai-code-reviewer-tool" width="600" height="400" loading="lazy"></p>
<p>One powerful option is <a target="_blank" href="https://www.coderabbit.ai/">CodeRabbit</a>, an AI-driven review platform designed to cut review time and bugs in half.</p>
<p>Here’s how it works and why it can boost your refactoring workflow:</p>
<h4 id="heading-ai-powered-contextual-feedback">AI-powered contextual feedback</h4>
<p>CodeRabbit analyzes pull requests line by line, applying both advanced language models and static analysis under the hood. It flags potential bugs, best-practice deviations, and style issues before a human opens the PR.</p>
<p>Some other features include:</p>
<ul>
<li><p><strong>Auto-generated summaries and 1-click fixes</strong> – Summarize large PRs and apply straightforward fixes instantly.</p>
</li>
<li><p><strong>Real-time collaboration and AI chat</strong> – Chat with the AI for clarifications, alternate code snippets, and instant feedback.</p>
</li>
<li><p><strong>Integrates with popular dev platforms</strong> – Supports GitHub, GitLab, and Azure DevOps for seamless PR scanning.</p>
</li>
</ul>
<p>CodeRabbit also offers free AI code reviews in VS Code. With this <a target="_blank" href="https://marketplace.visualstudio.com/items?itemName=CodeRabbit.coderabbit-vscode">VS Code extension</a>, you can get AI code reviews directly in your code editor, saving review time, catching more bugs, and helping you refactor.</p>
<h2 id="heading-summary">Summary</h2>
<p>Refactoring a complex enterprise codebase is like renovating a large building while people still live in it: you have to improve the structure without collapsing it.</p>
<p>Refactoring should be an ongoing process. You prevent the codebase from decaying by incorporating these practices into your regular development (perhaps allocating some time each sprint for refactoring or doing it opportunistically when touching your code). Each refactoring should stay small, yet the cumulative effect is significant.</p>
<p>As <a target="_blank" href="https://martinfowler.com/">Martin Fowler</a> puts it, a series of small changes can lead to a significant improvement in design.</p>
<p>That's it for this blog. I hope you learned something new today.</p>
<p>If you want to read more interesting articles about developer tools, React, Next.js, AI and more, then I'd encourage you to check out my <a target="_blank" href="https://www.devtoolsacademy.com/">blog</a>.</p>
<p>Here are some of the new and interesting articles I've written in the last 24 months:</p>
<ul>
<li><p><a target="_blank" href="https://www.devtoolsacademy.com/blog/cursor-vs-windsurf/">Cursor vs Windsurf</a></p>
</li>
<li><p><a target="_blank" href="https://clerk.com/blog/nextjs-role-based-access-control">How to Implement Role-Based Access Control in Next.js</a></p>
</li>
<li><p><a target="_blank" href="https://www.devtoolsacademy.com/blog/ai-code-reviewers-vs-human-code-reviewers/">AI Code Reviewers vs Human Code Reviewers</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/how-i-built-a-custom-video-conferencing-app-with-stream-and-nextjs/">How to Build a Custom Video Conferencing App with Stream and Next.js</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/how-to-perform-code-reviews-in-tech-the-painless-way/">How to Perform Code Reviews in Tech – The Painless Way</a></p>
</li>
</ul>
<p>You can get in touch if you have any questions or corrections. I’m expecting them.</p>
<p>And if you found this blog useful, please share it with your friends and colleagues who might benefit from it as well. Your support enables me to continue producing useful content for the tech community.</p>
<p>Now it’s time to take the next step by subscribing to my <a target="_blank" href="https://bytesizedbets.com/"><strong>newsletter</strong></a> and following me on <a target="_blank" href="https://twitter.com/theankurtyagi"><strong>Twitter</strong></a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Perform Code Reviews in Tech – The Painless Way ]]>
                </title>
                <description>
                    <![CDATA[ Okay, I know you may be skeptical: other guides have promised painless code reviews only to reveal that their solution requires some hyper-specific tech stack or a paid developer tool. I won’t do that to you. This guide provides a straightforward and... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-perform-code-reviews-in-tech-the-painless-way/</link>
                <guid isPermaLink="false">674f6a16678426d72d800cd5</guid>
                
                    <category>
                        <![CDATA[ code review ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ software development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Software Engineering ]]>
                    </category>
                
                    <category>
                        <![CDATA[ automation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ankur Tyagi ]]>
                </dc:creator>
                <pubDate>Tue, 03 Dec 2024 20:29:10 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1733242289474/def1a314-fe64-448b-9236-f66a529e3f13.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Okay, I know you may be skeptical: other guides have promised painless <code>code reviews</code> only to reveal that their solution requires some hyper-specific tech stack or a paid developer tool. I won’t do that to you.</p>
<p>This guide provides a straightforward and flexible template for <code>code reviews</code> that you can apply to your engineering team. The <strong>only</strong> requirement is that your app code is <code>open source</code>.</p>
<p>You can test a TypeScript workflow, Java workflow, Python workflow, PHP, Ruby or even some wacky web stack you invented. And it doesn’t matter if you’re developing on Windows, Linux, or Mac. Best of all, you don’t have to perform convoluted configuration or install software beyond a <code>yaml</code>.</p>
<p>I’ve been in engineering for the last 15 years, and <code>code reviews</code> have a bad reputation. We’ve all witnessed or lived through horror stories where sometimes it feels like every previous line gets torn to shreds.</p>
<p>So, what <em>can</em> you do differently? How can you make reviewing your code painless so that even the biggest nitpick on your team has nothing but praise?</p>
<p>After participating in code reviews for a decade, I’ve found that taking code reviews less personally is <strong>the single biggest thing you can do to improve your code.</strong> Why? Because all software is iterative. Even “perfect” code will eventually become outdated. Instead of thinking of it like a graded assignment, think of it as a part of the process.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents:</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-code-review">What is a Code Review?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-the-purpose-of-a-code-review">What is the Purpose of a Code Review?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-is-doing-code-reviews-hard">Why is Doing Code Reviews Hard?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-can-ai-replace-code-reviews">Can AI Replace Code Reviews?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-to-focus-on-during-a-code-review">What to Focus on During a Code Review</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-code-review-best-practices-and-process">Code Review Best Practices And Process</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-coderabbit">What is CodeRabbit?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-does-coderabbit-help">How Does CodeRabbit Help?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-github-repo-to-test">A GitHub Repo to Test</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-additional-examples">Additional Examples</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>This tutorial uses free, open-source tools. You’ll need to have a <a target="_blank" href="https://github.com/tyaga001">GitHub account</a> to help you make your code reviews more pleasant and valuable.</p>
<h2 id="heading-what-is-a-code-review"><strong>What is a Code Review?</strong></h2>
<p>The term “<a target="_blank" href="https://en.wikipedia.org/wiki/Code_review">code review</a>” can refer to various activities, from simply reading code over your teammate’s shoulder to a 10-person meeting where you dissect code line by line. I use the term to refer to a formal and written process, but not so heavyweight as a series of in-person code inspection meetings.</p>
<p>In a project where you share a repository with other developers, the flow usually looks like this: after you complete your work, you commit and push your changes (most likely with Git), then open a pull request on your code hosting platform. Your teammates review the pull request to determine whether it’s ready to merge. If so, they approve it, and that code gets merged into the project.</p>
<h2 id="heading-what-is-the-purpose-of-a-code-review"><strong>What is the Purpose of a Code Review?</strong></h2>
<p>Code reviews are a tool for <em>knowledge transfer</em>. They help make devs more efficient when doing maintenance on a part of the system they didn't write.</p>
<p>When you review a pull request, it’s an opportunity to iron out issues before they become technical debt.</p>
<p>Code reviews can also be a good setting for mentoring junior developers.</p>
<p>Now, let’s discuss what is <strong>not</strong> the purpose of a code review:</p>
<ul>
<li>Finding bugs. That's what tests (unit, integration, e2e, API, and so on) are for.</li>
<li>Nitpicking on style issues. Settle on one style and use formatters or AI tools to enforce it.</li>
</ul>
<p>Just keep in mind that there are many things that an AI tool cannot check. Code reviews are an excellent place to ensure the code is sufficiently documented or self-documenting.</p>
<p>Do you want to know how you can check this? Return to the code you wrote 6-12 months ago and try to understand what it was written to do.</p>
<p>If you understand it quickly, that means it's readable, and the code review was done properly and in a helpful manner.</p>
<h2 id="heading-why-is-doing-code-reviews-hard">Why is Doing <strong>Code Reviews</strong> Hard?</h2>
<p>Despite their importance, many devs don’t like doing code reviews – in part because they can be challenging, especially if you’re not following best practices.</p>
<p>Here are some pain points I’ve observed during my years of participating in code reviews:</p>
<ul>
<li><p>When people talk about code reviews, they focus on the reviewer. But the developer who writes the code is just as crucial to the review as the person who reads it.</p>
</li>
<li><p>Doing a code review is not an automatic routine for a developer.</p>
</li>
<li><p>The reviewer may sometimes do only a partial review and add new comments at every pass, even on code that has remained untouched since the previous review(s).</p>
</li>
<li><p>Sometimes, the code reviewer may not clearly express their expectations.</p>
</li>
<li><p>Multiple code reviewers can often have diverging opinions, leading to (too) long discussions.</p>
</li>
<li><p>The developer does not understand the comments from reviewers and requires back-and-forth discussions.</p>
</li>
<li><p>The developer addresses code review comments differently than agreed upon during the review process.</p>
</li>
</ul>
<p>These pain points often bottleneck our development velocity. But recent advances in AI-assisted code review tools have started addressing these common friction points in our PR workflows.</p>
<p>Let's explore how AI-powered tools, along with some best practices, can address these review challenges and optimize your development workflow.</p>
<h2 id="heading-can-ai-replace-code-reviews"><strong>Can AI Replace Code Reviews?</strong></h2>
<p>While AI hasn’t replaced human code reviews, it is a powerful force multiplier in the review process.</p>
<p>Here's how: AI code reviews excel as a preliminary screening tool, catching common issues before human reviewers see the code. This becomes especially valuable in open-source projects where maintainer bandwidth is limited.</p>
<p>I recently started using AI code reviews on a case-by-case basis for my projects.</p>
<p>AI tools improve my existing workflows, reduce failure rates by detecting logic errors early on, and boost productivity, so I’ve added them to my CI/CD pipelines. They don't have to be perfect at detecting logic errors, as long as their false positive rate is very low (ideally as close to 0 as possible).</p>
<p>Most importantly, AI reviews respect the golden rule of 'value your reviewer's time' by handling routine checks. This allows human reviewers to focus on architecture, business logic, and complex edge cases.</p>
<p>This approach positions AI as a complementary tool that augments rather than replaces human expertise in the code review process.</p>
<h2 id="heading-what-to-focus-on-during-a-code-review">What to Focus on During a Code Review</h2>
<p>When reviewing code, try to prioritise what matters most using the Code Review Pyramid. This is a framework that helps you focus your attention where it creates the most value.</p>
<p>Think of it like building a house — start with the foundation before worrying about paint colours.</p>
<p>The pyramid has five layers, from most critical (bottom) to least critical (top):</p>
<ol>
<li><p><strong>API Semantics</strong>: Core design decisions that affect users</p>
</li>
<li><p><strong>Implementation Semantics</strong>: The code's functionality, security, and performance</p>
</li>
<li><p><strong>Documentation</strong>: Clear explanation of how to use the code</p>
</li>
<li><p><strong>Tests</strong>: Verification that everything works as intended</p>
</li>
<li><p><strong>Code Style</strong>: Formatting and naming conventions</p>
</li>
</ol>
<p>Source: <a target="_blank" href="https://www.morling.dev/blog/the-code-review-pyramid/">The Code Review Pyramid by Gunnar Morling</a></p>
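<p>To make the ordering concrete, here is a minimal TypeScript sketch that sorts review comments so the most critical pyramid layers come first. The <code>ReviewComment</code> shape and the <code>prioritize</code> helper are hypothetical names made up for this illustration. They are not part of the pyramid itself or of any review tool.</p>

```typescript
// Layers of the Code Review Pyramid, ordered from most critical (index 0)
// to least critical. The layer names come from the list above.
const PYRAMID_LAYERS = [
  "API Semantics",
  "Implementation Semantics",
  "Documentation",
  "Tests",
  "Code Style",
] as const;

type PyramidLayer = (typeof PYRAMID_LAYERS)[number];

// Hypothetical shape for a review comment; not tied to any real tool.
interface ReviewComment {
  layer: PyramidLayer;
  note: string;
}

// Returns a new array with the most critical comments first.
// The input array is copied, so the caller's data is left untouched.
function prioritize(comments: ReviewComment[]): ReviewComment[] {
  const rank = (c: ReviewComment) => PYRAMID_LAYERS.indexOf(c.layer);
  return comments.slice().sort((a, b) => rank(a) - rank(b));
}

const ordered = prioritize([
  { layer: "Code Style", note: "Inconsistent indentation" },
  { layer: "API Semantics", note: "Breaking change to a public endpoint" },
  { layer: "Tests", note: "Missing test for the empty-input case" },
]);

console.log(ordered.map((c) => c.layer));
```

<p>In this example the API Semantics comment ends up first and the Code Style comment last, which mirrors the idea of settling the foundation before discussing paint colours.</p>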
<div class="embed-wrapper">
        <blockquote class="twitter-tweet">
          <a href="https://twitter.com/gunnarmorling/status/1501645187407388679"></a>
        </blockquote>
        <script defer="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
<p>Remember: if you want to catch issues/bugs, there are more appropriate processes for that. That is why we have automated testing, canary releases, testing environments, and so on.</p>
<p>In my personal opinion, using code reviews as a bug-catching tool is somewhat of an anti-pattern: you're compensating for a development process that may be lacking some key steps.</p>
<p>To me, a <code>code review</code> is much more about managing technical debt and maintaining quality while shipping more features.</p>
<p>In doing a code review, you should make sure that:</p>
<ul>
<li><p>The code is readable</p>
</li>
<li><p>It has appropriate unit tests</p>
</li>
<li><p>The developer used clear names for everything</p>
</li>
<li><p>The code is well-designed and isn’t more complex than it needs to be</p>
</li>
<li><p>Test cases make sense and have comprehensive coverage</p>
</li>
<li><p>It’s something the team can maintain in the long run</p>
</li>
<li><p>There are no architectural issues that will block the team</p>
</li>
<li><p>The code fits the team's idea of quality</p>
</li>
<li><p>You’re thinking about what you can learn from the PR</p>
</li>
<li><p>You’re sharing any knowledge the developer might use in their PR</p>
</li>
<li><p>You’re thinking about how you can empower the dev through your positive feedback</p>
</li>
<li><p>The PR has a clear changelist description</p>
</li>
</ul>
<h2 id="heading-code-review-best-practices-and-process"><strong>Code Review Best Practices And Process</strong></h2>
<p>There is no general rule in engineering for code reviews, as what you’ll need to focus on depends on many factors. You can and should set up the process according to your company standards and way of working as a team.</p>
<p>Here are some factors you’ll need to think about before setting up a code review process:</p>
<ul>
<li><p>The size and type of company you’re in (for example a startup vs a large corporation)</p>
</li>
<li><p>The number of developers on your team</p>
</li>
<li><p>Your budget</p>
</li>
<li><p>The timeframe you’re working with</p>
</li>
<li><p>Your and your team’s workloads</p>
</li>
<li><p>The complexity of the code</p>
</li>
<li><p>The abilities and skills of the reviewer(s)</p>
</li>
<li><p>The availability of the reviewer(s)</p>
</li>
</ul>
<p>As an example, at my work we have a very simple rule: <strong>all code changes must be reviewed by at least one developer</strong> before a merge or a commit to the trunk.</p>
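<p>A rule like this is simple enough to express as a small check. The sketch below is only an illustration: the <code>PullRequest</code> shape and the <code>canMerge</code> function are hypothetical and do not mirror the GitHub API. It also assumes that an author's approval of their own change does not count, which is an assumption on top of the rule as stated.</p>

```typescript
// Hypothetical, simplified model of a pull request; the field names are
// illustrative and do not mirror the GitHub API.
interface PullRequest {
  author: string;
  approvedBy: string[];
}

// Team rule: at least one developer must approve before the change is merged.
// Assumption: the author's own approval does not count, and a reviewer who
// approved more than once is only counted once.
function canMerge(pr: PullRequest, requiredApprovals = 1): boolean {
  const reviewers = pr.approvedBy.filter(
    (name, index, all) => name !== pr.author && all.indexOf(name) === index
  );
  return reviewers.length >= requiredApprovals;
}

console.log(canMerge({ author: "dana", approvedBy: [] })); // false
console.log(canMerge({ author: "dana", approvedBy: ["dana"] })); // false
console.log(canMerge({ author: "dana", approvedBy: ["lee"] })); // true
```

<p>In practice you would normally let your code hosting platform enforce a rule like this instead of writing it yourself. The sketch is only meant to show how little logic the rule needs.</p>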
<p>Code reviews need a systematic approach, but maintaining consistency across every PR is challenging. It’s useful to let computers handle repetitive checks (style, formatting) while humans focus on what matters most: architecture and logic. This balanced approach makes reviews both thorough and sustainable.</p>
<p><strong>Take a look at this example</strong>. It shows how we can optimize our <code>code review</code> process by intelligently delegating tasks between humans and automated tools. The diagram below illustrates a typical code style review workflow, comparing manual human review steps against automated tooling.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1731490662335/8b0e9e27-c31b-409f-9c9e-fd1a33195d9b.png" alt="Human vs Automated Code Style Review Process - showing why formatting checks should be automated" class="image--center mx-auto" width="1287" height="2074" loading="lazy"></p>
<p>The diagram shows a real problem we all face in code reviews. See the left side? That's us humans doing manual formatting checks: finding weird spaces, fixing indents, writing comments about it... pretty tedious stuff. But check out the right side: that's where tools like <code>Prettier</code> just fix these formatting issues automatically.</p>
<p>No meetings, no back-and-forth – just done. That's why I started using <code>CodeRabbit</code>, which is a dev tool that caught my attention recently.</p>
<h2 id="heading-what-is-coderabbit"><strong>What is CodeRabbit?</strong></h2>
<p>The CodeRabbit docs describe the tool pretty effectively, so I’ll just leave this here:</p>
<blockquote>
<p><a target="_blank" href="https://www.coderabbit.ai/"><strong>CodeRabbit</strong></a> is an AI-powered code reviewer that delivers context-aware feedback on pull requests within minutes, reducing the time and effort needed for manual code reviews. It provides a fresh perspective and catches issues that are often missed, enhancing the overall review quality. – from the CodeRabbit docs</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1731326629130/933c46f2-a24c-4e08-a470-8449e96387aa.png" alt="what is CodeRabbit - home page" class="image--center mx-auto" width="3084" height="1850" loading="lazy"></p>
<h3 id="heading-how-does-coderabbit-help">How Does CodeRabbit Help?</h3>
<p>Let me walk you through a real example. When you submit a PR, CodeRabbit:</p>
<ol>
<li>Generates a PR summary on the fly:</li>
</ol>
<ul>
<li><p>First, it gives you a quick overview of what changed.</p>
</li>
<li><p>It also explains the impact in plain English (great for non-tech folks in your team).</p>
</li>
<li><p>Then it includes a detailed walkthrough of file changes.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732879970322/c671b932-25b1-474c-8cae-c393cb1706b8.png" alt="Pull Request Summary" class="image--center mx-auto" width="2462" height="1356" loading="lazy"></p>
<ol start="2">
<li>Does a “Smart Code Review”:</li>
</ol>
<ul>
<li><p>It drops comments right on the specific lines that need attention.</p>
</li>
<li><p>It also suggests fixes in diff format that you can apply with one click.</p>
</li>
<li><p>And it shows what commits and files it checked (which is helpful for tracking review coverage).</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732880687958/8d0e1ce5-cb23-4c62-b9ba-823f3a59845e.png" alt="Smart Code Reviews" class="image--center mx-auto" width="1952" height="1614" loading="lazy"></p>
<ol start="3">
<li>Gives you interactive feedback:</li>
</ol>
<ul>
<li><p>You can chat with it right in the PR comments.</p>
</li>
<li><p>You can ask it questions about specific code changes to get more details.</p>
</li>
<li><p>And it remembers your team's patterns and preferences which is super helpful for consistency’s sake (which I discussed above).</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732880617364/9e246445-1d43-45f1-b4af-62d9f013d76a.png" alt="chat with coderabbit" class="image--center mx-auto" width="1930" height="1672" loading="lazy"></p>
<ol start="4">
<li>Offers extra helpful features:</li>
</ol>
<ul>
<li><p>CodeRabbit validates changes against linked GitHub/GitLab issues.</p>
</li>
<li><p>It creates sequence diagrams to visualize changes.</p>
</li>
<li><p>And it can apply one-click fixes for simple issues.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732880539024/412d6c15-d691-4b65-b335-2e04b04a55e1.png" alt="sequence diagrams by coderabbit" class="image--center mx-auto" width="1966" height="1458" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1731322941721/9e7c5e9a-ac02-458b-9de3-4cf92232786d.png" alt="Code reviews done by CodeRabbit" class="image--center mx-auto" width="2834" height="1842" loading="lazy"></p>
<p>I first discovered <code>CodeRabbit</code> last month when I came across it by accident while searching for something else on GitHub, and I was surprised by how many people are already using it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1731323088015/12db3391-bad0-45a7-908d-2c34391a7803.png" alt="how many projects are already using coderabbit" width="3142" height="1842" loading="lazy"></p>
<p>I signed up right away because I was looking for exactly this kind of solution to help me and my team with our reviews.</p>
<p>I read through <a target="_blank" href="https://docs.coderabbit.ai/">the CodeRabbit docs</a> and was very impressed.</p>
<p>Getting started is pretty much a plug-and-play process.</p>
<p>Here are the quick steps to enable CodeRabbit. In the next section, we’ll walk through them using an example repo.</p>
<ul>
<li><p>Sign up at <a target="_blank" href="http://coderabbit.ai">coderabbit.ai</a> using your GitHub account.</p>
</li>
<li><p>Go to Add Your Repository.</p>
</li>
<li><p>And that's it. CodeRabbit starts reviewing your PRs automatically.</p>
</li>
</ul>
<h3 id="heading-a-github-repo-to-test"><strong>A GitHub Repo to Test</strong></h3>
<p>As an example <strong>GitHub repo</strong> to test, we’ll use <a target="_blank" href="https://www.devtoolsacademy.com/">devtoolsacademy</a>, my blog about awesome developer tools.</p>
<p>First, visit the <a target="_blank" href="https://app.coderabbit.ai/login">CodeRabbit login page</a> and log in via GitHub.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732880133507/959c0521-eddf-4026-bf33-64b415f4d9b3.png" alt="login - coderabbit" class="image--center mx-auto" width="1462" height="1172" loading="lazy"></p>
<p>Next, add CodeRabbit to some of your public GitHub repositories.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1731327118318/7329afd5-af9c-4e54-9aba-6720cd00b8ca.png" alt="how-to-add-a-public-repo-to-use-coderabbit" class="image--center mx-auto" width="3126" height="1804" loading="lazy"></p>
<p>Now, CodeRabbit is fully integrated and ready to do code reviews on your selected repo.</p>
<p>Yes: it’s that simple and fast. And in my opinion, it’s one of the main reasons the tool is so useful.</p>
<p>Here are some sample PRs for you to check out:</p>
<ul>
<li><p><a target="_blank" href="https://github.com/tyaga001/devtoolsacademy/pull/10">https://github.com/tyaga001/devtoolsacademy/pull/10</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/tyaga001/devtoolsacademy/pull/13">https://github.com/tyaga001/devtoolsacademy/pull/13</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/sartography/spiff-arena/pull/1233#discussion_r1529013218">sartography/spiff-arena#1233 (comment)</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/kmesiab/equilibria/pull/1#discussion_r1505474270">kmesiab/equilibria#1 (comment)</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/kamiazya/web-csv-toolbox/pull/60#discussion_r1453463448">kamiazya/web-csv-toolbox#60 (comment)</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/openreplay/openreplay/pull/1858#discussion_r1467629285">openreplay/openreplay#1858 (comment)</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/ls1intum/Artemis/pull/8037#discussion_r1494109998">ls1intum/Artemis#8037 (comment)</a></p>
</li>
</ul>
<h3 id="heading-additional-examples"><strong>Additional Examples</strong></h3>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Check out all the open source examples of code reviews done by <a target="_self" href="https://github.com/search?q=coderabbitai&amp;type=pullrequests">CodeRabbit</a>.</div>
</div>

<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Everyone’s code needs reviewing. Just because someone is the most senior person on the team does not mean that their code doesn’t need to be reviewed.</p>
<p>In this article, I talked about code reviews along with some common pain points. I then showed you how you can leverage CodeRabbit to iterate quickly through your code reviews and focus more on business.</p>
<h3 id="heading-further-reading"><strong>Further reading</strong></h3>
<p>This article covered a basic introduction to CodeRabbit, because that was my use case with my <a target="_blank" href="https://www.devtoolsacademy.com/">blog</a>.</p>
<p>For more advanced functionality, check out the official CodeRabbit <a target="_blank" href="https://docs.coderabbit.ai/">docs</a> or read their <a target="_blank" href="https://www.coderabbit.ai/blog">blog</a>.</p>
<h3 id="heading-before-i-end"><strong>Before I End</strong></h3>
<p>I hope you found it helpful learning how to use AI tools for code reviews.</p>
<p>If you like my writing, these are some of my other most recent articles.</p>
<ul>
<li><p><a target="_blank" href="https://www.devtoolsacademy.com/blog/neon-vs-supabase"><strong>Neon Postgres vs Supabase</strong></a></p>
</li>
<li><p><a target="_blank" href="https://www.devtoolsacademy.com/blog/mongoDB-vs-postgreSQL"><strong>MongoDB vs. PostgreSQL</strong></a></p>
</li>
<li><p><a target="_blank" href="https://www.devtoolsacademy.com/blog/supabase-vs-clerk">Supabase vs Clerk</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/how-i-built-a-custom-video-conferencing-app-with-stream-and-nextjs/#heading-next-steps">How I Built a Video Conferencing App with Stream and Next.js</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/fine-grained-authorization-in-java-and-springboot/">How to Implement Fine-Grained Authorization in Java</a></p>
</li>
</ul>
<p>Follow me on <a target="_blank" href="https://x.com/theankurtyagi">Twitter</a> to stay updated on my open source projects.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How I Built a Custom Video Conferencing App with Stream and Next.js ]]>
                </title>
                <description>
                    <![CDATA[ Building full-stack apps can be tough. You have to think about frontend, APIs, databases, auth – plus you have to know how all of these things work together. And building a project like a video conferencing app from scratch can feel even more overwhe... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-i-built-a-custom-video-conferencing-app-with-stream-and-nextjs/</link>
                <guid isPermaLink="false">66fd86ff9cea0a9dc9177283</guid>
                
                    <category>
                        <![CDATA[ Next.js ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Startups ]]>
                    </category>
                
                    <category>
                        <![CDATA[ software development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ webdev ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Developer Tools ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ankur Tyagi ]]>
                </dc:creator>
                <pubDate>Wed, 02 Oct 2024 17:46:39 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727433361539/498f0742-2ff1-4762-b268-2c25eb22017e.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Building full-stack apps can be tough. You have to think about frontend, APIs, databases, auth – plus you have to know how all of these things work together.</p>
<p>And building a project like a video conferencing app from scratch can feel even more overwhelming, especially with the complexities of managing video streams, user auth, and real-time interactions.</p>
<p>But what if I told you there’s an easier way to do this – one that lets you build your video conferencing app in a fraction of the time?</p>
<p>In this article, I’ll show you how I built a video conferencing app using <a target="_blank" href="https://getstream.io/">Stream</a> and Clerk in Next.js.</p>
<p>Here is the <a target="_blank" href="https://github.com/tyaga001/facetime-on-stream">source code</a> (remember to give it a star ⭐).</p>
<p>Before we start, let me tell you why I wrote this tutorial.</p>
<p>I’m a Software Engineer who cares about writing and I <strong>love</strong> to <strong>code</strong>, <strong>design</strong>, <strong>develop</strong>, and then <strong>teach</strong> people.</p>
<p>I've been using open-source projects, products, and services for a while now, and contributing to many of them to improve them however I can. Last month I built an open-source blog about “awesome developer tools” called <a target="_blank" href="https://www.devtoolsacademy.com/">devtoolsacademy</a>.</p>
<p><a target="_blank" href="https://www.devtoolsacademy.com/"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727430858395/70ffbec4-69ab-4f31-a9cb-02b44066ac6b.png" alt="devtoolsacademy" class="image--center mx-auto" width="3072" height="1830" loading="lazy"></a></p>
<p>This article is about sharing the experience I’ve had using yet another awesome developer tool.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents:</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-stream">What is Stream?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-the-app-interface-with-nextjs">How to Build the App Interface with Next.js</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-create-link-modal">The Create Link Modal</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-instant-meeting-modal">The Instant Meeting Modal</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-join-meeting-modal">The Join Meeting Modal</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-authenticate-users-with-clerk">How to Authenticate Users with Clerk</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-stream-in-a-nextjs-app">How to Set Up Stream in a Next.js app</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-and-join-calls-with-stream">How to Create and Join Calls with Stream</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-creating-and-scheduling-calls">Creating and Scheduling calls</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-joining-calls-and-the-meeting-page">Joining calls and the Meeting Page</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-retrieving-upcoming-calls">Retrieving Upcoming Calls</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-next-steps">Next Steps</a></p>
</li>
</ul>
<h2 id="heading-what-is-stream">What is Stream?</h2>
<p><a target="_blank" href="https://getstream.io/">Stream</a> is an open-source cloud-based platform that provides APIs and SDKs for building scalable and feature-rich real-time applications. It offers pre-built UI components for creating enterprise-grade software apps with features like chat, video, audio, and activity feeds.</p>
<p><a target="_blank" href="https://getstream.io"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726475007023/be45aa40-7794-434a-8f5d-f4b637d97fd8.png" alt="What is Stream" class="image--center mx-auto" width="3114" height="1778" loading="lazy"></a></p>
<p>Here's how I'll use <code>Stream</code> while building the app:</p>
<ul>
<li><p>Set up real-time video and audio calls</p>
</li>
<li><p>Use Stream's UI components to quickly build the interface</p>
</li>
<li><p>Implement key features like <code>video</code> and <code>audio</code> calls</p>
</li>
<li><p><code>Call Types</code> – I'll implement instant meetings and pre-scheduled calls using Stream</p>
</li>
<li><p>Leverage Stream's call and participant objects to manage <code>call state</code></p>
</li>
</ul>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>To fully understand the tutorial, you need to have a basic understanding of <a target="_blank" href="https://www.freecodecamp.org/news/learn-react-key-concepts/">React</a> and <a target="_blank" href="https://theankurtyagi.com/next-js/">Next.js</a>. You’ll also need the following:</p>
<ul>
<li><p><a target="_blank" href="https://getstream.io/chat/docs/sdk/react/">Stream React SDK</a> - provides pre-built UI components for adding video call features quickly.</p>
</li>
<li><p><a target="_blank" href="https://github.com/GetStream/stream-node">Stream Node.js SDK</a> - for managing server-side interactions and keeping Stream's state in sync.</p>
</li>
<li><p><a target="_blank" href="https://clerk.com/">Clerk</a> - a comprehensive user management platform to handle authentication effortlessly.</p>
</li>
<li><p><a target="_blank" href="https://headlessui.com/">Headless UI</a> - provides accessible UI components for building user-friendly applications.</p>
</li>
<li><p><a target="_blank" href="https://www.npmjs.com/package/react-copy-to-clipboard">React Copy-to-Clipboard</a> - allows users to easily copy meeting links within the app.</p>
</li>
<li><p><a target="_blank" href="https://react-icons.github.io/react-icons/">React Icons</a> - offers a library of easily integrated icons.</p>
</li>
</ul>
<h2 id="heading-how-to-build-the-app-interface-with-nextjs">How to Build the App Interface with Next.js</h2>
<p>In this section, I'll guide you through creating the user interface for the video-conferencing app. The interface will allow users to easily create, join, and schedule meetings, as well as view their upcoming meetings.</p>
<p>First, let’s create a Next.js TypeScript project by running the command below:</p>
<pre><code class="lang-bash">npx create-next-app facetime-app
</code></pre>
<p>Then install the following packages:</p>
<ul>
<li><p><a target="_blank" href="https://react-icons.github.io/react-icons/">React icons</a> - a popular React icons package</p>
</li>
<li><p><a target="_blank" href="https://headlessui.com/">Headless UI</a> - provides a set of accessible UI components</p>
</li>
<li><p><a target="_blank" href="https://www.npmjs.com/package/react-copy-to-clipboard">React-copy-to-clipboard</a> - a lightweight package that enables us to copy meeting links.</p>
</li>
</ul>
<pre><code class="lang-bash">npm install react-icons @headlessui/react react-copy-to-clipboard
</code></pre>
<p>Copy the code snippet below into the <code>app/page.tsx</code> file:</p>
<pre><code class="lang-typescript"><span class="hljs-string">"use client"</span>;
<span class="hljs-keyword">import</span> { useState } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> { FaLink, FaVideo } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-icons/fa"</span>;
<span class="hljs-keyword">import</span> InstantMeeting <span class="hljs-keyword">from</span> <span class="hljs-string">"@/app/modals/InstantMeeting"</span>;
<span class="hljs-keyword">import</span> UpcomingMeeting <span class="hljs-keyword">from</span> <span class="hljs-string">"@/app/modals/UpcomingMeeting"</span>;
<span class="hljs-keyword">import</span> CreateLink <span class="hljs-keyword">from</span> <span class="hljs-string">"@/app/modals/CreateLink"</span>;
<span class="hljs-keyword">import</span> JoinMeeting <span class="hljs-keyword">from</span> <span class="hljs-string">"@/app/modals/JoinMeeting"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Dashboard</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">const</span> [startInstantMeeting, setStartInstantMeeting] =
        useState&lt;<span class="hljs-built_in">boolean</span>&gt;(<span class="hljs-literal">false</span>);
    <span class="hljs-keyword">const</span> [joinMeeting, setJoinMeeting] = useState&lt;<span class="hljs-built_in">boolean</span>&gt;(<span class="hljs-literal">false</span>);
    <span class="hljs-keyword">const</span> [showUpcomingMeetings, setShowUpcomingMeetings] =
        useState&lt;<span class="hljs-built_in">boolean</span>&gt;(<span class="hljs-literal">false</span>);
    <span class="hljs-keyword">const</span> [showCreateLink, setShowCreateLink] = useState&lt;<span class="hljs-built_in">boolean</span>&gt;(<span class="hljs-literal">false</span>);

    <span class="hljs-keyword">return</span> (
        &lt;&gt;
            &lt;button
                className=<span class="hljs-string">' top-5 right-5 text-sm fixed bg-green-500 px-2 w-[150px] hover:bg-green-600 py-3 flex flex-col items-center text-white rounded-md shadow-sm cursor-pointer z-10'</span>
                onClick={<span class="hljs-function">() =&gt;</span> setJoinMeeting(<span class="hljs-literal">true</span>)}
            &gt;
                &lt;FaVideo className=<span class="hljs-string">'mb-[3px] text-white'</span> /&gt;
                Join FaceTime
            &lt;/button&gt;

            &lt;main className=<span class="hljs-string">'w-full h-screen flex flex-col items-center justify-center'</span>&gt;
                &lt;h1 className=<span class="hljs-string">'font-bold text-2xl text-center'</span>&gt;FaceTime&lt;/h1&gt;
                &lt;div className=<span class="hljs-string">'flex flex-col'</span>&gt;
                    &lt;button
                        className=<span class="hljs-string">'text-green-500 underline text-sm text-center cursor-pointer'</span>
                        onClick={<span class="hljs-function">() =&gt;</span> setShowUpcomingMeetings(<span class="hljs-literal">true</span>)}
                    &gt;
                        Upcoming FaceTime
                    &lt;/button&gt;
                &lt;/div&gt;

                &lt;div className=<span class="hljs-string">'flex items-center justify-center space-x-4 mt-6'</span>&gt;
                    &lt;button
                        className=<span class="hljs-string">'bg-gray-500 px-4 w-[200px] py-3 flex flex-col items-center hover:bg-gray-600 text-white rounded-md shadow-sm'</span>
                        onClick={<span class="hljs-function">() =&gt;</span> setShowCreateLink(<span class="hljs-literal">true</span>)}
                    &gt;
                        &lt;FaLink className=<span class="hljs-string">'mb-[3px] text-gray-300'</span> /&gt;
                        Create link
                    &lt;/button&gt;
                    &lt;button
                        className=<span class="hljs-string">'bg-green-500 px-4 w-[200px] hover:bg-green-600 py-3 flex flex-col items-center text-white rounded-md shadow-sm'</span>
                        onClick={<span class="hljs-function">() =&gt;</span> setStartInstantMeeting(<span class="hljs-literal">true</span>)}
                    &gt;
                        &lt;FaVideo className=<span class="hljs-string">'mb-[3px] text-white'</span> /&gt;
                        New FaceTime
                    &lt;/button&gt;
                &lt;/div&gt;
            &lt;/main&gt;

            {startInstantMeeting &amp;&amp; (
                &lt;InstantMeeting
                    enable={startInstantMeeting}
                    setEnable={setStartInstantMeeting}
                /&gt;
            )}
            {showUpcomingMeetings &amp;&amp; (
                &lt;UpcomingMeeting
                    enable={showUpcomingMeetings}
                    setEnable={setShowUpcomingMeetings}
                /&gt;
            )}
            {showCreateLink &amp;&amp; (
                &lt;CreateLink enable={showCreateLink} setEnable={setShowCreateLink} /&gt;
            )}
            {joinMeeting &amp;&amp; (
                &lt;JoinMeeting enable={joinMeeting} setEnable={setJoinMeeting} /&gt;
            )}
        &lt;/&gt;
    );
}
</code></pre>
<p>The code snippet above renders multiple buttons that allow users to perform actions like joining, creating, and scheduling a call. Each button opens a modal that prompts the user to provide additional details specific to the action they are performing.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726481911712/286f7349-0d95-419d-97e5-193371307e13.png" alt="facetime-app-home-page" class="image--center mx-auto" width="3110" height="1818" loading="lazy"></p>
<p>Next, let’s create a <code>modals</code> folder within the Next.js app directory and add the following components to the <code>modals</code> folder:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> app
mkdir modals &amp;&amp; <span class="hljs-built_in">cd</span> modals
touch CreateLink.tsx InstantMeeting.tsx JoinMeeting.tsx UpcomingMeeting.tsx
</code></pre>
<p>The <code>CreateLink</code> modal allows users to provide a description and schedule a time for the call. The <code>InstantMeeting</code> modal lets users start an instant meeting by providing a call description. The <code>JoinMeeting</code> modal enables users to enter a call link and join a meeting. And the <code>UpcomingMeeting</code> modal displays all scheduled upcoming calls.</p>
<h3 id="heading-the-create-link-modal">The Create Link Modal</h3>
<p>Copy the code snippet below into the <code>CreateLink</code> modal:</p>
<pre><code class="lang-typescript"><span class="hljs-string">"use client"</span>;
<span class="hljs-keyword">import</span> {
    Dialog,
    DialogTitle,
    DialogPanel,
    Transition,
    Description,
    TransitionChild,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"@headlessui/react"</span>;
<span class="hljs-keyword">import</span> { Fragment, SetStateAction, useState, Dispatch } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> CopyToClipboard <span class="hljs-keyword">from</span> <span class="hljs-string">"react-copy-to-clipboard"</span>;
<span class="hljs-keyword">import</span> { FaCopy } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-icons/fa"</span>;

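<span class="hljs-comment">// Props shared by every modal: the open flag and its state setter</span>
<span class="hljs-keyword">interface</span> Props {
    enable: <span class="hljs-built_in">boolean</span>;
    setEnable: Dispatch&lt;SetStateAction&lt;<span class="hljs-built_in">boolean</span>&gt;&gt;;
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">CreateLink</span>(<span class="hljs-params">{ enable, setEnable }: Props</span>) </span>{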
    <span class="hljs-keyword">const</span> [showMeetingLink, setShowMeetingLink] = useState(<span class="hljs-literal">false</span>);
    <span class="hljs-keyword">const</span> [facetimeLink, setFacetimeLink] = useState&lt;<span class="hljs-built_in">string</span>&gt;(<span class="hljs-string">""</span>);
    <span class="hljs-keyword">const</span> closeModal = <span class="hljs-function">() =&gt;</span> setEnable(<span class="hljs-literal">false</span>);

    <span class="hljs-keyword">return</span> (
        &lt;&gt;
            &lt;Transition appear show={enable} <span class="hljs-keyword">as</span>={Fragment}&gt;
                &lt;Dialog <span class="hljs-keyword">as</span>=<span class="hljs-string">'div'</span> className=<span class="hljs-string">'relative z-10'</span> onClose={closeModal}&gt;
                    &lt;TransitionChild
                        <span class="hljs-keyword">as</span>={Fragment}
                        enter=<span class="hljs-string">'ease-out duration-300'</span>
                        enterFrom=<span class="hljs-string">'opacity-0'</span>
                        enterTo=<span class="hljs-string">'opacity-100'</span>
                        leave=<span class="hljs-string">'ease-in duration-200'</span>
                        leaveFrom=<span class="hljs-string">'opacity-100'</span>
                        leaveTo=<span class="hljs-string">'opacity-0'</span>
                    &gt;
                        &lt;div className=<span class="hljs-string">'fixed inset-0 bg-black/75'</span> /&gt;
                    &lt;/TransitionChild&gt;

                    &lt;div className=<span class="hljs-string">'fixed inset-0 overflow-y-auto'</span>&gt;
                        &lt;div className=<span class="hljs-string">'flex min-h-full items-center justify-center p-4 text-center'</span>&gt;
                            &lt;TransitionChild
                                <span class="hljs-keyword">as</span>={Fragment}
                                enter=<span class="hljs-string">'ease-out duration-300'</span>
                                enterFrom=<span class="hljs-string">'opacity-0 scale-95'</span>
                                enterTo=<span class="hljs-string">'opacity-100 scale-100'</span>
                                leave=<span class="hljs-string">'ease-in duration-200'</span>
                                leaveFrom=<span class="hljs-string">'opacity-100 scale-100'</span>
                                leaveTo=<span class="hljs-string">'opacity-0 scale-95'</span>
                            &gt;
                                &lt;DialogPanel className=<span class="hljs-string">'w-full max-w-2xl transform overflow-hidden rounded-2xl bg-white p-6 align-middle shadow-xl transition-all text-center'</span>&gt;
                                    {showMeetingLink ? (
                                        &lt;MeetingLink facetimeLink={facetimeLink} /&gt;
                                    ) : (
                                        &lt;MeetingForm
                                            setShowMeetingLink={setShowMeetingLink}
                                            setFacetimeLink={setFacetimeLink}
                                        /&gt;
                                    )}
                                &lt;/DialogPanel&gt;
                            &lt;/TransitionChild&gt;
                        &lt;/div&gt;
                    &lt;/div&gt;
                &lt;/Dialog&gt;
            &lt;/Transition&gt;
        &lt;/&gt;
    );
}
</code></pre>
<p>The code snippet above renders a form that allows users to input a description and select a time to schedule a call. Once the call is created, the generated link is displayed and can be copied.</p>
<p>Finally, add the <code>MeetingForm</code> and <code>MeetingLink</code> components below the <code>CreateLink</code> component:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> MeetingForm = <span class="hljs-function">(<span class="hljs-params">{
    setShowMeetingLink,
    setFacetimeLink,
}: {
    setShowMeetingLink: Dispatch&lt;SetStateAction&lt;<span class="hljs-built_in">boolean</span>&gt;&gt;;
    setFacetimeLink: Dispatch&lt;SetStateAction&lt;<span class="hljs-built_in">string</span>&gt;&gt;;
}</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> [description, setDescription] = useState&lt;<span class="hljs-built_in">string</span>&gt;(<span class="hljs-string">""</span>);
    <span class="hljs-keyword">const</span> [dateTime, setDateTime] = useState&lt;<span class="hljs-built_in">string</span>&gt;(<span class="hljs-string">""</span>);

    <span class="hljs-keyword">const</span> handleStartMeeting = <span class="hljs-keyword">async</span> (e: React.FormEvent&lt;HTMLFormElement&gt;) =&gt; {
        e.preventDefault();
        <span class="hljs-built_in">console</span>.log({ description, dateTime });
    };

    <span class="hljs-keyword">return</span> (
        &lt;&gt;
            &lt;DialogTitle
                <span class="hljs-keyword">as</span>=<span class="hljs-string">'h3'</span>
                className=<span class="hljs-string">'text-lg font-bold leading-6 text-green-600'</span>
            &gt;
                Schedule a FaceTime
            &lt;/DialogTitle&gt;

            &lt;Description className=<span class="hljs-string">'text-xs opacity-40 mb-4'</span>&gt;
                Schedule a FaceTime meeting <span class="hljs-keyword">with</span> your cliq
            &lt;/Description&gt;

            &lt;form className=<span class="hljs-string">'w-full'</span> onSubmit={handleStartMeeting}&gt;
                &lt;label
                    className=<span class="hljs-string">'block text-left text-sm font-medium text-gray-700'</span>
                    htmlFor=<span class="hljs-string">'description'</span>
                &gt;
                    Meeting Description
                &lt;/label&gt;
                &lt;input
                    <span class="hljs-keyword">type</span>=<span class="hljs-string">'text'</span>
                    name=<span class="hljs-string">'description'</span>
                    id=<span class="hljs-string">'description'</span>
                    value={description}
                    onChange={<span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> setDescription(e.target.value)}
                    className=<span class="hljs-string">'mt-1 block w-full text-sm py-3 px-4 border-gray-200 border-[1px] rounded mb-3'</span>
                    required
                    placeholder=<span class="hljs-string">'Enter a description for the meeting'</span>
                /&gt;

                &lt;label
                    className=<span class="hljs-string">'block text-left text-sm font-medium text-gray-700'</span>
                    htmlFor=<span class="hljs-string">'date'</span>
                &gt;
                    <span class="hljs-built_in">Date</span> and Time
                &lt;/label&gt;

                &lt;input
                    <span class="hljs-keyword">type</span>=<span class="hljs-string">'datetime-local'</span>
                    id=<span class="hljs-string">'date'</span>
                    name=<span class="hljs-string">'date'</span>
                    required
                    className=<span class="hljs-string">'mt-1 block w-full text-sm py-3 px-4 border-gray-200 border-[1px] rounded mb-3'</span>
                    value={dateTime}
                    onChange={<span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> setDateTime(e.target.value)}
                /&gt;

                &lt;button className=<span class="hljs-string">'w-full bg-green-600 text-white py-3 rounded mt-4'</span>&gt;
                    Create FaceTime
                &lt;/button&gt;
            &lt;/form&gt;
        &lt;/&gt;
    );
};
</code></pre>
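<p>The <code>datetime-local</code> input in the form above stores its value as a local date-time string without a time zone, such as <code>2026-05-28T14:30</code>. If you need an unambiguous timestamp when you create the call, one option is to convert the value first. The snippet below is only a minimal sketch: <code>toIsoStartTime</code> is a hypothetical helper name and is not part of the tutorial's code.</p>
<pre><code class="lang-typescript">// Hypothetical helper, not part of the original tutorial.
// Converts the value of a datetime-local input (for example
// "2026-05-28T14:30") into an ISO 8601 UTC string. Returns null when the
// value is empty or cannot be parsed as a date.
function toIsoStartTime(dateTime: string): string | null {
    if (dateTime.trim() === "") return null;
    // A date-time string without an offset is interpreted as local time.
    const parsed = new Date(dateTime);
    if (Number.isNaN(parsed.getTime())) return null;
    return parsed.toISOString();
}
</code></pre>
<p>For example, you could call <code>toIsoStartTime(dateTime)</code> inside <code>handleStartMeeting</code> and return early when it yields <code>null</code>.</p>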
<p>The <code>MeetingForm</code> component accepts the call description and scheduled time, while the <code>MeetingLink</code> component displays the generated call link and allows users to copy it.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> MeetingLink = <span class="hljs-function">(<span class="hljs-params">{ facetimeLink }: { facetimeLink: <span class="hljs-built_in">string</span> }</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> [copied, setCopied] = useState&lt;<span class="hljs-built_in">boolean</span>&gt;(<span class="hljs-literal">false</span>);
    <span class="hljs-keyword">const</span> handleCopy = <span class="hljs-function">() =&gt;</span> setCopied(<span class="hljs-literal">true</span>);

    <span class="hljs-keyword">return</span> (
        &lt;&gt;
            &lt;DialogTitle
                <span class="hljs-keyword">as</span>=<span class="hljs-string">'h3'</span>
                className=<span class="hljs-string">'text-lg font-bold leading-6 text-green-600'</span>
            &gt;
                Copy FaceTime Link
            &lt;/DialogTitle&gt;

            &lt;Description className=<span class="hljs-string">'text-xs opacity-40 mb-4'</span>&gt;
                You can share the facetime link <span class="hljs-keyword">with</span> your participants
            &lt;/Description&gt;

            &lt;div className=<span class="hljs-string">'bg-gray-100 p-4 rounded flex items-center justify-between'</span>&gt;
                &lt;p className=<span class="hljs-string">'text-xs text-gray-500'</span>&gt;
                    {<span class="hljs-string">`<span class="hljs-subst">${process.env.NEXT_PUBLIC_FACETIME_HOST}</span>/<span class="hljs-subst">${facetimeLink}</span>`</span>}
                &lt;/p&gt;

                &lt;CopyToClipboard
                    onCopy={handleCopy}
                    text={<span class="hljs-string">`<span class="hljs-subst">${process.env.NEXT_PUBLIC_FACETIME_HOST}</span>/<span class="hljs-subst">${facetimeLink}</span>`</span>}
                &gt;
                    &lt;FaCopy className=<span class="hljs-string">'text-green-600 text-lg cursor-pointer'</span> /&gt;
                &lt;/CopyToClipboard&gt;
            &lt;/div&gt;

            {copied &amp;&amp; (
                &lt;p className=<span class="hljs-string">'text-red-600 text-xs mt-2'</span>&gt;Link copied to clipboard&lt;/p&gt;
            )}
        &lt;/&gt;
    );
};
</code></pre>
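<p>The <code>MeetingLink</code> component builds the shareable link in two places from <code>process.env.NEXT_PUBLIC_FACETIME_HOST</code>. If that variable is not set, the template literal renders the text <code>undefined</code> in front of the call ID. One way to keep the link in a single place and guard against that case is a small helper like the sketch below. The name <code>buildMeetingUrl</code> is hypothetical, and falling back to a relative path is an assumption about what you want, not something the tutorial prescribes.</p>
<pre><code class="lang-typescript">// Hypothetical helper, not part of the original tutorial.
// Joins the public host and a call ID into a shareable link. It strips any
// trailing slashes from the host and falls back to a relative path when the
// host is undefined (which is what an unset environment variable yields).
function buildMeetingUrl(host: string | undefined, callId: string): string {
    const base = (host ?? "").replace(/\/+$/, "");
    return `${base}/${encodeURIComponent(callId)}`;
}
</code></pre>
<p>Inside <code>MeetingLink</code>, you could then compute the link once with <code>buildMeetingUrl(process.env.NEXT_PUBLIC_FACETIME_HOST, facetimeLink)</code> and pass the result to both the paragraph and the <code>CopyToClipboard</code> component.</p>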
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726482044698/0cb22caa-3e5a-4f01-9fa2-25c7ce77b08a.png" alt="facetime-app-schedule-popup" class="image--center mx-auto" width="3098" height="1828" loading="lazy"></p>
<h3 id="heading-the-instant-meeting-modal">The Instant Meeting Modal</h3>
<p>Copy the following code snippet into the <code>InstantMeeting</code> modal:</p>
<pre><code class="lang-typescript"><span class="hljs-string">"use client"</span>;
<span class="hljs-keyword">import</span> {
    Dialog,
    DialogTitle,
    DialogPanel,
    Transition,
    Description,
    TransitionChild,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"@headlessui/react"</span>;
<span class="hljs-keyword">import</span> { FaCopy } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-icons/fa"</span>;
<span class="hljs-keyword">import</span> CopyToClipboard <span class="hljs-keyword">from</span> <span class="hljs-string">"react-copy-to-clipboard"</span>;
<span class="hljs-keyword">import</span> { Fragment, useState, Dispatch, SetStateAction } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> { useStreamVideoClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@stream-io/video-react-sdk"</span>;
<span class="hljs-keyword">import</span> { useUser } <span class="hljs-keyword">from</span> <span class="hljs-string">"@clerk/nextjs"</span>;
<span class="hljs-keyword">import</span> Link <span class="hljs-keyword">from</span> <span class="hljs-string">"next/link"</span>;

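<span class="hljs-comment">// Props shared by every modal: the open flag and its state setter</span>
<span class="hljs-keyword">interface</span> Props {
    enable: <span class="hljs-built_in">boolean</span>;
    setEnable: Dispatch&lt;SetStateAction&lt;<span class="hljs-built_in">boolean</span>&gt;&gt;;
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">InstantMeeting</span>(<span class="hljs-params">{ enable, setEnable }: Props</span>) </span>{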
    <span class="hljs-keyword">const</span> [showMeetingLink, setShowMeetingLink] = useState(<span class="hljs-literal">false</span>);
    <span class="hljs-keyword">const</span> [facetimeLink, setFacetimeLink] = useState&lt;<span class="hljs-built_in">string</span>&gt;(<span class="hljs-string">""</span>);

    <span class="hljs-keyword">const</span> closeModal = <span class="hljs-function">() =&gt;</span> setEnable(<span class="hljs-literal">false</span>);

    <span class="hljs-keyword">return</span> (
        &lt;&gt;
            &lt;Transition appear show={enable} <span class="hljs-keyword">as</span>={Fragment}&gt;
                &lt;Dialog <span class="hljs-keyword">as</span>=<span class="hljs-string">'div'</span> className=<span class="hljs-string">'relative z-10'</span> onClose={closeModal}&gt;
                    &lt;TransitionChild
                        <span class="hljs-keyword">as</span>={Fragment}
                        enter=<span class="hljs-string">'ease-out duration-300'</span>
                        enterFrom=<span class="hljs-string">'opacity-0'</span>
                        enterTo=<span class="hljs-string">'opacity-100'</span>
                        leave=<span class="hljs-string">'ease-in duration-200'</span>
                        leaveFrom=<span class="hljs-string">'opacity-100'</span>
                        leaveTo=<span class="hljs-string">'opacity-0'</span>
                    &gt;
                        &lt;div className=<span class="hljs-string">'fixed inset-0 bg-black/75'</span> /&gt;
                    &lt;/TransitionChild&gt;

                    &lt;div className=<span class="hljs-string">'fixed inset-0 overflow-y-auto'</span>&gt;
                        &lt;div className=<span class="hljs-string">'flex min-h-full items-center justify-center p-4 text-center'</span>&gt;
                            &lt;TransitionChild
                                <span class="hljs-keyword">as</span>={Fragment}
                                enter=<span class="hljs-string">'ease-out duration-300'</span>
                                enterFrom=<span class="hljs-string">'opacity-0 scale-95'</span>
                                enterTo=<span class="hljs-string">'opacity-100 scale-100'</span>
                                leave=<span class="hljs-string">'ease-in duration-200'</span>
                                leaveFrom=<span class="hljs-string">'opacity-100 scale-100'</span>
                                leaveTo=<span class="hljs-string">'opacity-0 scale-95'</span>
                            &gt;
                                &lt;DialogPanel className=<span class="hljs-string">'w-full max-w-2xl transform overflow-hidden rounded-2xl bg-white p-6 align-middle shadow-xl transition-all text-center'</span>&gt;
                                    {showMeetingLink ? (
                                        &lt;MeetingLink facetimeLink={facetimeLink} /&gt;
                                    ) : (
                                        &lt;MeetingForm
                                            setShowMeetingLink={setShowMeetingLink}
                                            setFacetimeLink={setFacetimeLink}
                                        /&gt;
                                    )}
                                &lt;/DialogPanel&gt;
                            &lt;/TransitionChild&gt;
                        &lt;/div&gt;
                    &lt;/div&gt;
                &lt;/Dialog&gt;
            &lt;/Transition&gt;
        &lt;/&gt;
    );
}
</code></pre>
<p>The code snippet above renders a form that allows users to provide a call description. Once the call is created, the link is generated and available to be copied before starting the call.</p>
<p>Finally, add the <code>MeetingForm</code> and <code>MeetingLink</code> components below the <code>InstantMeeting</code> component:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> MeetingForm = <span class="hljs-function">(<span class="hljs-params">{
    setShowMeetingLink,
    setFacetimeLink,
}: {
    setShowMeetingLink: Dispatch&lt;SetStateAction&lt;<span class="hljs-built_in">boolean</span>&gt;&gt;;
    setFacetimeLink: Dispatch&lt;SetStateAction&lt;<span class="hljs-built_in">string</span>&gt;&gt;;
}</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> [description, setDescription] = useState&lt;<span class="hljs-built_in">string</span>&gt;(<span class="hljs-string">""</span>);

    <span class="hljs-keyword">const</span> handleStartMeeting = <span class="hljs-keyword">async</span> (e: React.FormEvent&lt;HTMLFormElement&gt;) =&gt; {
        e.preventDefault();
        <span class="hljs-built_in">console</span>.log({ description });
    };

    <span class="hljs-keyword">return</span> (
        &lt;&gt;
            &lt;DialogTitle
                <span class="hljs-keyword">as</span>=<span class="hljs-string">'h3'</span>
                className=<span class="hljs-string">'text-lg font-bold leading-6 text-green-600'</span>
            &gt;
                Create Instant FaceTime
            &lt;/DialogTitle&gt;

            &lt;Description className=<span class="hljs-string">'text-xs opacity-40 mb-4'</span>&gt;
                You can start a <span class="hljs-keyword">new</span> FaceTime instantly.
            &lt;/Description&gt;

            &lt;form className=<span class="hljs-string">'w-full'</span> onSubmit={handleStartMeeting}&gt;
                &lt;label
                    className=<span class="hljs-string">'block text-left text-sm font-medium text-gray-700'</span>
                    htmlFor=<span class="hljs-string">'description'</span>
                &gt;
                    Meeting Description
                &lt;/label&gt;
                &lt;input
                    <span class="hljs-keyword">type</span>=<span class="hljs-string">'text'</span>
                    name=<span class="hljs-string">'description'</span>
                    id=<span class="hljs-string">'description'</span>
                    value={description}
                    required
                    onChange={<span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> setDescription(e.target.value)}
                    className=<span class="hljs-string">'mt-1 block w-full text-sm py-3 px-4 border-gray-200 border-[1px] rounded mb-3'</span>
                    placeholder=<span class="hljs-string">'Enter a description for the meeting'</span>
                /&gt;

                &lt;button className=<span class="hljs-string">'w-full bg-green-600 text-white py-3 rounded mt-4'</span>&gt;
                    Proceed
                &lt;/button&gt;
            &lt;/form&gt;
        &lt;/&gt;
    );
};
</code></pre>
<p>The <code>MeetingForm</code> component accepts the call description, while the <code>MeetingLink</code> component displays the generated call link and allows users to copy it before starting the call.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726482110082/638609aa-e0ae-4cc4-b520-2050966180b4.png" alt="facetime-app-create-instant-facetime" class="image--center mx-auto" width="3098" height="1792" loading="lazy"></p>
<h3 id="heading-the-join-meeting-modal">The Join Meeting Modal</h3>
<p>Copy the code snippet below into the <code>JoinMeeting.tsx</code> file. It renders a form that accepts the call link and redirects users to the call page.</p>
<pre><code class="lang-typescript"><span class="hljs-string">"use client"</span>;
<span class="hljs-keyword">import</span> {
    Dialog,
    DialogTitle,
    DialogPanel,
    Transition,
    TransitionChild,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"@headlessui/react"</span>;
<span class="hljs-keyword">import</span> { useRouter } <span class="hljs-keyword">from</span> <span class="hljs-string">"next/navigation"</span>;
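<span class="hljs-keyword">import</span> { Fragment, useState, Dispatch, SetStateAction } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;

<span class="hljs-comment">// Props shared by every modal: the open flag and its state setter</span>
<span class="hljs-keyword">interface</span> Props {
    enable: <span class="hljs-built_in">boolean</span>;
    setEnable: Dispatch&lt;SetStateAction&lt;<span class="hljs-built_in">boolean</span>&gt;&gt;;
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">JoinMeeting</span>(<span class="hljs-params">{ enable, setEnable }: Props</span>) </span>{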
    <span class="hljs-keyword">const</span> closeModal = <span class="hljs-function">() =&gt;</span> setEnable(<span class="hljs-literal">false</span>);

    <span class="hljs-keyword">return</span> (
        &lt;&gt;
            &lt;Transition appear show={enable} <span class="hljs-keyword">as</span>={Fragment}&gt;
                &lt;Dialog <span class="hljs-keyword">as</span>=<span class="hljs-string">'div'</span> className=<span class="hljs-string">'relative z-10'</span> onClose={closeModal}&gt;
                    &lt;TransitionChild
                        <span class="hljs-keyword">as</span>={Fragment}
                        enter=<span class="hljs-string">'ease-out duration-300'</span>
                        enterFrom=<span class="hljs-string">'opacity-0'</span>
                        enterTo=<span class="hljs-string">'opacity-100'</span>
                        leave=<span class="hljs-string">'ease-in duration-200'</span>
                        leaveFrom=<span class="hljs-string">'opacity-100'</span>
                        leaveTo=<span class="hljs-string">'opacity-0'</span>
                    &gt;
                        &lt;div className=<span class="hljs-string">'fixed inset-0 bg-black/75'</span> /&gt;
                    &lt;/TransitionChild&gt;

                    &lt;div className=<span class="hljs-string">'fixed inset-0 overflow-y-auto'</span>&gt;
                        &lt;div className=<span class="hljs-string">'flex min-h-full items-center justify-center p-4 text-center'</span>&gt;
                            &lt;TransitionChild
                                <span class="hljs-keyword">as</span>={Fragment}
                                enter=<span class="hljs-string">'ease-out duration-300'</span>
                                enterFrom=<span class="hljs-string">'opacity-0 scale-95'</span>
                                enterTo=<span class="hljs-string">'opacity-100 scale-100'</span>
                                leave=<span class="hljs-string">'ease-in duration-200'</span>
                                leaveFrom=<span class="hljs-string">'opacity-100 scale-100'</span>
                                leaveTo=<span class="hljs-string">'opacity-0 scale-95'</span>
                            &gt;
                                &lt;DialogPanel className=<span class="hljs-string">'w-full max-w-2xl transform overflow-hidden rounded-2xl bg-white p-6 align-middle shadow-xl transition-all text-center'</span>&gt;
                                    &lt;CallLinkForm /&gt;
                                &lt;/DialogPanel&gt;
                            &lt;/TransitionChild&gt;
                        &lt;/div&gt;
                    &lt;/div&gt;
                &lt;/Dialog&gt;
            &lt;/Transition&gt;
        &lt;/&gt;
    );
}
</code></pre>
<p>Add the <code>CallLinkForm</code> below the <code>JoinMeeting</code> component:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> CallLinkForm = <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> [link, setLink] = useState&lt;<span class="hljs-built_in">string</span>&gt;(<span class="hljs-string">""</span>);
    <span class="hljs-keyword">const</span> router = useRouter();

    <span class="hljs-keyword">const</span> handleJoinMeeting = <span class="hljs-function">(<span class="hljs-params">e: React.FormEvent&lt;HTMLFormElement&gt;</span>) =&gt;</span> {
        e.preventDefault();
        router.push(<span class="hljs-string">`<span class="hljs-subst">${link}</span>`</span>);
    };

    <span class="hljs-keyword">return</span> (
        &lt;&gt;
            &lt;DialogTitle
                <span class="hljs-keyword">as</span>=<span class="hljs-string">'h3'</span>
                className=<span class="hljs-string">'text-lg font-bold leading-6 text-green-600'</span>
            &gt;
                Join FaceTime
            &lt;/DialogTitle&gt;

            &lt;form className=<span class="hljs-string">'w-full'</span> onSubmit={handleJoinMeeting}&gt;
                &lt;label
                    className=<span class="hljs-string">'block text-left text-sm font-medium text-gray-700'</span>
                    htmlFor=<span class="hljs-string">'link'</span>
                &gt;
                    Enter the FaceTime link
                &lt;/label&gt;
                &lt;input
                    <span class="hljs-keyword">type</span>=<span class="hljs-string">'url'</span>
                    name=<span class="hljs-string">'link'</span>
                    id=<span class="hljs-string">'link'</span>
                    value={link}
                    onChange={<span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> setLink(e.target.value)}
                    className=<span class="hljs-string">'mt-1 block w-full text-sm py-3 px-4 border-gray-200 border-[1px] rounded mb-3'</span>
                    placeholder=<span class="hljs-string">'Enter the FaceTime link'</span>
                /&gt;

                &lt;button className=<span class="hljs-string">'w-full bg-green-600 text-white py-3 rounded mt-4'</span>&gt;
                    Join now
                &lt;/button&gt;
            &lt;/form&gt;
        &lt;/&gt;
    );
};
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726482173301/09881faa-54f8-4293-a186-b608ef5a0e05.png" alt="facetime-app-join-popup" class="image--center mx-auto" width="3104" height="1788" loading="lazy"></p>
<p>Congratulations! You’ve completed the app’s interface.</p>
<h2 id="heading-how-to-authenticate-users-with-clerk">How to Authenticate Users with Clerk</h2>
<p><a target="_blank" href="https://clerk.com/">Clerk</a> is a user management platform that enables you to add auth to web apps.</p>
<p>You can install the <a target="_blank" href="https://clerk.com/docs/quickstarts/nextjs">Clerk Next.js SDK</a> by running the following command in your terminal:</p>
<pre><code class="lang-bash">npm install @clerk/nextjs
</code></pre>
<p>Create a <code>middleware.ts</code> file within the Next.js <code>src</code> folder and copy the code snippet below into the file:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { clerkMiddleware, createRouteMatcher } <span class="hljs-keyword">from</span> <span class="hljs-string">"@clerk/nextjs/server"</span>;

<span class="hljs-keyword">const</span> protectedRoutes = createRouteMatcher([
    <span class="hljs-string">"/facetime(.*)"</span>,
    <span class="hljs-string">"/dashboard"</span>,
    <span class="hljs-string">"/"</span>,
]);

<span class="hljs-comment">//👇🏻 protects the route</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> clerkMiddleware(<span class="hljs-function">(<span class="hljs-params">auth, req</span>) =&gt;</span> {
    <span class="hljs-keyword">if</span> (protectedRoutes(req)) {
        auth().protect();
    }
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> config = {
    matcher: [<span class="hljs-string">"/((?!.*\\..*|_next).*)"</span>, <span class="hljs-string">"/"</span>, <span class="hljs-string">"/(api|trpc)(.*)"</span>],
};
</code></pre>
<p>The <code>createRouteMatcher</code> function accepts an array of routes to protect from unauthenticated users, and the <code>clerkMiddleware()</code> function enforces that protection on every request that matches one of them.</p>
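<p>To make the route patterns concrete, here is a minimal sketch of how the three entries behave. The <code>isProtectedPath</code> function is a hypothetical stand-in, not Clerk's implementation, and it assumes each pattern acts like a regular expression that must match the whole pathname.</p>

```typescript
// Hypothetical stand-in for Clerk's createRouteMatcher, for illustration only.
// Assumption: each pattern behaves like a regular expression that must match
// the entire pathname.
const protectedPatterns: string[] = ["/facetime(.*)", "/dashboard", "/"];

function isProtectedPath(pathname: string): boolean {
    return protectedPatterns.some((pattern) =>
        new RegExp(`^${pattern}$`).test(pathname)
    );
}

console.log(isProtectedPath("/facetime/1234")); // true: covered by "/facetime(.*)"
console.log(isProtectedPath("/dashboard/settings")); // false: no pattern covers nested dashboard paths
```

<p>Under this assumption, only <code>/facetime(.*)</code> covers nested paths. If the dashboard later gains sub-pages, a pattern such as <code>/dashboard(.*)</code> would be needed to protect them as well.</p>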
<p>Next, import the following Clerk components into the <code>app/layout.tsx</code> file and update the <code>RootLayout</code> function as shown below:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> {
    ClerkProvider,
    SignInButton,
    SignedIn,
    SignedOut,
    UserButton,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"@clerk/nextjs"</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">"./globals.css"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">RootLayout</span>(<span class="hljs-params">{
    children,
}: {
    children: React.ReactNode;
}</span>) </span>{
    <span class="hljs-keyword">return</span> (
        &lt;ClerkProvider&gt;
            &lt;html lang=<span class="hljs-string">'en'</span>&gt;
                &lt;body className={inter.className}&gt;
                    &lt;nav className=<span class="hljs-string">'w-full py-4 md:px-8 px-4 text-center flex items-center justify-between sticky top-0 bg-white '</span>&gt;
                        &lt;div className=<span class="hljs-string">'flex items-center justify-end gap-5'</span>&gt;
                            {<span class="hljs-comment">/*-- if user is signed out --*/</span>}
                            &lt;SignedOut&gt;
                                &lt;SignInButton mode=<span class="hljs-string">'modal'</span> /&gt;
                            &lt;/SignedOut&gt;
                            {<span class="hljs-comment">/*-- if user is signed in --*/</span>}
                            &lt;SignedIn&gt;
                                &lt;UserButton /&gt;
                            &lt;/SignedIn&gt;
                        &lt;/div&gt;
                    &lt;/nav&gt;

                    {children}
                &lt;/body&gt;
            &lt;/html&gt;
        &lt;/ClerkProvider&gt;
    );
}
</code></pre>
<p>After completing this, users will be prompted to create an account or sign in before they can access the application pages.</p>
<p>Finally, create a <a target="_blank" href="https://clerk.com">Clerk account</a> and set up a new Clerk application. Add your Clerk publishable and secret keys to the <code>.env.local</code> file in your project.</p>
<pre><code class="lang-bash">NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=&lt;publishable_key&gt;
CLERK_SECRET_KEY=&lt;secret_key&gt;
</code></pre>
<h2 id="heading-how-to-set-up-stream-in-a-nextjs-app">How to Set Up Stream in a Next.js App</h2>
<p>First, create a <a target="_blank" href="https://getstream.io/">Stream account</a> and set up an organization to house your app. Then, copy the following credentials into your <code>.env.local</code> file:</p>
<pre><code class="lang-bash">STREAM_APP_ID=&lt;your_app_id&gt;
NEXT_PUBLIC_STREAM_API_KEY=&lt;your_stream_api_key&gt;
STREAM_SECRET_KEY=&lt;your_stream_secret_key&gt;
NEXT_PUBLIC_FACETIME_HOST=http://localhost:3000/facetime
</code></pre>
<p>Next, install the <a target="_blank" href="https://www.npmjs.com/package/@stream-io/video-react-sdk">Stream React Video SDK</a> and the <a target="_blank" href="https://getstream.io/video/docs/api/#installation">Stream Node.js SDK</a>.</p>
<pre><code class="lang-bash">npm install @stream-io/video-react-sdk @stream-io/node-sdk
</code></pre>
<p>Create a <code>providers</code> folder containing a <code>StreamVideoProvider.tsx</code> file and copy the following code snippet into the file:</p>
<pre><code class="lang-typescript"><span class="hljs-string">"use client"</span>;
<span class="hljs-keyword">import</span> { tokenProvider } <span class="hljs-keyword">from</span> <span class="hljs-string">"@/actions/stream.actions"</span>;
<span class="hljs-keyword">import</span> { StreamVideo, StreamVideoClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@stream-io/video-react-sdk"</span>;
<span class="hljs-keyword">import</span> { useState, ReactNode, useEffect } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> { useUser } <span class="hljs-keyword">from</span> <span class="hljs-string">"@clerk/nextjs"</span>;

<span class="hljs-keyword">const</span> apiKey = process.env.NEXT_PUBLIC_STREAM_API_KEY!;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> StreamVideoProvider = <span class="hljs-function">(<span class="hljs-params">{ children }: { children: ReactNode }</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> [videoClient, setVideoClient] = useState&lt;StreamVideoClient&gt;();

    <span class="hljs-keyword">const</span> { user, isLoaded } = useUser();

    useEffect(<span class="hljs-function">() =&gt;</span> {
        <span class="hljs-keyword">if</span> (!isLoaded || !user || !apiKey) <span class="hljs-keyword">return</span>;
        <span class="hljs-keyword">if</span> (!tokenProvider) <span class="hljs-keyword">return</span>;
        <span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> StreamVideoClient({
            apiKey,
            user: {
                id: user?.id,
                name: user?.primaryEmailAddress?.emailAddress,
                image: user?.imageUrl,
            },
            tokenProvider, <span class="hljs-comment">//👉🏻 pending creation</span>
        });

        setVideoClient(client);
    }, [user, isLoaded]);

    <span class="hljs-keyword">if</span> (!videoClient) <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;

    <span class="hljs-keyword">return</span> &lt;StreamVideo client={videoClient}&gt;{children}&lt;/StreamVideo&gt;;
};
</code></pre>
<p>The <code>StreamVideoProvider</code> component initializes a Stream client that identifies each signed-in user. You'll wrap the entire app with it once the token provider is in place.</p>
<p>The <code>StreamVideoClient</code> constructor takes an object containing the API key, the user object with details from Clerk, and a <code>tokenProvider</code>.</p>
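<p>The mapping from the Clerk user to the Stream user object can be pulled into a small function, which makes the edge case of a missing email address explicit. This is a sketch: <code>ClerkUserLike</code> and <code>toStreamUser</code> are hypothetical names, and falling back to the user ID for the display name is an assumption, not something the provider above does.</p>

```typescript
// Hypothetical, simplified shape of the Clerk user fields used by the provider.
interface ClerkUserLike {
    id: string;
    imageUrl?: string;
    primaryEmailAddress?: { emailAddress: string } | null;
}

// Builds the object passed as `user` to StreamVideoClient.
// Falling back to the ID when there is no email is a choice made for this sketch.
function toStreamUser(user: ClerkUserLike) {
    return {
        id: user.id,
        name: user.primaryEmailAddress?.emailAddress ?? user.id,
        image: user.imageUrl,
    };
}

console.log(toStreamUser({ id: "user_1", primaryEmailAddress: null }).name); // "user_1"
```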
<p>Next, let’s create a <a target="_blank" href="https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations">Next.js server action</a> (<code>tokenProvider</code>) that generates the token.</p>
<p>Create an <code>actions</code> folder, add a <code>stream.actions.ts</code> file, and copy the following code snippet into the file:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">//👇🏻 tokenProvider function</span>
<span class="hljs-string">"use server"</span>;

<span class="hljs-keyword">import</span> { currentUser } <span class="hljs-keyword">from</span> <span class="hljs-string">"@clerk/nextjs/server"</span>;
<span class="hljs-keyword">import</span> { StreamClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@stream-io/node-sdk"</span>;

<span class="hljs-keyword">const</span> STREAM_API_KEY = process.env.NEXT_PUBLIC_STREAM_API_KEY!;
<span class="hljs-keyword">const</span> STREAM_API_SECRET = process.env.STREAM_SECRET_KEY!;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> tokenProvider = <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> user = <span class="hljs-keyword">await</span> currentUser();

    <span class="hljs-keyword">if</span> (!user) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"User is not authenticated"</span>);
    <span class="hljs-keyword">if</span> (!STREAM_API_KEY) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"Stream API key is missing"</span>);
    <span class="hljs-keyword">if</span> (!STREAM_API_SECRET) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"Stream API secret is missing"</span>);

    <span class="hljs-keyword">const</span> streamClient = <span class="hljs-keyword">new</span> StreamClient(STREAM_API_KEY, STREAM_API_SECRET);

    <span class="hljs-keyword">const</span> expirationTime = <span class="hljs-built_in">Math</span>.floor(<span class="hljs-built_in">Date</span>.now() / <span class="hljs-number">1000</span>) + <span class="hljs-number">3600</span>;
    <span class="hljs-keyword">const</span> issuedAt = <span class="hljs-built_in">Math</span>.floor(<span class="hljs-built_in">Date</span>.now() / <span class="hljs-number">1000</span>) - <span class="hljs-number">60</span>;

    <span class="hljs-comment">//👇🏻 generates a Stream user token</span>
    <span class="hljs-keyword">const</span> token = streamClient.generateUserToken({
        user_id: user.id,
        exp: expirationTime,
        iat: issuedAt,
    });
    <span class="hljs-comment">//👇🏻 returns the user token</span>
    <span class="hljs-keyword">return</span> token;
};
</code></pre>
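<p>The two timestamps in the server action are plain Unix-second arithmetic. The sketch below isolates that calculation in a hypothetical <code>tokenWindow</code> function. The names <code>iat</code> and <code>exp</code> follow the standard JWT claim names, and the explanation for the 60-second backdating (tolerating small clock differences) is an assumption about intent.</p>

```typescript
// Computes the issued-at and expiry timestamps (Unix seconds) used by the action.
// The 60-second backdating presumably tolerates small clock differences between
// machines; that reading is an assumption.
function tokenWindow(nowMs: number, lifetimeSeconds = 3600, skewSeconds = 60) {
    const nowSeconds = Math.floor(nowMs / 1000);
    return {
        iat: nowSeconds - skewSeconds,
        exp: nowSeconds + lifetimeSeconds,
    };
}

const claims = tokenWindow(Date.UTC(2024, 8, 16, 10, 0, 0));
console.log(claims.exp - claims.iat); // 3660 seconds between iat and exp
```

<p>With the defaults, a token is usable for one hour from the moment it is generated, after which the client asks the token provider for a new one.</p>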
<p>Finally, update the <code>RootLayout</code> function in the <code>app/layout.tsx</code> file by wrapping the entire application with the <code>StreamVideoProvider</code> component:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> <span class="hljs-string">"@stream-io/video-react-sdk/dist/css/styles.css"</span>;
<span class="hljs-keyword">import</span> { StreamVideoProvider } <span class="hljs-keyword">from</span> <span class="hljs-string">"./providers/StreamVideoProvider"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">RootLayout</span>(<span class="hljs-params">{
    children,
}: {
    children: React.ReactNode;
}</span>) </span>{
    <span class="hljs-keyword">return</span> (
        &lt;ClerkProvider&gt;
            &lt;html lang=<span class="hljs-string">'en'</span>&gt;
                &lt;body className={inter.className}&gt;
                    &lt;StreamVideoProvider&gt;
                        &lt;nav className=<span class="hljs-string">'w-full py-4 md:px-8 px-4 text-center flex items-center justify-between sticky top-0 bg-white '</span>&gt;
                            &lt;div className=<span class="hljs-string">'flex items-center justify-end gap-5'</span>&gt;
                                {<span class="hljs-comment">/*-- if user is signed out --*/</span>}
                                &lt;SignedOut&gt;
                                    &lt;SignInButton mode=<span class="hljs-string">'modal'</span> /&gt;
                                &lt;/SignedOut&gt;
                                {<span class="hljs-comment">/*-- if user is signed in --*/</span>}
                                &lt;SignedIn&gt;
                                    &lt;UserButton /&gt;
                                &lt;/SignedIn&gt;
                            &lt;/div&gt;
                        &lt;/nav&gt;

                        {children}
                    &lt;/StreamVideoProvider&gt;
                &lt;/body&gt;
            &lt;/html&gt;
        &lt;/ClerkProvider&gt;
    );
}
</code></pre>
<p>Congratulations! You've successfully integrated Stream into the Next.js app.</p>
<h2 id="heading-how-to-create-and-join-calls-with-stream">How to Create and Join Calls with Stream</h2>
<p>In this section, you'll learn how to create, schedule, and join calls using the Stream SDK. You'll also learn how to set up the meeting room with the necessary components and fetch upcoming calls from Stream.</p>
<h3 id="heading-creating-and-scheduling-calls">Creating and Scheduling Calls</h3>
<p>To create an instant meeting, run the <code>handleStartMeeting</code> function when the form is submitted. It generates a random ID for the call and creates the meeting using the current date and the provided description.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { useStreamVideoClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@stream-io/video-react-sdk"</span>;
<span class="hljs-keyword">import</span> { useUser } <span class="hljs-keyword">from</span> <span class="hljs-string">"@clerk/nextjs"</span>;
<span class="hljs-keyword">const</span> client = useStreamVideoClient();
<span class="hljs-keyword">const</span> { user } = useUser();

<span class="hljs-keyword">const</span> handleStartMeeting = <span class="hljs-keyword">async</span> (e: React.FormEvent&lt;HTMLFormElement&gt;) =&gt; {
    e.preventDefault();
    <span class="hljs-keyword">if</span> (!client || !user) <span class="hljs-keyword">return</span>;
    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> id = crypto.randomUUID();
        <span class="hljs-keyword">const</span> call = client.call(<span class="hljs-string">"default"</span>, id);
        <span class="hljs-keyword">if</span> (!call) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"Failed to create meeting"</span>);

        <span class="hljs-keyword">await</span> call.getOrCreate({
            data: {
                starts_at: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString(),
                custom: {
                    description,
                },
            },
        });

        setFacetimeLink(<span class="hljs-string">`<span class="hljs-subst">${call.id}</span>`</span>);
        setShowMeetingLink(<span class="hljs-literal">true</span>);
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-built_in">console</span>.error(error);
        alert(<span class="hljs-string">"Failed to create Meeting"</span>);
    }
};
</code></pre>
<p>The <code>call.getOrCreate()</code> method creates the call on Stream if it doesn't already exist. Here, it receives the current date and time as <code>starts_at</code> and the optional description inside the <code>custom</code> object.</p>
<p>It also allows you to schedule calls for a specific time in the future. In this case, you can specify the desired date and time, and Stream will automatically schedule the call for that period.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { useStreamVideoClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@stream-io/video-react-sdk"</span>;
<span class="hljs-keyword">import</span> { useUser } <span class="hljs-keyword">from</span> <span class="hljs-string">"@clerk/nextjs"</span>;
<span class="hljs-keyword">const</span> client = useStreamVideoClient();
<span class="hljs-keyword">const</span> { user } = useUser();

<span class="hljs-keyword">const</span> handleScheduleMeeting = <span class="hljs-keyword">async</span> (e: React.FormEvent&lt;HTMLFormElement&gt;) =&gt; {
    e.preventDefault();
    <span class="hljs-keyword">if</span> (!client || !user) <span class="hljs-keyword">return</span>;
    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> id = crypto.randomUUID();
        <span class="hljs-keyword">const</span> call = client.call(<span class="hljs-string">"default"</span>, id);
        <span class="hljs-keyword">if</span> (!call) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"Failed to create meeting"</span>);

        <span class="hljs-keyword">await</span> call.getOrCreate({
            data: {
                <span class="hljs-comment">//👇🏻 only necessary changes</span>
                starts_at: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(dateTime).toISOString(),
                custom: {
                    description,
                },
            },
        });
        setFacetimeLink(<span class="hljs-string">`<span class="hljs-subst">${call.id}</span>`</span>);
        setShowMeetingLink(<span class="hljs-literal">true</span>);
    } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-built_in">console</span>.error(error);
        alert(<span class="hljs-string">"Failed to schedule meeting"</span>);
    }
};
</code></pre>
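<p>The only difference between the two handlers is the value of <code>starts_at</code>. A minimal sketch of that decision follows, using a hypothetical <code>toStartsAt</code> helper. It also guards against an invalid date string, since calling <code>toISOString()</code> on an invalid <code>Date</code> throws a <code>RangeError</code>.</p>

```typescript
// Hypothetical helper: returns the ISO string for `starts_at`.
// With no dateTime it uses `now` (instant meeting); otherwise it parses the
// value and rejects anything that is not a valid date.
function toStartsAt(dateTime?: string, now: Date = new Date()): string {
    if (!dateTime) return now.toISOString();
    const parsed = new Date(dateTime);
    if (Number.isNaN(parsed.getTime())) {
        throw new Error(`Invalid date: ${dateTime}`);
    }
    return parsed.toISOString();
}

console.log(toStartsAt()); // current time, for an instant meeting
console.log(toStartsAt("2024-09-20T15:30:00.000Z")); // "2024-09-20T15:30:00.000Z"
```

<p>Both handlers could then pass <code>toStartsAt(dateTime)</code> to <code>call.getOrCreate()</code>, with the instant meeting simply omitting the argument.</p>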
<h3 id="heading-joining-calls-and-the-meeting-page">Joining Calls and the Meeting Page</h3>
<p>Recall that the meeting link in the app is declared as:</p>
<pre><code class="lang-jsx"><span class="hljs-string">`<span class="hljs-subst">${process.env.NEXT_PUBLIC_FACETIME_HOST}</span>/<span class="hljs-subst">${facetimeLink}</span>`</span>
<span class="hljs-comment">// 👉🏻 format: http://localhost:3000/facetime/&lt;call.id&gt;</span>
</code></pre>
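<p>Because the call ID is simply the last path segment of the link, building and parsing the link can be expressed as two small functions. Both <code>buildFacetimeLink</code> and <code>extractCallId</code> are hypothetical helpers written for this sketch; the app itself builds the link inline.</p>

```typescript
// Hypothetical helpers for the link format shown above.
function buildFacetimeLink(host: string, callId: string): string {
    // Drop trailing slashes so the result has exactly one separator
    return `${host.replace(/\/+$/, "")}/${encodeURIComponent(callId)}`;
}

function extractCallId(link: string): string | undefined {
    // Ignore any query string or hash, then take the last non-empty path segment
    const path = link.split(/[?#]/)[0];
    const segments = path.split("/").filter((segment) => segment !== "");
    const last = segments[segments.length - 1];
    return last ? decodeURIComponent(last) : undefined;
}

const exampleLink = buildFacetimeLink("http://localhost:3000/facetime", "abc-123");
console.log(exampleLink); // "http://localhost:3000/facetime/abc-123"
console.log(extractCallId(exampleLink)); // "abc-123"
```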
<p>Therefore, we need to create the <code>/facetime/&lt;callID&gt;</code> route to enable users to join a call. To do this, create a <code>facetime</code> folder with an <code>[id]</code> directory inside, and within that directory, add a <code>page.tsx</code> file. Then, copy the following code snippet into the file:</p>
<pre><code class="lang-typescript"><span class="hljs-string">"use client"</span>;
<span class="hljs-keyword">import</span> { useGetCallById } <span class="hljs-keyword">from</span> <span class="hljs-string">"@/app/hooks/useGetCallById"</span>;
<span class="hljs-keyword">import</span> { useUser } <span class="hljs-keyword">from</span> <span class="hljs-string">"@clerk/nextjs"</span>;
<span class="hljs-keyword">import</span> {
    StreamCall,
    StreamTheme,
    PaginatedGridLayout,
    CallControls,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"@stream-io/video-react-sdk"</span>;
<span class="hljs-keyword">import</span> { useParams, useRouter } <span class="hljs-keyword">from</span> <span class="hljs-string">"next/navigation"</span>;
<span class="hljs-keyword">import</span> { useEffect, useState } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">FaceTimePage</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">const</span> { id } = useParams&lt;{ id: <span class="hljs-built_in">string</span> }&gt;();
    <span class="hljs-keyword">const</span> [confirmJoin, setConfirmJoin] = useState&lt;<span class="hljs-built_in">boolean</span>&gt;(<span class="hljs-literal">false</span>);
    <span class="hljs-keyword">const</span> [camMicEnabled, setCamMicEnabled] = useState&lt;<span class="hljs-built_in">boolean</span>&gt;(<span class="hljs-literal">false</span>);
    <span class="hljs-keyword">const</span> router = useRouter();
    <span class="hljs-comment">//👇🏻 gets call details by ID</span>
    <span class="hljs-keyword">const</span> { call, isCallLoading } = useGetCallById(id);

    useEffect(<span class="hljs-function">() =&gt;</span> {
        <span class="hljs-keyword">if</span> (camMicEnabled) {
            call?.camera.enable();
            call?.microphone.enable();
        } <span class="hljs-keyword">else</span> {
            call?.camera.disable();
            call?.microphone.disable();
        }
    }, [call, camMicEnabled]);

    <span class="hljs-comment">//👇🏻 enable users to join calls</span>
    <span class="hljs-keyword">const</span> handleJoin = <span class="hljs-function">() =&gt;</span> {
        call?.join();
        setConfirmJoin(<span class="hljs-literal">true</span>);
    };

    <span class="hljs-keyword">if</span> (isCallLoading) <span class="hljs-keyword">return</span> &lt;p&gt;Loading...&lt;/p&gt;;

    <span class="hljs-keyword">if</span> (!call) <span class="hljs-keyword">return</span> &lt;p&gt;Call not found&lt;/p&gt;;

    <span class="hljs-keyword">return</span> (
        &lt;main className=<span class="hljs-string">'min-h-screen w-full items-center justify-center'</span>&gt;
            &lt;StreamCall call={call}&gt;
                &lt;StreamTheme&gt;
                    {confirmJoin ? (
                        &lt;MeetingRoom /&gt;
                    ) : (
                        &lt;div className=<span class="hljs-string">'flex flex-col items-center justify-center gap-5'</span>&gt;
                            &lt;h1 className=<span class="hljs-string">'text-3xl font-bold'</span>&gt;Join Call&lt;/h1&gt;
                            &lt;p className=<span class="hljs-string">'text-lg'</span>&gt;
                                Are you sure you want to join <span class="hljs-built_in">this</span> call?
                            &lt;/p&gt;
                            &lt;div className=<span class="hljs-string">'flex gap-5'</span>&gt;
                                &lt;button
                                    onClick={handleJoin}
                                    className=<span class="hljs-string">'px-4 py-3 bg-green-600 text-green-50'</span>
                                &gt;
                                    Join
                                &lt;/button&gt;
                                &lt;button
                                    onClick={<span class="hljs-function">() =&gt;</span> router.push(<span class="hljs-string">"/"</span>)}
                                    className=<span class="hljs-string">'px-4 py-3 bg-red-600 text-red-50'</span>
                                &gt;
                                    Cancel
                                &lt;/button&gt;
                            &lt;/div&gt;
                        &lt;/div&gt;
                    )}
                &lt;/StreamTheme&gt;
            &lt;/StreamCall&gt;
        &lt;/main&gt;
    );
}
</code></pre>
<p>When users visit the meeting page, they are presented with a confirmation message, allowing them to confirm that they want to join the call.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726483083226/26ccb1d9-dc33-4a31-81a9-c4b0a3d00b91.png" alt="facetime-app-live" class="image--center mx-auto" width="3092" height="1834" loading="lazy"></p>
<p>In the code snippet above:</p>
<ul>
<li><p>The <code>useGetCallById</code> hook is a custom function that retrieves call details based on the call ID.</p>
</li>
<li><p>The <code>handleJoin</code> function allows users to join the call and then displays the <code>&lt;MeetingRoom /&gt;</code> component.</p>
</li>
</ul>
<p>Add the <code>MeetingRoom</code> component below the <code>FaceTimePage</code> component:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> MeetingRoom = <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> router = useRouter();

    <span class="hljs-keyword">const</span> handleLeave = <span class="hljs-function">() =&gt;</span> {
        confirm(<span class="hljs-string">"Are you sure you want to leave the call?"</span>) &amp;&amp; router.push(<span class="hljs-string">"/"</span>);
    };

    <span class="hljs-keyword">return</span> (
        &lt;section className=<span class="hljs-string">'relative min-h-screen w-full overflow-hidden pt-4'</span>&gt;
            &lt;div className=<span class="hljs-string">'relative flex size-full items-center justify-center'</span>&gt;
                &lt;div className=<span class="hljs-string">'flex size-full max-w-[1000px] items-center'</span>&gt;
                    &lt;PaginatedGridLayout /&gt;
                &lt;/div&gt;
                &lt;div className=<span class="hljs-string">'fixed bottom-0 flex w-full items-center justify-center gap-5'</span>&gt;
                    &lt;CallControls onLeave={handleLeave} /&gt;
                &lt;/div&gt;
            &lt;/div&gt;
        &lt;/section&gt;
    );
};
</code></pre>
<p>The <a target="_blank" href="https://getstream.io/video/docs/react/ui-components/core/call-layout/#paginatedgridlayout"><code>PaginatedGridLayout</code></a> arranges participants in a grid layout with pagination, allowing you to manage larger video calls by displaying a set number of participants per page.</p>
<p>The <code>CallControls</code> component provides built-in actions, such as muting, video toggling, and screen sharing, that can be performed during a call. Both components ship with the Stream React Video SDK, so you don't need to build these controls yourself.</p>
<p>Additionally, you can switch to the <a target="_blank" href="https://getstream.io/video/docs/react/ui-components/core/call-layout/#speakerlayout"><code>SpeakerLayout</code></a>, which highlights the dominant speaker or shared screen while displaying other participants in a smaller view.</p>
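<p>The idea of showing a set number of participants per page can be illustrated with a small function. This is only a sketch of the concept, not how the SDK implements <code>PaginatedGridLayout</code>, and the <code>paginate</code> name and the use of plain participant names are assumptions made for the example.</p>

```typescript
// Illustration only: not the SDK's implementation. Splits a participant list
// into pages of at most pageSize entries.
function paginate(participants: string[], pageSize: number): string[][] {
    if (!Number.isInteger(pageSize) || !(pageSize >= 1)) {
        throw new RangeError("pageSize must be a positive integer");
    }
    const pageCount = Math.ceil(participants.length / pageSize);
    return Array.from({ length: pageCount }, (_, page) =>
        participants.slice(page * pageSize, (page + 1) * pageSize)
    );
}

console.log(paginate(["Ada", "Ben", "Cy", "Dee", "Eli"], 2));
// [["Ada", "Ben"], ["Cy", "Dee"], ["Eli"]]
```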
<p>Finally, create a <code>hooks</code> folder containing the <code>useGetCallById.ts</code> file and copy the code snippet below into the file:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { useEffect, useState } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> { Call, useStreamVideoClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@stream-io/video-react-sdk"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> useGetCallById = <span class="hljs-function">(<span class="hljs-params">id: <span class="hljs-built_in">string</span> | <span class="hljs-built_in">string</span>[]</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> [call, setCall] = useState&lt;Call&gt;();
    <span class="hljs-keyword">const</span> [isCallLoading, setIsCallLoading] = useState(<span class="hljs-literal">true</span>);

    <span class="hljs-keyword">const</span> client = useStreamVideoClient();

    useEffect(<span class="hljs-function">() =&gt;</span> {
        <span class="hljs-keyword">if</span> (!client) <span class="hljs-keyword">return</span>;

        <span class="hljs-keyword">const</span> loadCall = <span class="hljs-keyword">async</span> () =&gt; {
            <span class="hljs-keyword">try</span> {
                <span class="hljs-keyword">const</span> { calls } = <span class="hljs-keyword">await</span> client.queryCalls({
                    filter_conditions: { id },
                });

                <span class="hljs-keyword">if</span> (calls.length &gt; <span class="hljs-number">0</span>) setCall(calls[<span class="hljs-number">0</span>]);

                setIsCallLoading(<span class="hljs-literal">false</span>);
            } <span class="hljs-keyword">catch</span> (error) {
                <span class="hljs-built_in">console</span>.error(error);
                setIsCallLoading(<span class="hljs-literal">false</span>);
            }
        };

        loadCall();
    }, [client, id]);

    <span class="hljs-keyword">return</span> { call, isCallLoading };
};
</code></pre>
<p>The code snippet above filters the call list and <a target="_blank" href="https://getstream.io/video/docs/react/guides/querying-calls/#filters">returns the call with a matching ID</a>, allowing users to join the specified call.</p>
<h3 id="heading-retrieving-upcoming-calls">Retrieving Upcoming Calls</h3>
<p>To retrieve upcoming calls from Stream, you can create a custom hook that <a target="_blank" href="https://getstream.io/video/docs/react/guides/querying-calls/#calls-the-user-has-created-or-is-a-member-of">fetches all the calls created by the user</a>, as well as the calls they are a member of.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { useEffect, useState } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> { useUser } <span class="hljs-keyword">from</span> <span class="hljs-string">"@clerk/nextjs"</span>;
<span class="hljs-keyword">import</span> { Call, useStreamVideoClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@stream-io/video-react-sdk"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> useGetCalls = <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> { user } = useUser();
    <span class="hljs-keyword">const</span> client = useStreamVideoClient();
    <span class="hljs-keyword">const</span> [calls, setCalls] = useState&lt;Call[]&gt;();
    <span class="hljs-keyword">const</span> [isLoading, setIsLoading] = useState(<span class="hljs-literal">false</span>);

    useEffect(<span class="hljs-function">() =&gt;</span> {
        <span class="hljs-keyword">const</span> loadCalls = <span class="hljs-keyword">async</span> () =&gt; {
            <span class="hljs-keyword">if</span> (!client || !user?.id) <span class="hljs-keyword">return</span>;
            setIsLoading(<span class="hljs-literal">true</span>);
            <span class="hljs-keyword">try</span> {
                <span class="hljs-comment">//👇🏻 gets all the calls the user is featured in</span>
                <span class="hljs-keyword">const</span> { calls } = <span class="hljs-keyword">await</span> client.queryCalls({
                    sort: [{ field: <span class="hljs-string">"starts_at"</span>, direction: <span class="hljs-number">-1</span> }],
                    filter_conditions: {
                        starts_at: { $exists: <span class="hljs-literal">true</span> },
                        $or: [
                            { created_by_user_id: user.id },
                            { members: { $in: [user.id] } },
                        ],
                    },
                });

                setCalls(calls);
            } <span class="hljs-keyword">catch</span> (error) {
                <span class="hljs-built_in">console</span>.error(error);
            } <span class="hljs-keyword">finally</span> {
                setIsLoading(<span class="hljs-literal">false</span>);
            }
        };

        loadCalls();
    }, [client, user?.id]);

    <span class="hljs-keyword">const</span> now = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>();

    <span class="hljs-comment">//👇🏻 gets only calls that are yet to start</span>
    <span class="hljs-keyword">const</span> upcomingCalls = calls?.filter(<span class="hljs-function">(<span class="hljs-params">{ state: { startsAt } }: Call</span>) =&gt;</span> {
        <span class="hljs-keyword">return</span> startsAt &amp;&amp; <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(startsAt) &gt; now;
    });

    <span class="hljs-keyword">return</span> { upcomingCalls, isLoading };
};
</code></pre>
<p>The <code>useGetCalls</code> hook <a target="_blank" href="https://getstream.io/video/docs/react/guides/querying-calls/#calls-the-user-has-created-or-is-a-member-of">retrieves the list of upcoming calls</a>, which can then be displayed in the <code>UpcomingMeeting</code> modal.</p>
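<p>The filtering step inside the hook can be tried on its own. Below is a minimal plain-JavaScript sketch, assuming call objects shaped like <code>{ state: { startsAt } }</code>. The <code>filterUpcomingCalls</code> helper and the sample data are hypothetical and not part of the Stream SDK.</p>

```javascript
// Hypothetical helper: keeps only calls whose start time is still in the future.
// `calls` is assumed to be an array of objects shaped like { state: { startsAt } }.
function filterUpcomingCalls(calls, now = new Date()) {
  return (calls ?? []).filter(({ state: { startsAt } }) => {
    // Calls without a start time are treated as "not upcoming".
    return Boolean(startsAt) && new Date(startsAt) > now;
  });
}

// Made-up sample data, with a fixed "now" so the result is deterministic.
const now = new Date("2024-01-01T12:00:00Z");
const sampleCalls = [
  { id: "past", state: { startsAt: "2024-01-01T09:00:00Z" } },
  { id: "future", state: { startsAt: "2024-01-02T09:00:00Z" } },
  { id: "unscheduled", state: { startsAt: undefined } },
];

const upcomingIds = filterUpcomingCalls(sampleCalls, now).map((call) => call.id);
console.log(upcomingIds); // only the "future" call remains
```

<p>Passing <code>now</code> as a parameter keeps the function pure, which makes it easier to test than reading the clock inside the filter.</p>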
<p>Congratulations! You’ve completed the project for this tutorial.</p>
<p>Check out the live app <a target="_blank" href="https://facetime-on-stream.vercel.app/">here.</a></p>
<h2 id="heading-next-steps">Next Steps</h2>
<p>So far, you’ve learned how to build a video conferencing app. If you'd like to learn more about how you can leverage Stream to build scalable apps, then check out these resources:</p>
<ul>
<li><p><a target="_blank" href="https://getstream.io/chat/">How to integrate Stream Chat Messaging</a></p>
</li>
<li><p><a target="_blank" href="https://getstream.io/video/">How to integrate Stream Audio and Video calls</a></p>
</li>
<li><p><a target="_blank" href="https://getstream.io/activity-feeds/">How to integrate Stream Activity Feeds</a></p>
</li>
</ul>
<h2 id="heading-before-we-end"><strong>Before We End...</strong></h2>
<p>I hope you found this tutorial insightful and that it has motivated you to build apps using awesome developer tools.</p>
<p>These are some of my other most recent blog posts.</p>
<ul>
<li><p><a target="_blank" href="https://www.devtoolsacademy.com/blog/state-of-databases-2024">State of Databases for Serverless in 2024</a></p>
</li>
<li><p><a target="_blank" href="https://www.devtoolsacademy.com/blog/neon-vs-supabase"><strong>Neon Postgres vs Supabase</strong></a></p>
</li>
<li><p><a target="_blank" href="https://www.devtoolsacademy.com/blog/mongoDB-vs-postgreSQL"><strong>MongoDB vs. PostgreSQL</strong></a></p>
</li>
</ul>
<p>Check out <a target="_blank" href="https://theankurtyagi.com/">my blog</a> for more tutorials like this on awesome developer tools.</p>
<p>Follow me on <a target="_blank" href="https://x.com/theankurtyagi">Twitter</a> to stay updated on my side projects and ongoing learning.</p>
<p>Happy coding.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use React Compiler – A Complete Guide ]]>
                </title>
                <description>
                    <![CDATA[ In this tutorial, you'll learn how the React compiler can help you write more optimized React applications. React is a user interface library that has been doing its job quite well for over a decade. The component architecture, uni-directional data f... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/react-compiler-complete-guide-react-19/</link>
                <guid isPermaLink="false">66ce54c3e498db1304d6a34b</guid>
                
                    <category>
                        <![CDATA[ React ]]>
                    </category>
                
                    <category>
                        <![CDATA[ React 19 ]]>
                    </category>
                
                    <category>
                        <![CDATA[ React-compiler ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Beginner Developers ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tapas Adhikary ]]>
                </dc:creator>
                <pubDate>Tue, 27 Aug 2024 22:35:47 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1724760187590/f7115fd3-6291-4920-9522-61de269a47f3.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this tutorial, you'll learn how the React compiler can help you write more optimized React applications.</p>
<p>React is a user interface library that has been doing its job quite well for over a decade. The component architecture, uni-directional data flow, and declarative nature stand out in helping devs build production-ready, scalable software applications.</p>
<p>Over the releases (even up until the latest stable release of v18.x), React has provided various techniques and methodologies to improve application performance.</p>
<p>For example, the entire memoization paradigm has been supported using the <code>React.memo()</code> higher-order component, or with hooks like <code>useMemo()</code> and <code>useCallback()</code>.</p>
<p>In programming, <code>memoization</code> is an optimization technique that makes your programs execute faster by caching the result of expensive computations.</p>
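<p>To make the idea concrete, here is a minimal sketch of memoization in plain JavaScript, outside of React. The <code>memoize</code> helper is a hypothetical illustration that only handles single-argument functions. It is not how React implements its hooks.</p>

```javascript
// A generic memoize helper (hypothetical, single-argument functions only).
function memoize(fn) {
  const cache = new Map();
  return (arg) => {
    // Run the real computation only the first time we see this argument.
    if (!cache.has(arg)) {
      cache.set(arg, fn(arg));
    }
    return cache.get(arg);
  };
}

let computeCount = 0;
const slowSquare = (n) => {
  computeCount += 1; // counts how often the real computation runs
  return n * n;
};

const fastSquare = memoize(slowSquare);
const first = fastSquare(12);
const second = fastSquare(12); // served from the cache, slowSquare is not called again
console.log(first, second, computeCount);
```

<p>The trade-off is memory: every cached result stays in the <code>Map</code>, which is one reason memoizing everything is not automatically a win.</p>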
<p>Although React's <code>memoization</code> techniques are great for applying optimizations, as Uncle Ben (remember, Spider-Man's uncle?) once said, "With great power comes great responsibility". So we as developers need to be a little more responsible in applying them. Optimization is great, but over-optimization can be a killer for the application's performance.</p>
<p>With React 19, the developer community has received a list of enhancements and features to boast about:</p>
<ul>
<li><p>An experimental open-source compiler. We will be focusing primarily on it in this article.</p>
</li>
<li><p>React Server Components.</p>
</li>
<li><p>Server Actions.</p>
</li>
<li><p>An easier and more organic way of handling document metadata.</p>
</li>
<li><p>Enhanced hooks and APIs.</p>
</li>
<li><p><code>ref</code> can be passed as props.</p>
</li>
<li><p>Improvements in asset loading for styles, images, and fonts.</p>
</li>
<li><p>A much smoother integration with Web Components.</p>
</li>
</ul>
<p>If these are exciting to you, I recommend <a target="_blank" href="https://www.youtube.com/watch?v=hiiGUjEkzbM">watching this video</a> that explains how each feature will impact you as a React developer. I hope you like it 😊.</p>
<p>The introduction of a <code>compiler</code> with <code>React 19</code> is set to be a game-changer. From now on, we can let the compiler handle the optimization headache rather than keeping it on us.</p>
<p>Does this mean we no longer have to use <code>memo</code>, <code>useMemo()</code>, <code>useCallback()</code>, and so on? For the most part, yes. The compiler can take care of these things automatically if you understand and follow the <a target="_blank" href="https://react.dev/reference/rules">Rules of React</a> for components and hooks.</p>
<p>How will it do this? Well, we'll get to it. But before that, let's understand what a <code>compiler</code> is and whether it's justified to call this new optimizer for React code the <code>React Compiler</code>.</p>
<p>If you like to learn from video tutorials as well, this article is also available as a video tutorial here:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/bdWUVp0TbTU" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p> </p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-a-compiler-traditionally">What is a Compiler, traditionally?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-react-compiler-architecture">React Compiler Architecture</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-react-compiler-in-action">React Compiler in action</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-problem-without-the-react-compiler">Understanding the problem: Without the React Compiler</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-fixing-the-problem-without-the-react-compiler">Fixing the problem: Without the React Compiler</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-fixing-the-problem-using-the-react-compiler">Fixing the problem: Using the React Compiler</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-optimized-react-app-with-react-compiler">Optimized React App with React Compiler</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-react-compiler-in-react-devtools">React Compiler in React DevTools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-diving-deep-how-does-the-react-compiler-work">Diving deep - How does the React Compiler work?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-do-you-opt-in-and-out-of-the-react-compiler">How do you opt in and out of the React compiler?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-can-we-use-the-react-compiler-with-react-18x">Can we use the React Compiler with React 18.x?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-repositories-to-look-into">Repositories to look into</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-whats-next">What's Next?</a></p>
</li>
</ol>
<h2 id="heading-what-is-a-compiler-traditionally">What is a Compiler, Traditionally?</h2>
<p>Simply put, a compiler is a software program/tool that translates high-level programming language code (source code) into machine code. There are several steps to follow to compile source code and generate machine code:</p>
<ul>
<li><p>The <code>lexical analyzer</code> tokenizes the source code and generates tokens.</p>
</li>
<li><p>The <code>Syntax Analyzer</code> creates an abstract syntax tree (AST) to structure the source code tokens logically.</p>
</li>
<li><p>The <code>Semantic Analyzer</code> validates the semantic correctness of the code, for example by checking types and making sure variables are declared before they are used.</p>
</li>
<li><p>After all three types of analysis by the respective analyzers, some <code>intermediate code</code> gets generated. It is also known as the intermediate representation, or IR code.</p>
</li>
<li><p>Then <code>optimization</code> is performed on the IR code.</p>
</li>
<li><p>Finally, the <code>machine code</code> is generated by the compiler from the optimized IR code.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724227359567/a3994e4c-9018-4b67-94be-8b5f403eceb9.png" alt="Compiler phases as described above" class="image--center mx-auto" width="1200" height="630" loading="lazy"></p>
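<p>As a small illustration of the first phase, here is a toy lexical analyzer in JavaScript. It is only a sketch: the token type names are made up for this example, it recognizes just numbers, identifiers, and a few operators, and it silently skips anything else. Real compilers use far more complete lexers.</p>

```javascript
// Toy lexical analyzer: splits a source string into typed tokens.
// Characters that match none of the patterns are ignored in this sketch.
function tokenize(source) {
  const tokens = [];
  for (const [text] of source.matchAll(/\d+|[A-Za-z_]\w*|[=+\-*\/]/g)) {
    let type = "operator";
    if (/^\d/.test(text)) {
      type = "number";
    } else if (/^[A-Za-z_]/.test(text)) {
      type = "identifier";
    }
    tokens.push({ type, text });
  }
  return tokens;
}

const tokens = tokenize("total = price * 2");
console.log(tokens.map((token) => token.type + ":" + token.text).join(" "));
```

<p>A syntax analyzer would then take this flat token list and arrange it into a tree, for instance an assignment node whose right-hand side is a multiplication.</p>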
<p>Now that you understand the basics of how a compiler works, let's learn about the <code>React Compiler</code> and understand how it works.</p>
<h2 id="heading-react-compiler-architecture">React Compiler Architecture</h2>
<p>React compiler is a build-time tool that you need to configure with your React 19 project explicitly using the configuration options provided by the React tools ecosystem.</p>
<p>For example, if you are using <code>Vite</code> to create your React application, the compiler configuration will take place in the <code>vite.config.js</code> file.</p>
<p>React compiler has three primary components:</p>
<ol>
<li><p><code>Babel Plugin</code><strong>:</strong> helps transform the code during the compilation process<strong>.</strong></p>
</li>
<li><p><code>ESLint Plugin</code><strong>:</strong> helps catch and report any violations of the Rules of React.</p>
</li>
<li><p><code>Compiler Core</code>: the core compiler logic that performs the code analysis and optimizations. Both Babel and ESLint plugins use the core compiler logic.</p>
</li>
</ol>
<p>The compilation flow goes like this:</p>
<ul>
<li><p>The <code>Babel Plugin</code> identifies which functions (components or hooks) to compile. We will see some configurations later to learn how to opt in and out of the compilation process. The plugin calls the core compiler logic for each of the functions and finally creates the Abstract Syntax Tree.</p>
</li>
<li><p>Then the compiler core converts the Babel AST into IR code, analyzes it, and runs various validations to ensure none of the rules are broken.</p>
</li>
<li><p>Next, it tries to reduce the amount of code to be optimized by performing various passes to eliminate dead code. The code gets further optimized using memoization.</p>
</li>
<li><p>Finally, in the code generation stage, the transformed AST is converted back to the optimized JavaScript code.</p>
</li>
</ul>
<h2 id="heading-react-compiler-in-action">React Compiler in Action</h2>
<p>Now that you know how the React Compiler works, let's dive into configuring it with a React 19 project so you can start learning about the various optimizations.</p>
<h3 id="heading-understanding-the-problem-without-the-react-compiler">Understanding the Problem: Without the React Compiler</h3>
<p>Let's create a simple product page with React. The product page shows a heading with the number of products on the page, a list of products, and the featured products.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724240252940/bd5118d1-2819-4119-ac96-57e267742432.png" alt="The Product Page" class="image--center mx-auto" width="744" height="914" loading="lazy"></p>
<p>The component hierarchy and the data passing between the components look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724240027326/0a8a653d-9c6a-43ff-9457-81dde019e56e.png" alt="Product Page Component Hierarchy" class="image--center mx-auto" width="1456" height="976" loading="lazy"></p>
<p>As you can see in the image above:</p>
<ul>
<li><p>The <code>ProductPage</code> component has three child components, <code>Heading</code>, <code>ProductList</code>, and <code>FeaturedProducts</code>.</p>
</li>
<li><p>The <code>ProductPage</code> component receives two props, <code>products</code> and the <code>heading</code>.</p>
</li>
<li><p>The <code>ProductPage</code> component computes the total number of products and passes the value along with the heading text value to the <code>Heading</code> component.</p>
</li>
<li><p>The <code>ProductPage</code> component passes down the <code>products</code> prop to the <code>ProductList</code> child component.</p>
</li>
<li><p>Similarly, it computes the featured products and passes the <code>featuredProducts</code> prop to the <code>FeaturedProducts</code> child component.</p>
</li>
</ul>
<p>Here is how the source code of the <code>ProductPage</code> component may look:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> React <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>

<span class="hljs-keyword">import</span> Heading <span class="hljs-keyword">from</span> <span class="hljs-string">'./Heading'</span>;
<span class="hljs-keyword">import</span> FeaturedProducts <span class="hljs-keyword">from</span> <span class="hljs-string">'./FeaturedProducts'</span>;
<span class="hljs-keyword">import</span> ProductList <span class="hljs-keyword">from</span> <span class="hljs-string">'./ProductList'</span>;

<span class="hljs-keyword">const</span> ProductPage = <span class="hljs-function">(<span class="hljs-params">{products, heading}</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> featuredProducts = products.filter(<span class="hljs-function"><span class="hljs-params">product</span> =&gt;</span> product.featured);
  <span class="hljs-keyword">const</span> totalProducts = products.length;

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"m-2"</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">Heading</span>
        <span class="hljs-attr">heading</span>=<span class="hljs-string">{heading}</span>
        <span class="hljs-attr">totalProducts</span>=<span class="hljs-string">{totalProducts}</span> /&gt;</span>

      <span class="hljs-tag">&lt;<span class="hljs-name">ProductList</span>
        <span class="hljs-attr">products</span>=<span class="hljs-string">{products}</span> /&gt;</span>

      <span class="hljs-tag">&lt;<span class="hljs-name">FeaturedProducts</span>
        <span class="hljs-attr">featuredProducts</span>=<span class="hljs-string">{featuredProducts}</span> /&gt;</span>  

    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
  )
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> ProductPage
</code></pre>
<p>Also, assume we use the <code>ProductPage</code> component in the <code>App.js</code> file like this:</p>
<pre><code class="lang-javascript">
<span class="hljs-keyword">import</span> ProductPage <span class="hljs-keyword">from</span> <span class="hljs-string">"./components/compiler/ProductPage"</span>;

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params"></span>) </span>{

  <span class="hljs-comment">// A list of food products    </span>
  <span class="hljs-keyword">const</span> foodProducts = [
    {
      <span class="hljs-string">"id"</span>: <span class="hljs-string">"001"</span>,
      <span class="hljs-string">"name"</span>: <span class="hljs-string">"Hamburger"</span>,
      <span class="hljs-string">"image"</span>: <span class="hljs-string">"🍔"</span>,
      <span class="hljs-string">"featured"</span>: <span class="hljs-literal">true</span>
    },
    {
      <span class="hljs-string">"id"</span>: <span class="hljs-string">"002"</span>,
      <span class="hljs-string">"name"</span>: <span class="hljs-string">"French Fries"</span>,
      <span class="hljs-string">"image"</span>: <span class="hljs-string">"🍟"</span>,
      <span class="hljs-string">"featured"</span>: <span class="hljs-literal">false</span>
    },
    {
      <span class="hljs-string">"id"</span>: <span class="hljs-string">"003"</span>,
      <span class="hljs-string">"name"</span>: <span class="hljs-string">"Taco"</span>,
      <span class="hljs-string">"image"</span>: <span class="hljs-string">"🌮"</span>,
      <span class="hljs-string">"featured"</span>: <span class="hljs-literal">false</span>
    },
    {
      <span class="hljs-string">"id"</span>: <span class="hljs-string">"004"</span>,
      <span class="hljs-string">"name"</span>: <span class="hljs-string">"Hot Dog"</span>,
      <span class="hljs-string">"image"</span>: <span class="hljs-string">"🌭"</span>,
      <span class="hljs-string">"featured"</span>: <span class="hljs-literal">true</span>
    }
  ];

  <span class="hljs-keyword">return</span> (
      <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">ProductPage</span> 
            <span class="hljs-attr">products</span>=<span class="hljs-string">{foodProducts}</span> 
            <span class="hljs-attr">heading</span>=<span class="hljs-string">"The Food Product"</span> /&gt;</span></span>
  );
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> App;
</code></pre>
<p>That's all good, so where is the problem? The problem is that React re-renders every child component whenever the parent component re-renders, even when the child's props haven't changed. These unnecessary re-renders are what we want to optimize away. Let's understand the problem fully first.</p>
<p>We'll add the current timestamp in each of the child components. Now the rendered user interface will look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724241332454/5debcdce-0349-40a3-916f-78e479668c12.png" alt="With timestamp" class="image--center mx-auto" width="1374" height="926" loading="lazy"></p>
<p>The big number you see beside the headings is the timestamp (using the simple <code>Date.now()</code> function from the JavaScript Date API) we have added to the component code. Now what happens if we change the value of the heading prop of the <code>ProductPage</code> component?</p>
<p>Before:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">ProductPage</span> 
   <span class="hljs-attr">products</span>=<span class="hljs-string">{foodProducts}</span> 
   <span class="hljs-attr">heading</span>=<span class="hljs-string">"The Food Product"</span> /&gt;</span>
</code></pre>
<p>And after (notice that we have made "Product" plural by adding an <code>s</code> at the end of the <code>heading</code> value):</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">ProductPage</span> 
   <span class="hljs-attr">products</span>=<span class="hljs-string">{foodProducts}</span> 
   <span class="hljs-attr">heading</span>=<span class="hljs-string">"The Food Products"</span> /&gt;</span>
</code></pre>
<p>Now you will notice an immediate change in the user interface. All three timestamps got updated. This is because all three components were re-rendered when the parent component was re-rendered due to the props change.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724242207319/b3f2aa7e-d387-4de4-a2e6-9491f5cf7996.png" alt="compiler diff" class="image--center mx-auto" width="1200" height="630" loading="lazy"></p>
<p>Notice that the <code>heading</code> prop was passed only to the <code>Heading</code> component, yet the other two child components re-rendered as well. This is where we need the optimizations.</p>
<h3 id="heading-fixing-the-problem-without-the-react-compiler">Fixing the Problem: Without the React Compiler</h3>
<p>As discussed before, React provides us with various hooks and APIs for <code>memoization</code>. We can use <code>React.memo()</code> or <code>useMemo()</code> to safeguard the components that are re-rendering unnecessarily.</p>
<p>For example, we can use <code>React.memo()</code> to memoize the <code>ProductList</code> component to ensure that unless the <code>products</code> prop is changed, the <code>ProductList</code> component will not be re-rendered.</p>
<p>We can use the <code>useMemo()</code> hook to memoize the computation for the featured products. Both implementations are indicated in the image below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724242889553/ec0d54fc-8c50-4fef-a4ea-e8c5951da9ad.png" alt="Applying memoization" class="image--center mx-auto" width="2462" height="1318" loading="lazy"></p>
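<p>Why do we need both techniques? By default, <code>React.memo()</code> compares each prop of the previous and next render with <code>Object.is</code>. The plain-JavaScript sketch below imitates that check with a hypothetical <code>shallowEqualProps</code> helper (it is not React's actual implementation) and assumes the parent passes the same <code>products</code> array reference on both renders.</p>

```javascript
// Rough imitation of a shallow props comparison. Illustrative only.
function shallowEqualProps(prevProps, nextProps) {
  const prevKeys = Object.keys(prevProps);
  const nextKeys = Object.keys(nextProps);
  if (prevKeys.length !== nextKeys.length) return false;
  return prevKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(nextProps, key) &&
      Object.is(prevProps[key], nextProps[key])
  );
}

const products = [
  { id: "001", name: "Hamburger", featured: true },
  { id: "002", name: "French Fries", featured: false },
];

// ProductList: the same array reference is passed on both renders,
// so a memoized component could skip the re-render.
const listCanSkip = shallowEqualProps({ products }, { products });

// FeaturedProducts: filter() builds a brand-new array on every render,
// so the comparison fails even though the contents are identical.
const featuredFirstRender = products.filter((product) => product.featured);
const featuredSecondRender = products.filter((product) => product.featured);
const featuredCanSkip = shallowEqualProps(
  { featuredProducts: featuredFirstRender },
  { featuredProducts: featuredSecondRender }
);

console.log(listCanSkip, featuredCanSkip);
```

<p>This is why the computation of the featured products is wrapped in <code>useMemo()</code>: it keeps the array reference stable between renders so that a memoized child can actually skip re-rendering.</p>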
<p>But again, recalling the wise words of the great Uncle Ben, over the last few years we have started over-using these optimization techniques. These over-optimizations can negatively impact the performance of your applications. So, the availability of the compiler is a boon for React developers as it lets them delegate many such optimizations to the compiler.</p>
<p>Let's now fix the problem using the React compiler.</p>
<h3 id="heading-fixing-the-problem-using-the-react-compiler">Fixing the Problem: Using the React Compiler</h3>
<p>Again, React compiler is an opt-in build-time tool. It doesn't come bundled with React 19 RC. You need to install the required dependencies and configure the compiler with your React 19 project.</p>
<p>Before configuring the compiler, you can check if your codebase is compatible by executing this command in your project directory:</p>
<pre><code class="lang-bash">npx react-compiler-healthcheck@experimental
</code></pre>
<p>It will check and report:</p>
<ul>
<li><p>How many components can be optimized by the compiler.</p>
</li>
<li><p>Whether the Rules of React are followed.</p>
</li>
<li><p>Whether there are any incompatible libraries.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724300204675/d7866215-5cda-4a64-b0d6-ecedb100a428.png" alt="Output of the react-compiler-healthcheck command" class="image--center mx-auto" width="1832" height="448" loading="lazy"></p>
<p>If you find that things are compatible, it's time to install the ESLint plugin powered by the React compiler. This plugin will help you catch any violation of the rules of React in your code. Violating code will be skipped by the React compiler and no optimizations will be performed on it.</p>
<pre><code class="lang-bash">npm install eslint-plugin-react-compiler@experimental
</code></pre>
<p>Then open the ESLint configuration file (for example, <code>.eslintrc.cjs</code> for Vite) and add these configurations:</p>
<pre><code class="lang-javascript"><span class="hljs-built_in">module</span>.exports = {
  <span class="hljs-attr">plugins</span>: [
    <span class="hljs-string">'eslint-plugin-react-compiler'</span>,
  ],
  <span class="hljs-attr">rules</span>: {
    <span class="hljs-string">'react-compiler/react-compiler'</span>: <span class="hljs-string">"error"</span>,
  },
}
</code></pre>
<p>Next, you'll use the Babel plugin for the React compiler to enable the compiler for your entire project. If you're starting a new project with React 19, I recommend that you enable the React compiler for the entire project. Let's install the Babel plugin for the React compiler:</p>
<pre><code class="lang-bash">npm install babel-plugin-react-compiler@experimental
</code></pre>
<p>Once installed, you need to complete the configuration by adding the options in the Babel config file. As we're using Vite, open the <code>vite.config.js</code> file and replace the content with the following code snippet:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { defineConfig } <span class="hljs-keyword">from</span> <span class="hljs-string">'vite'</span>
<span class="hljs-keyword">import</span> react <span class="hljs-keyword">from</span> <span class="hljs-string">'@vitejs/plugin-react'</span>

<span class="hljs-keyword">const</span> ReactCompilerConfig = {<span class="hljs-comment">/* ... */</span> };

<span class="hljs-comment">// https://vitejs.dev/config/</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> defineConfig({
  <span class="hljs-attr">plugins</span>: [react({
    <span class="hljs-attr">babel</span>: {
      <span class="hljs-attr">plugins</span>: [
        [
          <span class="hljs-string">"babel-plugin-react-compiler"</span>,
          ReactCompilerConfig,
        ],
      ],
    },
  })],
})
</code></pre>
<p>Here, you've added the <code>babel-plugin-react-compiler</code> to the configuration. The <code>ReactCompilerConfig</code> object is where you provide any advanced configuration, such as a custom runtime module. In this case, it's an empty object without any advanced configuration.</p>
<p>That's it. You are done configuring the React compiler with your code base to utilize its power. From now on, the React compiler will look into every component and hook in your project to try and apply optimizations to it.</p>
<p>If you want to configure the React compiler with Next.js, Remix, Webpack, and so on, you can <a target="_blank" href="https://react.dev/learn/react-compiler#installation">follow this guide</a>.</p>
<h3 id="heading-optimized-react-app-with-react-compiler">Optimized React App with React Compiler</h3>
<p>Now you should have an optimized React app with the inclusion of the React compiler. So, let's run the same tests you did before. Again, change the value of the <code>heading</code> prop of the <code>ProductPage</code> component.</p>
<p>This time, you will not see the child components re-rendering, so their timestamps will not be updated either. Only the portion of the component where the data changed will reflect the change. Also, you won't have to use <code>memo</code>, <code>useMemo()</code>, or <code>useCallback()</code> in your code anymore.</p>
<p>You can see it working visually <a target="_blank" href="https://youtu.be/bdWUVp0TbTU?t=1326">from here</a>.</p>
<h2 id="heading-react-compiler-in-react-devtools">React Compiler in React DevTools</h2>
<p><a target="_blank" href="https://react.dev/learn/react-developer-tools">React DevTools</a> version 5.0+ has built-in support for the React compiler. You will see a badge with the text <code>Memo ✨</code> beside the components optimized by the React compiler. This is fantastic!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724303700810/2888b91c-bcec-4da2-88a6-840c51876d83.png" alt="React DevTools" class="image--center mx-auto" width="2412" height="1302" loading="lazy"></p>
<h2 id="heading-diving-deep-how-does-the-react-compiler-work">Diving Deep – How Does the React Compiler Work?</h2>
<p>Now that you've seen how the React compiler works on React 19 code, let's dive deep into what's happening in the background. We will use the <a target="_blank" href="https://playground.react.dev/">React Compiler Playground</a> to explore the translated code and the optimization steps.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724740109843/a5047d83-4407-491f-8e11-6522c1381313.png" alt="React Compiler Playground" class="image--center mx-auto" width="2998" height="1394" loading="lazy"></p>
<p>We'll use the <code>Heading</code> component as an example. Copy and paste the following code inside the leftmost section of the playground:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> Heading = <span class="hljs-function">(<span class="hljs-params">{ heading, totalProducts }</span>) =&gt;</span> {
  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">nav</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">h1</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-2xl"</span>&gt;</span>
          {heading}({totalProducts}) - {Date.now()}
      <span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">nav</span>&gt;</span></span>
  )
}
</code></pre>
<p>You will see that some JavaScript code is generated immediately inside the <code>_JS</code> tab of the playground. The React compiler generates this JavaScript code as part of the compilation process. Let's go over it step-by-step:</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">anonymous_0</span>(<span class="hljs-params">t0</span>) </span>{
  <span class="hljs-keyword">const</span> $ = _c(<span class="hljs-number">4</span>);
  <span class="hljs-keyword">const</span> { heading, totalProducts } = t0;
  <span class="hljs-keyword">let</span> t1;
  <span class="hljs-keyword">if</span> ($[<span class="hljs-number">0</span>] === <span class="hljs-built_in">Symbol</span>.for(<span class="hljs-string">"react.memo_cache_sentinel"</span>)) {
    t1 = <span class="hljs-built_in">Date</span>.now();
    $[<span class="hljs-number">0</span>] = t1;
  } <span class="hljs-keyword">else</span> {
    t1 = $[<span class="hljs-number">0</span>];
  }
  <span class="hljs-keyword">let</span> t2;
  <span class="hljs-keyword">if</span> ($[<span class="hljs-number">1</span>] !== heading || $[<span class="hljs-number">2</span>] !== totalProducts) {
    t2 = (
      <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">nav</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">h1</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-2xl"</span>&gt;</span>
          {heading}({totalProducts}) - {t1}
        <span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">nav</span>&gt;</span></span>
    );
    $[<span class="hljs-number">1</span>] = heading;
    $[<span class="hljs-number">2</span>] = totalProducts;
    $[<span class="hljs-number">3</span>] = t2;
  } <span class="hljs-keyword">else</span> {
    t2 = $[<span class="hljs-number">3</span>];
  }
  <span class="hljs-keyword">return</span> t2;
}
</code></pre>
<p>The compiler uses a hook called <code>_c()</code> to create an array of items to cache. In the code above, an array of four elements has been created to cache four items.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> $ = _c(<span class="hljs-number">4</span>);
</code></pre>
<p>But, what are the things to cache?</p>
<ul>
<li><p>The component takes two props, <code>heading</code> and <code>totalProducts</code>. The compiler needs to cache them. So, it needs two elements in the array of cacheable items.</p>
</li>
<li><p>The <code>Date.now()</code> part in the header should be cached.</p>
</li>
<li><p>The JSX itself should be cached. There is no point in computing JSX unless either of the above changes.</p>
</li>
</ul>
<p>So there are a total of four items to cache.</p>
<p>The compiler creates memoization blocks using <code>if</code> blocks. The final return value of the compiled function is the JSX, which depends on three values:</p>
<ul>
<li><p>The <code>Date.now()</code> value.</p>
</li>
<li><p>Two props, <code>heading</code> and <code>totalProducts</code>.</p>
</li>
</ul>
<p>The output JSX needs re-computation when any of the above changes. This means that the compiler needs to create two memoization blocks: one for the <code>Date.now()</code> value and one for the JSX that depends on the two props.</p>
<p>The first memoization block looks like this:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">if</span> ($[<span class="hljs-number">0</span>] === <span class="hljs-built_in">Symbol</span>.for(<span class="hljs-string">"react.memo_cache_sentinel"</span>)) {
    t1 = <span class="hljs-built_in">Date</span>.now();
    $[<span class="hljs-number">0</span>] = t1;
} <span class="hljs-keyword">else</span> {
    t1 = $[<span class="hljs-number">0</span>];
}
</code></pre>
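<p>The <code>if</code> block stores the value of <code>Date.now()</code> in the first slot of the cache array. On the first render that slot still holds the sentinel value, so the timestamp is computed and saved. On every later render the saved value is reused.</p>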
<p>Similarly, in the second memoization block:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">if</span> ($[<span class="hljs-number">1</span>] !== heading || $[<span class="hljs-number">2</span>] !== totalProducts) {
    t2 = (
      <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">nav</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">h1</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-2xl"</span>&gt;</span>
          {heading}({totalProducts}) - {t1}
        <span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">nav</span>&gt;</span></span>
    );
    $[<span class="hljs-number">1</span>] = heading;
    $[<span class="hljs-number">2</span>] = totalProducts;
    $[<span class="hljs-number">3</span>] = t2;
  } <span class="hljs-keyword">else</span> {
    t2 = $[<span class="hljs-number">3</span>];
  }
</code></pre>
<p>Here, the check is whether the <code>heading</code> or <code>totalProducts</code> prop has changed since the last render. If either has, the JSX is recomputed, and the new props and JSX are stored in the cache array. If neither has changed, the previously computed JSX is returned from the cache.</p>
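<p>To make the two memoization blocks concrete, here is a minimal simulation in plain JavaScript that runs outside React. It is only a sketch under stated assumptions: <code>createCache</code> and <code>renderHeading</code> are hypothetical names, the cache array is passed in by hand (the real <code>_c()</code> keeps it tied to the component between renders), and a plain object stands in for the JSX.</p>
<pre><code class="lang-javascript">// Hypothetical stand-in for _c(): an array pre-filled with the sentinel.
const SENTINEL = Symbol.for("react.memo_cache_sentinel");

function createCache(size) {
  return new Array(size).fill(SENTINEL);
}

// Mirrors the compiled output above; a plain object stands in for the JSX.
function renderHeading($, props) {
  const { heading, totalProducts } = props;

  let t1;
  if ($[0] === SENTINEL) {
    t1 = Date.now(); // computed only on the first call
    $[0] = t1;
  } else {
    t1 = $[0];
  }

  let t2;
  if ($[1] !== heading || $[2] !== totalProducts) {
    t2 = { text: heading + "(" + totalProducts + ")", timestamp: t1 };
    $[1] = heading;
    $[2] = totalProducts;
    $[3] = t2;
  } else {
    t2 = $[3];
  }
  return t2;
}

const cache = createCache(4);
const first = renderHeading(cache, { heading: "Products", totalProducts: 3 });
const second = renderHeading(cache, { heading: "Products", totalProducts: 3 });
const third = renderHeading(cache, { heading: "Products", totalProducts: 4 });

console.log(first === second); // true: same props, cached output reused
console.log(second === third); // false: totalProducts changed, output rebuilt
console.log(third.timestamp === first.timestamp); // true: timestamp is still the cached one
</code></pre>
<p>The last line also shows a side effect of this caching: the timestamp stays fixed at its first value even when the output is rebuilt.</p>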
<p>You can now paste any other component source code into the left side and look into the generated JavaScript code to help you understand what's going on as we did above. This will help you to get a better grip on how the compiler performs the memoization techniques in the compilation process.</p>
<h2 id="heading-how-do-you-opt-in-and-out-of-the-react-compiler">How Do You Opt in and Out of the React Compiler?</h2>
<p>Once you've configured the React compiler the way we have done with our Vite project here, it's enabled for all the components and hooks of the project.</p>
<p>But in some cases, you may want to selectively opt-in for the React compiler. In that case, you can run the compiler in “opt-in” mode using the <code>compilationMode: "annotation"</code> option.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Specify the option in the ReactCompilerConfig</span>
<span class="hljs-keyword">const</span> ReactCompilerConfig = {
  <span class="hljs-attr">compilationMode</span>: <span class="hljs-string">"annotation"</span>,
};
</code></pre>
<p>Then annotate the components and hooks you want to opt-in for compilation with the <code>"use memo"</code> directive.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// src/ProductPage.jsx</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">ProductPage</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-string">"use memo"</span>;
  <span class="hljs-comment">// ...</span>
}
</code></pre>
<p>Note that there is a <code>"use no memo"</code> directive as well. There might be some rare cases where your component may not be working as expected after compilation, and you want to opt out of the compilation temporarily until the issue is identified and fixed. In that case, you can use this directive:</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">AComponent</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-string">"use no memo"</span>;
  <span class="hljs-comment">// ...</span>
}
</code></pre>
<h2 id="heading-can-we-use-the-react-compiler-with-react-18x">Can We Use the React Compiler with React 18.x?</h2>
<p>It is recommended to use the React compiler with React 19, since the compiled code depends on APIs that React 19 provides. If you can't upgrade your application to React 19, you'll need to have a custom implementation of the cache function. You can go over <a target="_blank" href="https://github.com/reactwg/react-compiler/discussions/6">this thread</a> describing the workaround.</p>
<h3 id="heading-repositories-to-look-into">Repositories to Look Into</h3>
<ul>
<li><p>All the source code used in this article is <a target="_blank" href="https://github.com/tapascript/react-compiler-lesson">in this repository</a>.</p>
</li>
<li><p>If you want to start coding with React 19 and its features, <a target="_blank" href="https://github.com/atapas/code-in-react-19">here is a template repository</a> configured with React 19 RC, Vite, and TailwindCSS. You may want to try it out.</p>
</li>
</ul>
<h2 id="heading-whats-next">What's Next?</h2>
<p>To learn further,</p>
<ul>
<li><p>Check out the official documentation of React Compiler <a target="_blank" href="https://react.dev/learn/react-compiler">from here</a>.</p>
</li>
<li><p>Check out the <a target="_blank" href="https://github.com/reactwg/react-compiler/discussions">discussions</a> in the Working Group.</p>
</li>
</ul>
<p>Up next, if you want to learn <code>React</code> and its ecosystem, like <code>Next.js</code>, with both fundamental concepts and projects, I have great news for you: you can <a target="_blank" href="https://www.youtube.com/watch?v=VSB2h7mVhPg&amp;list=PLIJrr73KDmRwz_7QUvQ9Az82aDM9I8L_8">check out this playlist on my YouTube</a> channel with 22+ video tutorials and 12+ hours of engaging content so far, for free. I hope you like them as well.</p>
<p>That's all for now. Did you enjoy reading this article and have you learned something new? If so, I would love to know if the content was helpful.</p>
<ul>
<li><p>Subscribe to my <a target="_blank" href="https://www.youtube.com/tapasadhikary?sub_confirmation=1">YouTube Channel</a>.</p>
</li>
<li><p><a target="_blank" href="https://twitter.com/tapasadhikary">Follow me on X (Twitter)</a> or <a target="_blank" href="https://www.linkedin.com/in/tapasadhikary/">LinkedIn</a> if you don't want to miss the daily dose of up-skilling tips.</p>
</li>
<li><p>Check out and follow my Open Source work on <a target="_blank" href="https://github.com/atapas">GitHub</a>.</p>
</li>
<li><p>I regularly publish meaningful posts on my <a target="_blank" href="https://blog.greenroots.info/">GreenRoots Blog</a>. You may find them helpful, too.</p>
</li>
</ul>
<p>See you soon with my next article. Until then, please take care of yourself, and keep learning.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Practice Your Coding Skills by Building a Program in Different Ways ]]>
                </title>
                <description>
                    <![CDATA[ While we have 365 days in other years, this year (2024) is special because it has one ‘extra’ day.  So in the spirit of Leap Day, let's practice some coding to understand various aspects of programming. We'll focus on the same program but from differ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/practice-coding-skills-by-building-a-program-different-ways/</link>
                <guid isPermaLink="false">66c37466159a4cde589f8ce0</guid>
                
                    <category>
                        <![CDATA[ learning to code ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Niladri S. Jyoti ]]>
                </dc:creator>
                <pubDate>Mon, 04 Mar 2024 15:39:55 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/03/Build-A-Leap-Year-Program-in-Many-Different-Ways-1.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>While we have 365 days in other years, this year (2024) is special because it has one ‘extra’ day. </p>
<p>So in the spirit of Leap Day, let's practice some coding to understand various aspects of programming. We'll focus on the same program but from different perspectives. </p>
<p>Our example program will explore different ways you can code a program that determines whether a given year is a leap year. On other days, we code. But today, let’s decode what we do and get some extra knowledge out of that process.</p>
<h3 id="heading-table-of-contents">Table of Contents</h3>
<ul>
    <li><a href="#program-requirements">Program Requirements &amp; Prerequisites</a></li>
    <li><a href="#logical-approaches">Logical Approaches to Solving the Problem</a></li>
    <ul>
        <li><a href="#naive-approach">My Naïve Approach</a></li>
        <li><a href="#single-return">Reassignments and a Single Return Statement</a></li>
        <li><a href="#switch-case">Switching to Switch-Case from If-Else</a></li>
        <li><a href="#logical-deduction">Logical Deduction &amp; Subsets for Better Structure</a></li>
        <li><a href="#combine-conditions">Logical Operators Combining All True Conditions</a></li>
        <li><a href="#ternary-operator">Applying Nitro with the Ternary Operator</a></li>
        <li><a href="#arrow-function">Making it a Single Line Arrow Function</a></li>
    </ul>
    <li><a href="#programming-paradigm">Paradigm Shift: Declarative Programming</a></li>
    <ul>
        <li><a href="#side-effects">Functions with Side Effects</a></li>
        <li><a href="#functional-programming">More About Functional Programming</a></li>
        <li><a href="#short-circuiting">Side-Tracking: Short-Circuiting!</a></li>
        <li><a href="#declarative-programming">Encapsulation and Declarative Programming</a></li>
    </ul>
    <li><a href="#code-quality">Going Above &amp; Beyond with Code Quality</a></li>
    <ul>
        <li><a href="#validations">Validations: Beyond the Basic Specifications</a></li>
        <li><a href="#unit-testing">Testing it Out From the Outside</a></li>
    </ul>
    <li><a href="#end-note">End Note</a></li>
</ul>

<h2 id="program-requirements">Program Requirements &amp; Prerequisites</h2>

First, let’s discuss the requirements and set the specifications. The program should take a year (a number, an integer to be specific) as an argument and return either true or false (a boolean) depending on whether it is a leap year. 

Through the examples, we will focus on the program logic (semantics) rather than the language (syntax). 

Over the years, I have used JavaScript most frequently, so we'll use this language for the project. If you use a different language, no worries, because many concepts are common between programming languages. For example, in this article we will use an arrow function, which is similar to the lambda functions found in some other programming languages, such as Python.

So, as prerequisites, you should have a basic knowledge of programming and should be comfortable with the concepts of functions (different ways to define and call functions, return values, and so on) and conditional logic (if-else, switch-case, and so on). That would be enough to follow along, for the most part, if you want to read and try the code for yourself.

In the last section, we'll also unit test our code. If you aren't familiar with unit testing, here is a good refresher on <a target="_blank" href="https://dev.to/dstrekelj/how-to-write-unit-tests-in-javascript-with-jest-2e83">how to write unit tests in JavaScript with Jest</a>. 

<h2 id="logical-approaches">Logical Approaches to Solving the Problem</h2>

<h3 id="naive-approach">My Naïve Approach</h3>

<p>This is based on the pedagogical style of determining a leap year that I learned as a kid who knew how to divide numbers. If a year (the number representing it) is divisible by 4, it is generally a leap year. But not always. When that year ends with two zeroes (meaning the number is divisible by 100), it must also be divisible by 400 to be a leap year.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/FlowChart-LeapYear.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>How to determine if a year is a leap year - as described above</em></p>
<p>As a beginner programmer, my thoughts flowed like you can see in the above flowchart. As a result, I converted that logic into my program like so:</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">if</span> (year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span>) {
      <span class="hljs-keyword">if</span> (year % <span class="hljs-number">100</span> == <span class="hljs-number">0</span>) {
          <span class="hljs-keyword">if</span> (year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) {
              <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
          } <span class="hljs-keyword">else</span> {
              <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
          }
      } <span class="hljs-keyword">else</span> {
          <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
      }
  } <span class="hljs-keyword">else</span> {
   <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>
  }
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
<p>This makes the program easily understandable. But with time, as I have moved farther in my programming journey, this type of code looks ugly because of so many nested conditional checks. It's not bad, but because of the nested levels, my brain has to work extra hard to get the logic from the code snapshot quickly.</p>
<h3 id="single-return">Reassignments and a Single Return Statement</h3>

<p>To avoid nested conditionals, many programmers follow the strategy of consecutive if conditions, avoiding the else conditions (like how Kyle Cook of Web Dev Simplified shows in this <a target="_blank" href="https://www.youtube.com/watch?v=EumXak7TyQ0">video with examples</a>). It definitely improves readability. </p>
<p>Also, it lets us use only one return statement at the end while reassigning the returnable value. The code shows this better than a description:</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">let</span> isLeap = <span class="hljs-literal">false</span>;
  <span class="hljs-keyword">if</span> (year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span>) {
      isLeap = <span class="hljs-literal">true</span>;
  }
  <span class="hljs-keyword">if</span> (year % <span class="hljs-number">100</span> == <span class="hljs-number">0</span>) {
      isLeap = <span class="hljs-literal">false</span>;
  }
  <span class="hljs-keyword">if</span> (year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) {
      isLeap = <span class="hljs-literal">true</span>;
  }
  <span class="hljs-keyword">return</span> isLeap;
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
<p>The above code looks shorter and quicker to interpret. But it does affect the efficiency of the code, as now you have to go through all of the if conditions in all cases. </p>
<p>In contrast, in our previous naïve approach, due to the if-else construct, if a year is not divisible by 4 (like the year 2023), it would just be checked against one if condition. It’s true, of course, that for small programs such as this one, you don’t have to be overly concerned with efficiency.</p>
<p>The pitfall in this approach, though, is that you need to be cautious to apply all the if conditions one after another — using ‘else if’ would create trouble, as that would skip some if condition checks if the previous if condition test passed.</p>
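<p>To see that pitfall in action, here is a deliberately broken sketch that chains the same three checks with else-if. The function name <code>isLeapYearElseIf</code> is made up for this illustration:</p>
<pre><code class="lang-js">// Broken on purpose: the same three checks, chained with else-if.
function isLeapYearElseIf(year) {
  let isLeap = false;
  if (year % 4 === 0) {
    isLeap = true;
  } else if (year % 100 === 0) {
    // Never reached: every multiple of 100 is also a multiple of 4.
    isLeap = false;
  } else if (year % 400 === 0) {
    // Never reached, for the same reason.
    isLeap = true;
  }
  return isLeap;
}

console.log(isLeapYearElseIf(2024)); // Output: true
console.log(isLeapYearElseIf(1900)); // Output: true (wrong, 1900 is not a leap year)
</code></pre>
<p>Because the first condition already passes for 1900, the later checks that should have corrected the result are skipped.</p>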
<p>Another important fact is that the order matters. Since you started with the more generic cases of years not being a leap year (that is, let isLeap = false;), you have to go from relatively generic to relatively more specific cases. </p>
<p>So if, out of your three condition checks, the check of divisibility by 4 comes at the end, it would make ‘isLeap’ true even for years that are divisible by 100 but not divisible by 400 (like years 1700, 1800, 1900, and so on). </p>
<p>The same logical error would occur if you interchange the order of divisibility checks involving 100 and 400.</p>
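<p>Both ordering mistakes can be reproduced with a small experiment. In this sketch, <code>leapWithOrder</code> is a hypothetical helper that applies the three reassignments in whatever order it is given, so the same logic can be run in the correct order and in the two faulty ones:</p>
<pre><code class="lang-js">// Hypothetical helper: each rule is [divisor, value assigned when divisible].
function leapWithOrder(year, rules) {
  let isLeap = false;
  for (const [divisor, value] of rules) {
    if (year % divisor === 0) {
      isLeap = value;
    }
  }
  return isLeap;
}

const correctOrder = [[4, true], [100, false], [400, true]];
const fourLast = [[100, false], [400, true], [4, true]];
const swapped = [[4, true], [400, true], [100, false]];

console.log(leapWithOrder(1900, correctOrder)); // Output: false
console.log(leapWithOrder(1900, fourLast));     // Output: true (wrong)
console.log(leapWithOrder(2000, correctOrder)); // Output: true
console.log(leapWithOrder(2000, swapped));      // Output: false (wrong)
</code></pre>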
<p>One last point I must mention is that some beginner programmers may think that you cannot use multiple return statements and you must return only once in a program (and that you can do reassignments until that point). But experienced programmers can only call that notion a beginners’ myth!</p>
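<p>For comparison, here is one possible sketch with multiple return statements and no reassignment. The name <code>isLeapYearEarlyReturn</code> is made up for this example. It checks the most specific rule first and returns as soon as the answer is known:</p>
<pre><code class="lang-js">function isLeapYearEarlyReturn(year) {
  if (year % 400 === 0) return true;  // 1600, 2000
  if (year % 100 === 0) return false; // 1700, 1800, 1900
  return year % 4 === 0;              // every other year
}

console.log(isLeapYearEarlyReturn(2024)); // Output: true
console.log(isLeapYearEarlyReturn(2023)); // Output: false
console.log(isLeapYearEarlyReturn(1900)); // Output: false
console.log(isLeapYearEarlyReturn(2000)); // Output: true
</code></pre>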
<h3 id="switch-case">Switching to Switch-Case from If-Else</h3>

<p>While the if-else structure is used to choose between two options, you can also use switch-case to choose one from multiple options. You can compare it to nested if-else blocks (as in the first approach) or a series of if blocks (as in the second approach). </p>
<p>The benefit of the switch-case structure is that it can be more efficient than a series of independent if blocks, because it stops checking as soon as one case matches. </p>
<p>Note that there is one quirky thing with switch-case. When using switch-case, once a case is matched, all subsequent cases will also execute unless you are using break statements. So, the following program will not be correct even if it looks very similar to our previous version of the code.</p>
<p><strong>Incorrect code: to show problems with missing break statements </strong></p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">let</span> isLeap = <span class="hljs-literal">false</span>;
  <span class="hljs-keyword">switch</span> (<span class="hljs-literal">true</span>) {
    <span class="hljs-keyword">case</span> year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span>:
      isLeap = <span class="hljs-literal">true</span>;
    <span class="hljs-keyword">case</span> year % <span class="hljs-number">100</span> == <span class="hljs-number">0</span>:
      isLeap = <span class="hljs-literal">false</span>;
    <span class="hljs-keyword">case</span> year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>:
      isLeap = <span class="hljs-literal">true</span>;
  }
  <span class="hljs-keyword">return</span> isLeap;
}
</code></pre>
<p>If we must use a switch-case structure, we need to use break statements. We also need to go from specific cases first to generic cases next. While not all if-else logic can be converted into a switch-case logic, we can successfully convert the previous function like so:</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">let</span> isLeap = <span class="hljs-literal">false</span>;
  <span class="hljs-keyword">switch</span> (<span class="hljs-literal">true</span>) {
    <span class="hljs-keyword">case</span> year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>:
      isLeap = <span class="hljs-literal">true</span>;
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">case</span> year % <span class="hljs-number">100</span> == <span class="hljs-number">0</span>:
      isLeap = <span class="hljs-literal">false</span>;
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">case</span> year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span>:
      isLeap = <span class="hljs-literal">true</span>;
      <span class="hljs-keyword">break</span>;
  }
  <span class="hljs-keyword">return</span> isLeap;
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre><p>Notice that in the above, we don't have a 'default' case. And this is because we have initialized the isLeap variable with false. Had we just declared the variable without initialization with a value, we could've written a default case which would assign the value false to isLeap.</p>
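<p>As a sketch of that alternative, the variant below declares <code>isLeap</code> without an initial value and relies on a default case instead. The name <code>isLeapYearWithDefault</code> is only for illustration:</p>
<pre><code class="lang-js">function isLeapYearWithDefault(year) {
  let isLeap; // declared without an initial value
  switch (true) {
    case year % 400 === 0:
      isLeap = true;
      break;
    case year % 100 === 0:
      isLeap = false;
      break;
    case year % 4 === 0:
      isLeap = true;
      break;
    default:
      // Without this case, a year like 2023 would return undefined.
      isLeap = false;
  }
  return isLeap;
}

console.log(isLeapYearWithDefault(2023)); // Output: false
console.log(isLeapYearWithDefault(2000)); // Output: true
</code></pre>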
<p>Also, the above version of switch-case code is slightly longer because we wanted to use one return statement in the end and used assignments until then. But if we refactor it, a shorter and more organized code would be this:   </p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">switch</span> (<span class="hljs-literal">true</span>) {
    <span class="hljs-keyword">case</span> (year % <span class="hljs-number">400</span> === <span class="hljs-number">0</span>):
      <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
    <span class="hljs-keyword">case</span> (year % <span class="hljs-number">100</span> === <span class="hljs-number">0</span>):
      <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
    <span class="hljs-keyword">case</span> (year % <span class="hljs-number">4</span> === <span class="hljs-number">0</span>):
      <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
    <span class="hljs-keyword">default</span>:
      <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
  }
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
<p>Notice that since execution of a return statement in a function automatically ends the function call, the program does not read lines that follow that statement. So, in this example, we don't have to use the break statements necessarily. </p>
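<p>On the efficiency point raised earlier, it helps to see how much work the switch actually does. In the sketch below, <code>classify</code> and its inner <code>check</code> helper are hypothetical names. The helper records each case expression as it is evaluated, which shows that the cases are tested from top to bottom and that evaluation stops at the first match:</p>
<pre><code class="lang-js">function classify(year) {
  const evaluated = [];

  // Records the divisor being tested, then performs the divisibility check.
  function check(divisor) {
    evaluated.push(divisor);
    return year % divisor === 0;
  }

  let isLeap;
  switch (true) {
    case check(400):
      isLeap = true;
      break;
    case check(100):
      isLeap = false;
      break;
    case check(4):
      isLeap = true;
      break;
    default:
      isLeap = false;
  }
  return { isLeap, evaluated };
}

console.log(classify(2000).evaluated.join(", ")); // Output: 400
console.log(classify(1900).evaluated.join(", ")); // Output: 400, 100
console.log(classify(2023).evaluated.join(", ")); // Output: 400, 100, 4
</code></pre>
<p>So a year like 2000 is settled after one check, while a year like 2023 still goes through all three.</p>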
<h3 id="logical-deduction">Logical Deduction &amp; Subsets for Better Structure</h3>

<p>Switching back from switch-case to if-else logic, let's do some logical deduction. In our previous if-else logic, we went from generic cases to specific cases. What if we go in reverse order? We consider that a given year will be a leap year unless negated. </p>
<p>So, we start with the narrower cases of centenary years — for them, the rule is simple: to be negated, they need to be divisible by 100 but not by 400 (like years such as 1700, 1800, 1900). </p>
<p>In this process, since we've already accepted years like 2000 (or years divisible by 400) to be a leap year, we won’t test them for divisibility by 4 (because a number divisible by 400 would anyway be divisible by 4 as well). </p>
<p>In the next step, as we consider only the remaining years, we would simply negate the cases where the year is not divisible by 4 (years like 2023, 1997, and so on). </p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">let</span> isLeap = <span class="hljs-literal">true</span>;
  <span class="hljs-keyword">if</span> (year % <span class="hljs-number">100</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">400</span> != <span class="hljs-number">0</span>) {
      isLeap = <span class="hljs-literal">false</span>;
  } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (year % <span class="hljs-number">4</span> != <span class="hljs-number">0</span>) {
      isLeap = <span class="hljs-literal">false</span>;
  }
  <span class="hljs-keyword">return</span> isLeap;
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
<p>Here you see, we first consider the centenary years and then non-centenary years — so they are mutually exclusive — and that’s why we use ‘else-if’ instead of if in the second conditional check. And in that process, we gain some efficiency over consecutive if blocks.</p>
<p>This approach splits the possible years into subsets and handles each subset in turn. Depending on how we split them, we can construct the program differently. An alternative split gives the version shown below:</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">let</span> isLeap = <span class="hljs-literal">false</span>;
  <span class="hljs-keyword">if</span> (year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) {
      isLeap = <span class="hljs-literal">true</span>;
  } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (year % <span class="hljs-number">100</span> != <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span>) {
      isLeap = <span class="hljs-literal">true</span>;
  }
  <span class="hljs-keyword">return</span> isLeap;
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
<p>So, in brief, our deduction from the leap year rule is that years divisible by 400 (like 1600 and 2000) are leap years, and any other year must be divisible by 4 but not by 100 to be a leap year.</p>
<p>In taking this approach, we have combined conditions and that’s why we involved logical operators (&amp;&amp;, the logical AND operator). This has helped us reduce the length of the function. Instead of three conditional blocks, we are currently using two blocks — an if block and then an else (where we further check the condition, and thus we call it else-if rather than just else).</p>
<p>But now that we are just using almost a single ‘if-else’ construct and we are also delving into logical operators, let's unleash more power from the logical operators in the following approach.</p>
<h3 id="combine-conditions">Logical Operators Combining All True Conditions</h3>

<p>This time let's just reorganize the logic from the previous approach (two subsets) to group all positive conditions together and then accept a year as a leap year. If that’s not met, then call it a non-leap year. </p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
    <span class="hljs-keyword">if</span> ((year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">100</span> != <span class="hljs-number">0</span>) || year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
    }
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
<p>This one looks good because it increases readability by organizing the positive conditions together. The only cost we incur here is that the condition in the if block is longer. </p>
<p>But with logical operators, it looks visually shorter and not complex (at least to programmers habituated to combining logical operators like this).</p>
<p>Dissecting further, since we said in the previous approach that we could break the subsets in two different ways, we can have two corresponding versions of this approach as well. The second one groups all the negative conditions together instead:</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">if</span> ((year % <span class="hljs-number">100</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">400</span> != <span class="hljs-number">0</span>) || year % <span class="hljs-number">4</span> != <span class="hljs-number">0</span>) {
      <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
  } <span class="hljs-keyword">else</span> {
      <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
  }
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
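<p>To convince yourself that the positive and negative forms really agree, you can compare them over a range of years. The following is a minimal sketch: the names <code>isLeapPositive</code>, <code>isLeapNegative</code>, and <code>findMismatches</code> are made up for this illustration, and the range of 1 to 4000 is an arbitrary choice.</p>

```javascript
// Positive form: list the ways a year CAN be a leap year.
function isLeapPositive(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Negative form: list the ways a year can FAIL to be a leap year, then negate.
function isLeapNegative(year) {
  return !((year % 100 === 0 && year % 400 !== 0) || year % 4 !== 0);
}

// Collect every year in the range for which the two forms disagree.
function findMismatches(fromYear, toYear) {
  const mismatches = [];
  for (let year = fromYear; year <= toYear; year++) {
    if (isLeapPositive(year) !== isLeapNegative(year)) {
      mismatches.push(year);
    }
  }
  return mismatches;
}

// Example usage:
console.log(findMismatches(1, 4000).length); // Output: 0
```

<p>An empty list of mismatches is what you would expect, since negating the positive condition and simplifying it leads to the negative condition.</p>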
<h3 id="ternary-operator">Applying Nitro with the Ternary Operator</h3>

<p>As you progress in your programming-learning journey, at some point or other, you must have been elated to discover the possibility of writing ultra-short programs. </p>
<p>While logical operators help us do that, to activate the ‘Nitro’ mode, we must use a Ternary Operator — which basically makes our if-else blocks a single line.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">return</span> ((year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">100</span> != <span class="hljs-number">0</span>) || year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) ? <span class="hljs-literal">true</span> : <span class="hljs-literal">false</span>;
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
<p>By now, as a pro programmer, you must be pitying your rookie self. You think of those times when you used to declare and initialize a variable with a default value first and then reassign it with the value you wanted to return, and finally return the value held by that variable. </p>
<p>It has been a long time since you shunned that practice, and you now return what you need to return, and don’t consume unnecessary memory space for useless variables.</p>
<h3 id="arrow-function">Making it a Single Line Arrow Function</h3>

<p>Now that you have been boosted with Nitro, your programming technique is advancing like an arrow, on a mission to tear away the remnants of ES5 and boldly fly into the post-ES6 world. So you welcome arrow functions with open arms.  </p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> isLeapYear = <span class="hljs-function"><span class="hljs-params">year</span> =&gt;</span> (year % <span class="hljs-number">4</span> === <span class="hljs-number">0</span> &amp;&amp; (year % <span class="hljs-number">100</span> !== <span class="hljs-number">0</span> || year % <span class="hljs-number">400</span> === <span class="hljs-number">0</span>));

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
<p>Previously, you skipped variables, and you skipped ‘if-else’ blocks. And now, you can even skip the return statement, because an arrow function whose body is a single expression (with no curly braces) returns the value of that expression implicitly. You also skip the parentheses around the parameter, as there is only one.</p>
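<p>The implicit return only applies when the arrow function body is a bare expression. As a small sketch of the difference (the three function names below are invented for this comparison), note what happens once curly braces are added:</p>

```javascript
// Concise body: the value of the expression is returned implicitly.
const isLeapConcise = year =>
  year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);

// Block body: braces turn the body into statements, so `return` is required.
const isLeapBlock = (year) => {
  return year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);
};

// Block body without `return`: the expression is evaluated and discarded,
// so the function returns undefined.
const isLeapForgotReturn = (year) => {
  year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);
};

// Example usage:
console.log(isLeapConcise(2024)); // Output: true
console.log(isLeapBlock(2024)); // Output: true
console.log(isLeapForgotReturn(2024)); // Output: undefined
```

<p>Breaking the expression onto a second line, as in the first function, is also one way to stay within a line-width limit without giving up the concise form.</p>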
<p>While singing the saga of shorter code, a point must be made that the shorter code is not necessarily the better code. It all depends on your users of the code (people who might read it and possibly collaborate/improve upon it). </p>
<p>If you are working with experienced programmers, this level of concision is fine. Just make sure your lines don’t exceed a certain width (80 characters is a common recommendation) so you don't trouble your coworkers with the need to handle horizontal scrollbars. </p>
<p>But if you are working with team members with varying levels of experience, or you are a teacher working with learners, then you must be conscious of the readability of your code for everyone.</p>
<h2 id="programming-paradigm">Paradigm Shift: Declarative Programming</h2>

<p>Anyway, we have discussed the logic of determining the leap year in the above examples. But let’s now dissect further to find more nuances of programming. And in that process let's move from imperative programming (as we have used so far) towards declarative programming (which is the end goal in this section).</p>
<h3 id="side-effects">Functions with Side Effects</h3>

<p>Functions are said to have side effects when they modify non-local variables. A function that prints (logs) to the console is also considered to have a side effect. One way to see this: a call to a function without side effects can be replaced by its return value without changing what the program does, whereas replacing a call to a logging function would lose the printed output. </p>
<p>Functional Programming is a paradigm which dictates that our program should be like a pure function without side effects. A pure function means a function which always returns the same output given the same input. So, in its body, it depends on only the input parameter given from outside and no other global variable. Additionally, it should just return the output value without side effects or trying to modify anything outside its scope.</p>
<p>But consider the following variation of the program which does not specifically return any value representing the result. Instead, it logs the result as a statement (string) in the console. This is an example of a side effect. </p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">if</span> ((year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">100</span> != <span class="hljs-number">0</span>) || year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"leap year."</span>);
  } <span class="hljs-keyword">else</span> {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"not leap year."</span>);
  }
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-keyword">let</span> someValue = isLeapYear(<span class="hljs-number">2024</span>); <span class="hljs-comment">// Output: leap year.</span>
<span class="hljs-built_in">console</span>.log(someValue); <span class="hljs-comment">// Output: undefined</span>
</code></pre>
<p>Evidently, it does not follow the specification, as it needs to return a value of boolean type. A function can, of course, do both — printing and returning, like an alternative form of the above function. </p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">if</span> ((year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">100</span> != <span class="hljs-number">0</span>) || year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"leap year."</span>);
      <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
  } <span class="hljs-keyword">else</span> {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"not leap year."</span>);
      <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
  }
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-keyword">let</span> someValue = isLeapYear(<span class="hljs-number">2024</span>); <span class="hljs-comment">// Output: leap year.</span>
<span class="hljs-built_in">console</span>.log(someValue); <span class="hljs-comment">// Output: true</span>
</code></pre>
<p>But the mere fact that it is doing two things — returning a value and printing in the console —  is the problem. A function should be made to do one thing for proper reusability. The ‘isLeapYear’ function should just determine if a year is a leap year. If we need to print anything about it, let that onus of doing the side effects lie with some other logger function(s).</p>
<pre><code class="lang-js"><span class="hljs-comment">// pure function</span>

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
    <span class="hljs-keyword">if</span> ((year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">100</span> != <span class="hljs-number">0</span>) || year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
    }
}

<span class="hljs-comment">// functions with side effect</span>

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">simpleLeapYearLogger</span>(<span class="hljs-params">isLeap</span>) </span>{
    <span class="hljs-keyword">if</span> (isLeap) {
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Yes, a leap year!"</span>);
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Sorry, not a leap year."</span>);
    }
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">advancedLeapYearLogger</span>(<span class="hljs-params">year, isLeap</span>) </span>{
    <span class="hljs-keyword">if</span> (isLeap) {
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`The year <span class="hljs-subst">${year}</span> is a leap year!`</span>);
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`The year <span class="hljs-subst">${year}</span> is not a leap year!`</span>);
    }
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-keyword">let</span> currYear = <span class="hljs-number">2024</span>;
<span class="hljs-keyword">let</span> check2024 = isLeapYear(currYear); <span class="hljs-comment">// No Output/Side Effect, just returned value.</span>
simpleLeapYearLogger(check2024); <span class="hljs-comment">// Output: Yes, a leap year!</span>
advancedLeapYearLogger(currYear, check2024); <span class="hljs-comment">// Output: The year 2024 is a leap year!</span>
</code></pre>
<p>As you can see above, the function ‘isLeapYear’ is more reusable — with two different use cases in two separate logger functions. Also, had there been any mistake in the logic for the ‘isLeapYear’ function, it would have been easier to fix without touching the logger functions’ code. </p>
<p>Similarly, if you need to display the string logged in the console differently, you could modify the respective logger function without touching the leap year’s logic function. Thus, a function doing just one thing that it was supposed to do increases the reusability and maintainability of that function.</p>
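<p>The same pure function can also feed code that has nothing to do with logging. As a rough sketch, here are two helpers that are not part of the original example (<code>daysInFebruary</code> and <code>daysInYear</code> are hypothetical names) and that reuse the leap year check as-is:</p>

```javascript
// Pure function: same logic as above, no logging.
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Hypothetical helper: number of days in February for a given year.
function daysInFebruary(year) {
  return isLeapYear(year) ? 29 : 28;
}

// Hypothetical helper: number of days in a given year.
function daysInYear(year) {
  return isLeapYear(year) ? 366 : 365;
}

// Example usage:
console.log(daysInFebruary(2024)); // Output: 29
console.log(daysInFebruary(2023)); // Output: 28
console.log(daysInYear(1900)); // Output: 365
console.log(daysInYear(2000)); // Output: 366
```

<p>Had <code>isLeapYear</code> printed to the console, every one of these calls would have produced an unwanted log line as well.</p>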
<h3 id="functional-programming">More About Functional Programming</h3>

<p>In the above section, you have already ventured into the space of functional programming. And now is the time to delve deeper. </p>
<p>If I search for the term ‘Functional Programming’ on Wikipedia, the first line states:</p>
<blockquote>
<p>“functional programming is a programming paradigm where programs are constructed by applying and <a target="_blank" href="https://en.wikipedia.org/wiki/Function_composition_%28computer_science%29">composing</a> <a target="_blank" href="https://en.wikipedia.org/wiki/Function_%28computer_science%29">functions</a>.”</p>
</blockquote>
<p>The phrase ‘composing function’ means building complex functions from simple ones. In our example, the leap year function is quite simple already. But to showcase the mechanism of function composition, let's create it out of component functions.</p>
<pre><code class="lang-js"><span class="hljs-comment">// component function</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">divisible</span>(<span class="hljs-params">dividend, divisor</span>) </span>{
    <span class="hljs-keyword">return</span> dividend % divisor == <span class="hljs-number">0</span>
}

<span class="hljs-comment">// composed function</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
    <span class="hljs-keyword">let</span> isLeap = <span class="hljs-literal">false</span>;
    divisible(year, <span class="hljs-number">4</span>) &amp;&amp; (isLeap = <span class="hljs-literal">true</span>);
    divisible(year, <span class="hljs-number">100</span>) &amp;&amp; (isLeap = <span class="hljs-literal">false</span>);
    divisible(year, <span class="hljs-number">400</span>) &amp;&amp; (isLeap = <span class="hljs-literal">true</span>);
    <span class="hljs-keyword">return</span> isLeap;
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">1900</span>)); <span class="hljs-comment">// Output: false</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2000</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
<h3 id="short-circuiting">Side-Tracking: Short-Circuiting!</h3>

<p>Above, you are using one function to build another. This is the same component-based approach you follow in React, the JavaScript front-end library.</p>
<p>But wait, before we go further into React, what is that ‘&amp;&amp;’ doing in those three lines in the 'isLeapYear' function when we are not using any if-else statements there? </p>
<p>Welcome to the short-circuit evaluation of logical operators. An expression stops being evaluated as soon as its outcome is determined. With a logical AND (&amp;&amp;) between two operands, if the left side is false, the whole expression is already false, so the right side is never read (never executed). </p>
<p>But if the first side is evaluated to be true, it further reads (executes) the second side for evaluation. And in that process, it does that assignment on the right-hand side of &amp;&amp; in our example.</p>
<p>Similarly, the process when logical OR (||) is involved is such that if the left-hand side is evaluated as true, the whole expression is true (it needs one condition evaluated as true for || for the whole expression to be true). Then, the second side is ignored. The second side is read or executed only when the first side is evaluated as false.</p>
<p>You can use this kind of evaluation logic as a replacement for the ‘if’ condition checks. For more examples of how it works in different scenarios, read the section ‘Short-Circuiting of Logical Operators (&amp;&amp; and ||)’ in my blog post where I have discussed <a target="_blank" href="https://codenil.medium.com/javascript-operators-some-nuances-57300eb2c354">some nuances of JavaScript Operators</a>.</p>
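<p>To watch short-circuiting happen, you can record which operands actually get evaluated. This is only an illustrative sketch: <code>check</code> is a hypothetical helper that notes its label in the <code>calls</code> array and then hands back the given value.</p>

```javascript
const calls = [];

// Hypothetical helper: records its label whenever it is evaluated,
// then returns the given value unchanged.
function check(label, value) {
  calls.push(label);
  return value;
}

// AND with a false left side: the right side ("B") is never evaluated.
const andSkipped = check("A", false) && check("B", true);

// OR with a true left side: the right side ("D") is never evaluated.
const orSkipped = check("C", true) || check("D", false);

// AND with a true left side: the right side ("F") runs and its value is the result.
const andBoth = check("E", true) && check("F", "right side ran");

// Example usage:
console.log(andSkipped); // Output: false
console.log(orSkipped); // Output: true
console.log(andBoth); // Output: right side ran
console.log(calls.join(" ")); // Output: A C E F
```

<p>The labels B and D never show up in <code>calls</code>, which is the short-circuit at work.</p>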
<h3 id="declarative-programming">Encapsulation and Declarative Programming</h3>

<p>Returning to React and components, the idea of composing functions or components is rooted in the need for encapsulation. With encapsulation, you can hide the complex details, like in a capsule, and use it repeatedly without bothering much about its underlying complexity. </p>
<p>Essentially, you just proclaim (declare) what you need rather than straining yourself with the workload and headache of how you can make it happen step-by-step with ‘do-this’ and ‘do-that’ type statements (imperatives). </p>
<p>That, briefly, is declarative programming for you.</p>
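<p>As a small illustration of the contrast, consider picking the leap years out of a list. The sample <code>years</code> array and the variable names are assumptions made for this sketch. The first version spells out each step, while the second states what is wanted and leaves the looping to the built-in <code>filter</code> method.</p>

```javascript
const isLeapYear = year =>
  year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);

const years = [1900, 2000, 2023, 2024];

// Imperative: spell out how to walk the array and build the result.
const leapYearsImperative = [];
for (let i = 0; i < years.length; i++) {
  if (isLeapYear(years[i])) {
    leapYearsImperative.push(years[i]);
  }
}

// Declarative: state what we want; filter() hides the looping details.
const leapYearsDeclarative = years.filter(isLeapYear);

// Example usage:
console.log(leapYearsImperative.join(", ")); // Output: 2000, 2024
console.log(leapYearsDeclarative.join(", ")); // Output: 2000, 2024
```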
<h2 id="code-quality">Going Above &amp; Beyond with Code Quality </h2>

<p>So far, we have covered the logical structures and the programming paradigms, but now, let’s look at the third aspect: code quality.</p>
<h3 id="validations">Validations: Beyond the Basic Specifications</h3>

<p>The requirements that we laid out at first just considered valid inputs. What if the function is called with arguments that are not the ideal ones — like a non-number, or even if a number but a non-integer? </p>
<p>To address that, we can build validation logic. To build validation logic, you need to think about all the different ways in which the input value (the argument passed to your function) may not be workable for you. </p>
<p>If one of those non-workable ways does come along, you need to return something that makes more sense — you can not give a verdict like true or false in that case. You may return something more neutral (like undefined or null) to indicate that the function encountered an invalid entry.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> year!=<span class="hljs-string">"number"</span> || year % <span class="hljs-number">1</span> != <span class="hljs-number">0</span> || year &lt;= <span class="hljs-number">0</span>) <span class="hljs-keyword">return</span> <span class="hljs-literal">undefined</span>;
  <span class="hljs-keyword">return</span> ((year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">100</span> != <span class="hljs-number">0</span>) || year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) ? <span class="hljs-literal">true</span> : <span class="hljs-literal">false</span>;
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-string">"TwentyTwentyFour"</span>)); <span class="hljs-comment">// Output: undefined</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023.99</span>)); <span class="hljs-comment">// Output: undefined</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">0</span>)); <span class="hljs-comment">// Output: undefined</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">-1</span>)); <span class="hljs-comment">// Output: undefined</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-string">"2024"</span>)); <span class="hljs-comment">// Output: undefined</span>
</code></pre>
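<p>If you look carefully, our leap year check uses ordinary equality (==) instead of strict equality (===). In these particular comparisons the choice does not change the result, because the % operator always produces a number, and it converts a numeric string such as "2024" to a number before dividing. The string "2024" is rejected above because of the typeof check in the validation line, not because of the equality operator. </p>
<p>If our intention is to strictly accept a number, the kind of validation we wrote is fine, and it would then be even more proper to use === throughout. </p>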
<p>But if, on the other hand, we want to accept values like "2024", we must enhance our validation logic like so:</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">if</span> (<span class="hljs-built_in">isNaN</span>(<span class="hljs-built_in">Number</span>(year)) || year % <span class="hljs-number">1</span> != <span class="hljs-number">0</span> || year &lt;= <span class="hljs-number">0</span>) <span class="hljs-keyword">return</span> <span class="hljs-literal">undefined</span>;
  <span class="hljs-keyword">return</span> ((year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">100</span> != <span class="hljs-number">0</span>) || year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) ? <span class="hljs-literal">true</span> : <span class="hljs-literal">false</span>;
}

<span class="hljs-comment">// Example usage:</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2024</span>)); <span class="hljs-comment">// Output: true</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-string">"TwentyTwentyFour"</span>)); <span class="hljs-comment">// Output: undefined</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">2023.99</span>)); <span class="hljs-comment">// Output: undefined</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">0</span>)); <span class="hljs-comment">// Output: undefined</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-number">-1</span>)); <span class="hljs-comment">// Output: undefined</span>
<span class="hljs-built_in">console</span>.log(isLeapYear(<span class="hljs-string">"2024"</span>)); <span class="hljs-comment">// Output: true</span>
</code></pre>
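<p>One caveat with the looser validation: because it relies on type coercion, it also lets through some values you may not have intended. For instance, <code>true</code> is coerced to 1 and the array <code>[2024]</code> to 2024. If that matters in your case, a stricter variant is possible. The sketch below is one option among many; the name <code>isLeapYearStrict</code> and the decision to accept only numbers and non-empty strings are assumptions for illustration.</p>

```javascript
function isLeapYearStrict(input) {
  // Accept only numbers and non-empty strings; reject booleans, arrays, null, etc.
  const isNumber = typeof input === "number";
  const isNonEmptyString = typeof input === "string" && input.trim() !== "";
  if (!isNumber && !isNonEmptyString) return undefined;

  const year = Number(input);
  // Number.isInteger is false for NaN, Infinity, and fractional values.
  if (!Number.isInteger(year) || year <= 0) return undefined;

  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Example usage:
console.log(isLeapYearStrict(2024)); // Output: true
console.log(isLeapYearStrict("2024")); // Output: true
console.log(isLeapYearStrict(true)); // Output: undefined
console.log(isLeapYearStrict([2024])); // Output: undefined
console.log(isLeapYearStrict("")); // Output: undefined
console.log(isLeapYearStrict(2023.99)); // Output: undefined
```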
<h3 id="unit-testing">Testing it Out From the Outside</h3>

<p>In the above two code blocks, we write our code and test it in the same place. But the code that goes into production will not have the opportunity to include such console logs that we have used extensively for demonstrating 'example usage' in the above code blocks. </p>
<p>This is where unit testing comes in. In unit testing, we first export the function so it can be used in other files, then import it in a separate test file. In that test file we write our test cases, and then we run the file to execute them.</p>
<p>I have used the Jest package to do this unit testing, and here is the code from my index file and test script file:</p>
<p><strong>index.js</strong></p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isLeapYear</span>(<span class="hljs-params">year</span>) </span>{
  <span class="hljs-keyword">if</span> (<span class="hljs-built_in">isNaN</span>(<span class="hljs-built_in">Number</span>(year)) || year % <span class="hljs-number">1</span> != <span class="hljs-number">0</span> || year &lt;= <span class="hljs-number">0</span>) <span class="hljs-keyword">return</span> <span class="hljs-literal">undefined</span>;
  <span class="hljs-keyword">return</span> ((year % <span class="hljs-number">4</span> == <span class="hljs-number">0</span> &amp;&amp; year % <span class="hljs-number">100</span> != <span class="hljs-number">0</span>) || year % <span class="hljs-number">400</span> == <span class="hljs-number">0</span>) ? <span class="hljs-literal">true</span> : <span class="hljs-literal">false</span>;
}

<span class="hljs-built_in">module</span>.exports = isLeapYear;
</code></pre>
<p><strong>index.test.js</strong></p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> isLeapYear = <span class="hljs-built_in">require</span>(<span class="hljs-string">'./index.js'</span>);

describe(<span class="hljs-string">'Test isLeapYear'</span>, <span class="hljs-function">() =&gt;</span> {
  it(<span class="hljs-string">'should return true for leap year'</span>, <span class="hljs-function">() =&gt;</span> {
    expect(isLeapYear(<span class="hljs-number">2020</span>)).toBe(<span class="hljs-literal">true</span>);
  });
  it(<span class="hljs-string">'should return false for non-leap year'</span>, <span class="hljs-function">() =&gt;</span> {
    expect(isLeapYear(<span class="hljs-number">2023</span>)).toBe(<span class="hljs-literal">false</span>);
  });
  it(<span class="hljs-string">'should return undefined for invalid input'</span>, <span class="hljs-function">() =&gt;</span> {
    expect(isLeapYear(<span class="hljs-string">'TwentyTwentyFour'</span>)).toBe(<span class="hljs-literal">undefined</span>);
    expect(isLeapYear(<span class="hljs-string">'2023.99'</span>)).toBe(<span class="hljs-literal">undefined</span>);
    expect(isLeapYear(<span class="hljs-string">'0'</span>)).toBe(<span class="hljs-literal">undefined</span>);
    expect(isLeapYear(<span class="hljs-string">'-1'</span>)).toBe(<span class="hljs-literal">undefined</span>);
  });
  it(<span class="hljs-string">'should return true for a leap year in string format'</span>, <span class="hljs-function">() =&gt;</span> {
    expect(isLeapYear(<span class="hljs-string">"2024"</span>)).toBe(<span class="hljs-literal">true</span>);
  });
});
</code></pre>
<p>I installed Jest using the command <code>npm i jest</code>. Then, I added <code>jest</code> as a value for <code>test</code> in the <code>scripts</code> object inside my package.json file. Then, as I ran <code>npm test</code>, it passed all my test cases, like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/Screenshot-2024-02-29-05.25.03.png" alt="Image" width="600" height="400" loading="lazy">
<em>testing output</em></p>
<p>If you want to tweak and try this unit testing code, you can use and fork this <a target="_blank" href="https://replit.com/@nil-sj/UnitTestingExample">replit project</a>.</p>
<h2 id="end-note">End Note</h2>

<p>We've reviewed many programming concepts in the above exercise. And one key takeaway is that a program can be written in multiple ways. </p>
<p>There are typically many correct solutions to a programming problem. So beginner programmers should, therefore, think of the logic part of it (the algorithm) more than the exact execution steps when starting to solve a problem.</p>
<p>And by the way, if you're wondering why we have leap years, then this is for you: the time Earth takes to complete one revolution around the sun is not exactly 365 days (or 365 x 24 hours) but approximately one-quarter of a day extra. </p>
<p>This process may remind you of the modulus operator, represented by the symbol %, which returns the remainder of a division operation. Here, the approximate time (in hours) taken for one revolution of earth is being divided by 24 hours (that is, a day). It gives a remainder of about 6 hours. </p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> approxTimeHrsRev = <span class="hljs-number">8766</span>;
<span class="hljs-keyword">const</span> hrsPerDay = <span class="hljs-number">24</span>;
<span class="hljs-keyword">let</span> completedDaysEachYear;

<span class="hljs-keyword">let</span> remainderHrsPerYear = approxTimeHrsRev % hrsPerDay;
completedDaysEachYear = (approxTimeHrsRev - remainderHrsPerYear) / hrsPerDay;

<span class="hljs-built_in">console</span>.log(<span class="hljs-string">`After <span class="hljs-subst">${completedDaysEachYear}</span> complete days, there is still about <span class="hljs-subst">${remainderHrsPerYear}</span> hours left out each year.`</span>);
<span class="hljs-comment">// Output: After 365 complete days, there is still about 6 hours left out each year.</span>
</code></pre>
<p>To account for those leftover hours, we add an extra day to the calendar once every four years, when the leftover portions add up to approximately one day.</p>
<p>Finally, because the leftover is not exactly 6 hours but slightly less (roughly 5 hours and 49 minutes), adding a day every four years overcorrects a little. That is why century years are leap years only when they are divisible by 400.</p>
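<p>The adjustments described above can be written as a short function. This is a minimal sketch: the name <code>isGregorianLeapYear</code> is hypothetical, and unlike the versions tested earlier, it only accepts integer years.</p>

```javascript
// Hypothetical helper name; it only accepts integer years.
function isGregorianLeapYear(year) {
  if (!Number.isInteger(year)) {
    throw new TypeError("year must be an integer");
  }
  // Every fourth year gets an extra day...
  if (year % 4 !== 0) return false;
  // ...except century years, where the extra day would overcorrect...
  if (year % 100 !== 0) return true;
  // ...unless the century year is divisible by 400.
  return year % 400 === 0;
}

console.log(isGregorianLeapYear(2024)); // true
console.log(isGregorianLeapYear(1900)); // false
console.log(isGregorianLeapYear(2000)); // true
```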
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Create a Music Bot Using Discord.js – Step-by-Step Tutorial ]]>
                </title>
                <description>
                    <![CDATA[ By Gabriel Tanner The Discord API provides you with an easy tool to create and use your own bots and tools.  In this tutorial, you'll learn how you can create a basic music bot and add it to your server. The bot will be able to play, skip, and ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-create-a-music-bot-using-discord-js-4436f5f3f0f8/</link>
                <guid isPermaLink="false">66d45ee33a8352b6c5a2aa55</guid>
                
                    <category>
                        <![CDATA[ bots ]]>
                    </category>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ technology ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 28 Feb 2024 13:00:00 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/0*rFQhPUqebJY9N4Ue" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Gabriel Tanner</p>
<p>The Discord API gives you an easy way to create and use your own bots and tools. </p>
<p>In this tutorial, you'll learn how you can create a basic music bot and add it to your server. The bot will be able to play, skip, and stop the music, and will also support queuing functionality.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></li>
<li><a class="post-section-overview" href="#heading-how-to-set-up-a-discord-bot">How to Set Up a Discord Bot</a><br>– <a class="post-section-overview" href="#heading-how-to-add-the-bot-to-your-server">How to add the bot to your server</a><br>– <a class="post-section-overview" href="#heading-how-to-create-your-project">How to create your project</a><br>– <a class="post-section-overview" href="#heading-discordjs-basics">Discord.js basics</a></li>
<li><a class="post-section-overview" href="#heading-discord-bot-version-013">Discord Bot Version 0.13</a><br>– <a class="post-section-overview" href="#heading-how-to-create-the-discord-player">How to create the Discord player</a><br>– <a class="post-section-overview" href="#heading-how-to-add-slash-commands">How to add slash commands</a><br>– <a class="post-section-overview" href="#heading-how-to-implement-interactions">How to implement interactions</a><br>– <a class="post-section-overview" href="#heading-how-to-play-songs">How to play songs</a><br>– <a class="post-section-overview" href="#heading-how-to-skip-songs">How to skip songs</a><br>– <a class="post-section-overview" href="#heading-how-to-stop-songs">How to stop songs</a><br>– <a class="post-section-overview" href="#heading-complete-source-code-for-the-indexjs">Complete source code for index.js</a></li>
<li><a class="post-section-overview" href="#heading-discord-bot-version-012">Discord Bot Version 0.12</a><br>– <a class="post-section-overview" href="#heading-how-to-read-messages">How to read messages</a><br>– <a class="post-section-overview" href="#heading-how-to-add-songs">How to add songs</a><br>– <a class="post-section-overview" href="#heading-how-to-play-songs">How to play songs</a><br>– <a class="post-section-overview" href="#heading-how-to-skip-songs">How to skip songs</a><br>– <a class="post-section-overview" href="#heading-how-to-stop-songs">How to stop songs</a><br>– <a class="post-section-overview" href="#heading-complete-source-code-for-the-indexjs">Complete source code for index.js</a></li>
<li><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></li>
</ol>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>Before we get started creating the bot, make sure that you have installed all the tools you'll need:</p>
<ul>
<li><a target="_blank" href="https://nodejs.org/en/">Node.js</a></li>
<li><a target="_blank" href="https://www.npmjs.com/">npm</a></li>
<li><a target="_blank" href="https://www.ffmpeg.org/">FFmpeg</a></li>
</ul>
<p>After you've installed these, you can continue by setting up your Discord bot.</p>
<h2 id="heading-how-to-set-up-a-discord-bot"><strong>How to Set Up a Discord Bot</strong></h2>
<p>First, you need to create a new application on the Discord Developer Portal.</p>
<p>You can do so by visiting the <a target="_blank" href="https://discordapp.com/developers/applications/">portal</a> and clicking on New Application.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/Creating-application.webp.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Create a new Discord application</em></p>
<p>After that, you need to give your application a name and click the Create button.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/create-bot.webp.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Give your bot whatever name you like - I've chosen "music-bot"</em></p>
<p>After that, select the Bot tab and click on Add Bot.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/image-148.png" alt="Image" width="600" height="400" loading="lazy">
<em>Add your bot under the "Bot" tab</em></p>
<p>Now your bot is created and you can continue with inviting it to your server.</p>
<h3 id="heading-how-to-add-the-bot-to-your-server">How to add the bot to your server</h3>
<p>After creating your bot, you can invite it using the OAuth2 URL Generator.</p>
<p>For that, you need to navigate to the OAuth2 page and select <code>bot</code> in the scopes section.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/oauth-url-generator.png.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Selecting "bot" on the OAuth2 URL Generator page</em></p>
<p>After that, you need to select the permissions the bot needs to play music and read messages.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/bot-permissions.png.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Select the permissions you'll need - "read messages/view channels", "send messages", "manage messages", "add reactions", "use slash commands", "connect", and "speak".</em></p>
<p>Then you can copy your generated URL and paste it into your browser.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/bot-invite-url.png.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Copy the URL</em></p>
<p>After pasting it, add it to your server by selecting the server and clicking the authorize button.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/english-image.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h3 id="heading-how-to-create-your-project">How to create your project</h3>
<p>Now you can start creating your project using the terminal.</p>
<p>First, create a directory and move into it. You can do so by using these two commands:</p>
<pre><code class="lang-bash">mkdir musicbot &amp;&amp; <span class="hljs-built_in">cd</span> musicbot
</code></pre>
<p>After that, initialize your project with the <code>npm init</code> command, which creates your <code>package.json</code> file. After entering the command, you will be asked some questions. Just answer them and continue.</p>
<p>Then you just need to create the two files you will work in.</p>
<pre><code>touch index.js &amp;&amp; touch config.json
</code></pre><p>Now, open your project in your text editor. I personally use VS Code and can open it with the following command:</p>
<pre><code class="lang-bash">code .
</code></pre>
<h3 id="heading-discordjs-basics">Discord.js basics</h3>
<p>Now you need to install some dependencies before we can get started.</p>
<pre><code>npm install discord.js@^<span class="hljs-number">12.5</span><span class="hljs-number">.3</span> ffmpeg fluent-ffmpeg @discordjs/opus ytdl-core --save
</code></pre><p>After the installation finishes, you can continue with writing your config.json file. Here, save the token of your bot and the prefix it should listen for.</p>
<pre><code class="lang-json">{
<span class="hljs-attr">"prefix"</span>: <span class="hljs-string">"!"</span>,
<span class="hljs-attr">"token"</span>: <span class="hljs-string">"your-token"</span>
}
</code></pre>
<p>To get your token, you need to visit the Discord Developer Portal again and copy it from the Bot section.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/get-token.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Get your bot token by clicking "Copy" and save it somewhere safe</em></p>
<p>Those are the only things you need to do in your <code>config.json</code> file. So now it's time to start writing your JavaScript code. </p>
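<p>If you would rather not keep a real token in a file that might get committed, one option is to read it from an environment variable first and fall back to <code>config.json</code>. The sketch below is not part of the original setup: the <code>resolveToken</code> helper and the <code>DISCORD_TOKEN</code> variable name are hypothetical, and the example values are made up.</p>

```javascript
// Hypothetical helper: picks the bot token from the environment if present,
// otherwise from the parsed config.json object.
function resolveToken(fileConfig, env) {
  const token = env.DISCORD_TOKEN || fileConfig.token;
  // "your-token" is the placeholder value from the config.json example.
  if (!token || token === "your-token") {
    throw new Error("No bot token configured");
  }
  return token;
}

// Usage with the config shape shown above (values are made up).
const exampleConfig = { prefix: "!", token: "token-from-file" };
console.log(resolveToken(exampleConfig, {})); // token-from-file
console.log(resolveToken(exampleConfig, { DISCORD_TOKEN: "token-from-env" })); // token-from-env
```

<p>When you log in later, you could pass <code>resolveToken(config, process.env)</code> to <code>client.login()</code> instead of <code>config.token</code>.</p>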
<p>The article includes two versions: one for the new discord.js v13, which uses slash commands combined with the discord-player library to implement the music functionality, and one for discord.js v12.5.3, which implements the functionality without a library. </p>
<p>The older version is better for learning purposes, and the newer version works with the current discord.js and is a lot easier to implement – so choose which you prefer.</p>
<h2 id="heading-discord-bot-version-013"><strong>Discord Bot Version 0.13</strong></h2>
<p>Now you just need to install some more dependencies before we can get started.</p>
<pre><code>npm install discord.js discord-player @discordjs/opus
</code></pre><p>After installing the dependencies, import them at the top of your <code>index.js</code> file.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> { Client, GuildMember, Intents } = <span class="hljs-built_in">require</span>(<span class="hljs-string">"discord.js"</span>);
<span class="hljs-keyword">const</span> { Player, QueryType } = <span class="hljs-built_in">require</span>(<span class="hljs-string">"discord-player"</span>);
<span class="hljs-keyword">const</span> config = <span class="hljs-built_in">require</span>(<span class="hljs-string">"./config.json"</span>);
</code></pre>
<p>After that, create your client and log in using your token.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> Client({
    <span class="hljs-attr">intents</span>: [Intents.FLAGS.GUILD_VOICE_STATES, Intents.FLAGS.GUILD_MESSAGES, Intents.FLAGS.GUILDS]
});
client.login(config.token);
</code></pre>
<p>Now add some basic listeners that log to the console when they are triggered.</p>
<pre><code>client.once(<span class="hljs-string">'ready'</span>, <span class="hljs-function">() =&gt;</span> {
 <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Ready!'</span>);
});

client.on(<span class="hljs-string">"error"</span>, <span class="hljs-built_in">console</span>.error);
client.on(<span class="hljs-string">"warn"</span>, <span class="hljs-built_in">console</span>.warn);
</code></pre><p>After that, you can start your bot using the <code>node</code> command and the bot should be online on Discord and print “Ready!” in the console.</p>
<pre><code class="lang-bash">node index.js
</code></pre>
<h3 id="heading-how-to-create-the-discord-player">How to create the Discord player</h3>
<p>Now that you've created the client for the Discord bot, you can continue by initializing your player. This will allow you to play and manage music in your Discord channel.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> player = <span class="hljs-keyword">new</span> Player(client);
</code></pre>
<p>You can also add some error handlers that will be called if an error occurs.</p>
<pre><code class="lang-javascript">player.on(<span class="hljs-string">"error"</span>, <span class="hljs-function">(<span class="hljs-params">queue, error</span>) =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`[<span class="hljs-subst">${queue.guild.name}</span>] Error emitted from the queue: <span class="hljs-subst">${error.message}</span>`</span>);
});
player.on(<span class="hljs-string">"connectionError"</span>, <span class="hljs-function">(<span class="hljs-params">queue, error</span>) =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`[<span class="hljs-subst">${queue.guild.name}</span>] Error emitted from the connection: <span class="hljs-subst">${error.message}</span>`</span>);
});
</code></pre>
<p>The last thing you need to do is add listeners for the different player events like a song starting or being added.</p>
<pre><code class="lang-javascript">player.on(<span class="hljs-string">"trackStart"</span>, <span class="hljs-function">(<span class="hljs-params">queue, track</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">`🎶 | Started playing: **<span class="hljs-subst">${track.title}</span>** in **<span class="hljs-subst">${queue.connection.channel.name}</span>**!`</span>);
});

player.on(<span class="hljs-string">"trackAdd"</span>, <span class="hljs-function">(<span class="hljs-params">queue, track</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">`🎶 | Track **<span class="hljs-subst">${track.title}</span>** queued!`</span>);
});

player.on(<span class="hljs-string">"botDisconnect"</span>, <span class="hljs-function">(<span class="hljs-params">queue</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">"❌ | I was manually disconnected from the voice channel, clearing queue!"</span>);
});

player.on(<span class="hljs-string">"channelEmpty"</span>, <span class="hljs-function">(<span class="hljs-params">queue</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">"❌ | Nobody is in the voice channel, leaving..."</span>);
});

player.on(<span class="hljs-string">"queueEnd"</span>, <span class="hljs-function">(<span class="hljs-params">queue</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">"✅ | Queue finished!"</span>);
});
</code></pre>
<p>In most cases, you just send a message into the Discord text channel using the <code>send()</code> function.</p>
<h3 id="heading-how-to-add-slash-commands">How to add slash commands</h3>
<p>After you've set up the player successfully, you can continue by adding your slash commands to your client. This step lets Discord know which commands the bot can execute.</p>
<pre><code class="lang-javascript">client.on(<span class="hljs-string">"messageCreate"</span>, <span class="hljs-keyword">async</span> (message) =&gt; {
    <span class="hljs-keyword">if</span> (message.author.bot || !message.guild) <span class="hljs-keyword">return</span>;
    <span class="hljs-keyword">if</span> (!client.application?.owner) <span class="hljs-keyword">await</span> client.application?.fetch();
});
</code></pre>
<p>You can do this by implementing a simple <code>!deploy</code> command that registers your commands through the <code>commands</code> manager of the guild the message was sent in (<code>message.guild.commands</code>). </p>
<p>A slash command has a name, a description, and an optional options field that contains the command’s parameters. For example, the play command takes a song query as an argument.</p>
<pre><code class="lang-javascript">client.on(<span class="hljs-string">"messageCreate"</span>, <span class="hljs-keyword">async</span> (message) =&gt; {
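    <span class="hljs-comment">// ... checks from the previous snippet ...</span>

    <span class="hljs-keyword">if</span> (message.content === <span class="hljs-string">"!deploy"</span> &amp;&amp; message.author.id === client.application?.owner?.id) {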
        <span class="hljs-keyword">await</span> message.guild.commands.set([
            {
                <span class="hljs-attr">name</span>: <span class="hljs-string">"play"</span>,
                <span class="hljs-attr">description</span>: <span class="hljs-string">"Plays a song from youtube"</span>,
                <span class="hljs-attr">options</span>: [
                    {
                        <span class="hljs-attr">name</span>: <span class="hljs-string">"query"</span>,
                        <span class="hljs-attr">type</span>: <span class="hljs-string">"STRING"</span>,
                        <span class="hljs-attr">description</span>: <span class="hljs-string">"The song you want to play"</span>,
                        <span class="hljs-attr">required</span>: <span class="hljs-literal">true</span>
                    }
                ]
            },
            {
                <span class="hljs-attr">name</span>: <span class="hljs-string">"skip"</span>,
                <span class="hljs-attr">description</span>: <span class="hljs-string">"Skip the current song"</span>
            },
            {
                <span class="hljs-attr">name</span>: <span class="hljs-string">"queue"</span>,
                <span class="hljs-attr">description</span>: <span class="hljs-string">"See the queue"</span>
            },
            {
                <span class="hljs-attr">name</span>: <span class="hljs-string">"stop"</span>,
                <span class="hljs-attr">description</span>: <span class="hljs-string">"Stop the player"</span>
            },
        ]);

        <span class="hljs-keyword">await</span> message.reply(<span class="hljs-string">"Deployed!"</span>);
    }
});
</code></pre>
<p>After entering <code>!deploy</code> in your Discord text chat, the slash commands will be registered for your server. When you type <code>/</code> into the chat, you should see something similar to this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/bot-slash-commands.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Example of using the slash commands</em></p>
<h3 id="heading-how-to-implement-interactions">How to implement interactions</h3>
<p>Once the slash commands (interactions) are defined, you need to implement them. </p>
<p>All slash commands trigger the <code>interactionCreate</code> event and can be implemented inside the async function below. Before executing any functionality, run a few conditionals to check if the user is allowed to perform the given functionality.</p>
<pre><code class="lang-javascript">client.on(<span class="hljs-string">"interactionCreate"</span>, <span class="hljs-keyword">async</span> (interaction) =&gt; {
    <span class="hljs-keyword">if</span> (!interaction.isCommand() || !interaction.guildId) <span class="hljs-keyword">return</span>;

    <span class="hljs-keyword">if</span> (!(interaction.member <span class="hljs-keyword">instanceof</span> GuildMember) || !interaction.member.voice.channel) {
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.reply({ <span class="hljs-attr">content</span>: <span class="hljs-string">"You are not in a voice channel!"</span>, <span class="hljs-attr">ephemeral</span>: <span class="hljs-literal">true</span> });
    }

    <span class="hljs-keyword">if</span> (interaction.guild.me.voice.channelId &amp;&amp; interaction.member.voice.channelId !== interaction.guild.me.voice.channelId) {
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.reply({ <span class="hljs-attr">content</span>: <span class="hljs-string">"You are not in my voice channel!"</span>, <span class="hljs-attr">ephemeral</span>: <span class="hljs-literal">true</span> });
    }
});
</code></pre>
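<p>To make the two voice channel checks easier to reason about, here is a sketch that isolates them in a plain function that works on channel IDs only. The <code>getVoiceAccessError</code> helper is hypothetical and not part of discord.js. It just mirrors the conditions above so you can try them without a running bot.</p>

```javascript
// Hypothetical pure helper mirroring the two checks above.
// Returns an error message, or null when the user may control the bot.
function getVoiceAccessError(memberChannelId, botChannelId) {
  // The user has to be in some voice channel.
  if (!memberChannelId) return "You are not in a voice channel!";
  // If the bot is already connected, the user has to be in the same channel.
  if (botChannelId && memberChannelId !== botChannelId) {
    return "You are not in my voice channel!";
  }
  return null;
}

console.log(getVoiceAccessError(null, null));   // You are not in a voice channel!
console.log(getVoiceAccessError("111", "222")); // You are not in my voice channel!
console.log(getVoiceAccessError("111", null));  // null
console.log(getVoiceAccessError("111", "111")); // null
```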
<p>After that, check which command is being executed by matching the <code>commandName</code> with the name of the commands you defined above.</p>
<pre><code class="lang-javascript">client.on(<span class="hljs-string">"interactionCreate"</span>, <span class="hljs-keyword">async</span> (interaction) =&gt; {
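    <span class="hljs-comment">// ... checks from the previous snippet ...</span>

    <span class="hljs-keyword">if</span> (interaction.commandName === <span class="hljs-string">"play"</span>) {
        <span class="hljs-comment">// <span class="hljs-doctag">TODO:</span> Implement play command</span>
    }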
});
</code></pre>
<p>You can then add the implementation inside of the <code>if</code> statement.</p>
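<p>As the number of commands grows, a chain of <code>if</code> statements can get long. One possible alternative, shown here as a sketch rather than as the approach this tutorial uses, is a lookup table that maps each command name to a handler. The handler bodies and the plain object standing in for an interaction are made up for illustration.</p>

```javascript
// Hypothetical handlers; each returns the text the bot would reply with.
const commandHandlers = new Map([
  ["play", (interaction) => `Searching for ${interaction.query}...`],
  ["skip", () => "Skipped the current song"],
  ["stop", () => "Stopped the player"]
]);

// Looks up the handler by command name, like matching commandName in an if chain.
function dispatch(interaction) {
  const handler = commandHandlers.get(interaction.commandName);
  if (!handler) return "Unknown command";
  return handler(interaction);
}

// Plain objects standing in for real interactions.
console.log(dispatch({ commandName: "play", query: "lofi beats" })); // Searching for lofi beats...
console.log(dispatch({ commandName: "dance" })); // Unknown command
```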
<h3 id="heading-how-to-play-songs">How to play songs</h3>
<p>The play command requires you to search for the provided song and add the result to the current queue of songs. </p>
<p>Let’s start by retrieving the user-provided query using the <code>options.get()</code> function. After that you can use the <code>player.search()</code> function to search for the desired song.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">if</span> (interaction.commandName === <span class="hljs-string">"play"</span>) {
    <span class="hljs-keyword">await</span> interaction.deferReply();

    <span class="hljs-keyword">const</span> query = interaction.options.get(<span class="hljs-string">"query"</span>).value;
    <span class="hljs-keyword">const</span> searchResult = <span class="hljs-keyword">await</span> player
        .search(query, {
            <span class="hljs-attr">requestedBy</span>: interaction.user,
            <span class="hljs-attr">searchEngine</span>: QueryType.AUTO
        })
        .catch(<span class="hljs-function">() =&gt;</span> {});
    <span class="hljs-keyword">if</span> (!searchResult || !searchResult.tracks.length) <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"No results were found!"</span> });
}
</code></pre>
<p>Now that you have the song, you can create a queue for the songs (if there is already a queue, the <code>createQueue</code> function will return the existing one). </p>
<p>Once the queue is created, you can try joining the user’s voice channel. If that is successful, add the song to the current queue using the <code>addTracks</code> function.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">if</span> (interaction.commandName === <span class="hljs-string">"play"</span>) {
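    <span class="hljs-comment">// ... search code from the previous snippet ...</span>

    <span class="hljs-keyword">const</span> queue = <span class="hljs-keyword">await</span> player.createQueue(interaction.guild, {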
        <span class="hljs-attr">metadata</span>: interaction.channel
    });

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">if</span> (!queue.connection) <span class="hljs-keyword">await</span> queue.connect(interaction.member.voice.channel);
    } <span class="hljs-keyword">catch</span> {
        <span class="hljs-keyword">void</span> player.deleteQueue(interaction.guildId);
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"Could not join your voice channel!"</span> });
    }

    <span class="hljs-keyword">await</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">`⏱ | Loading your <span class="hljs-subst">${searchResult.playlist ? <span class="hljs-string">"playlist"</span> : <span class="hljs-string">"track"</span>}</span>...`</span> });
    searchResult.playlist ? queue.addTracks(searchResult.tracks) : queue.addTrack(searchResult.tracks[<span class="hljs-number">0</span>]);
    <span class="hljs-keyword">if</span> (!queue.playing) <span class="hljs-keyword">await</span> queue.play();
}
</code></pre>
<p>Lastly, if the queue isn’t already playing, let’s start it using the <code>play()</code> function.</p>
<h3 id="heading-how-to-skip-songs">How to skip songs</h3>
<p>Skipping is quite easy – you can do it by calling the <code>skip()</code> function on the queue.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">if</span> (interaction.commandName === <span class="hljs-string">"skip"</span>) {
    <span class="hljs-keyword">await</span> interaction.deferReply();
    <span class="hljs-keyword">const</span> queue = player.getQueue(interaction.guildId);
    <span class="hljs-keyword">if</span> (!queue || !queue.playing) <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"❌ | No music is being played!"</span> });
    <span class="hljs-keyword">const</span> currentTrack = queue.current;
    <span class="hljs-keyword">const</span> success = queue.skip();
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({
        <span class="hljs-attr">content</span>: success ? <span class="hljs-string">`✅ | Skipped **<span class="hljs-subst">${currentTrack}</span>**!`</span> : <span class="hljs-string">"❌ | Something went wrong!"</span>
    });
}
</code></pre>
<p>If the action is successful, you can write a message to the Discord text channel using <code>interaction.followUp()</code>.</p>
<h3 id="heading-how-to-stop-songs">How to stop songs</h3>
<p>The stop functionality will remove all the songs from the queue and the bot will leave the voice channel. You can do this by destroying the current queue, which automatically makes the bot leave the voice channel (unless you configure it otherwise in the player configuration).</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (interaction.commandName === <span class="hljs-string">"stop"</span>) {
    <span class="hljs-keyword">await</span> interaction.deferReply();
    <span class="hljs-keyword">const</span> queue = player.getQueue(interaction.guildId);
    <span class="hljs-keyword">if</span> (!queue || !queue.playing) <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"❌ | No music is being played!"</span> });
    queue.destroy();
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"🛑 | Stopped the player!"</span> });
}
</code></pre>
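<p>If you are curious how play, skip, and stop fit together conceptually, the following sketch models a queue with a plain array. It is a simplified stand-in and not how discord-player implements its queue: the <code>SimpleQueue</code> class is hypothetical and stores track titles only, with no audio or voice connection involved.</p>

```javascript
// Simplified stand-in for a music queue; it stores track titles only.
class SimpleQueue {
  constructor() {
    this.tracks = [];    // songs waiting to be played
    this.current = null; // song that is playing right now
  }

  // Adds a track and starts it right away if nothing is playing.
  add(title) {
    this.tracks.push(title);
    if (this.current === null) this.playNext();
  }

  // Moves the next waiting track into the "current" slot.
  playNext() {
    this.current = this.tracks.length > 0 ? this.tracks.shift() : null;
    return this.current;
  }

  // Skipping means advancing to the next track and reporting the old one.
  skip() {
    const skipped = this.current;
    this.playNext();
    return skipped;
  }

  // Stopping clears everything, similar in spirit to destroying the queue.
  stop() {
    this.tracks = [];
    this.current = null;
  }
}

const queue = new SimpleQueue();
queue.add("Song A");
queue.add("Song B");
queue.add("Song C");
console.log(queue.current); // Song A
console.log(queue.skip());  // Song A
console.log(queue.current); // Song B
queue.stop();
console.log(queue.current); // null
```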
<h3 id="heading-complete-source-code-for-the-indexjs">Complete source code for index.js</h3>
<p>Here you can get the complete source code for the music bot:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> { Client, GuildMember, Intents } = <span class="hljs-built_in">require</span>(<span class="hljs-string">"discord.js"</span>);
<span class="hljs-keyword">const</span> { Player, QueryType } = <span class="hljs-built_in">require</span>(<span class="hljs-string">"discord-player"</span>);
<span class="hljs-keyword">const</span> config = <span class="hljs-built_in">require</span>(<span class="hljs-string">"./config.json"</span>);

<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> Client({
    <span class="hljs-attr">intents</span>: [Intents.FLAGS.GUILD_VOICE_STATES, Intents.FLAGS.GUILD_MESSAGES, Intents.FLAGS.GUILDS]
});

client.on(<span class="hljs-string">"ready"</span>, <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Bot is online!"</span>);
    client.user.setActivity({
        <span class="hljs-attr">name</span>: <span class="hljs-string">"🎶 | Music Time"</span>,
        <span class="hljs-attr">type</span>: <span class="hljs-string">"LISTENING"</span>
    });
});
client.on(<span class="hljs-string">"error"</span>, <span class="hljs-built_in">console</span>.error);
client.on(<span class="hljs-string">"warn"</span>, <span class="hljs-built_in">console</span>.warn);

<span class="hljs-keyword">const</span> player = <span class="hljs-keyword">new</span> Player(client);

player.on(<span class="hljs-string">"error"</span>, <span class="hljs-function">(<span class="hljs-params">queue, error</span>) =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`[<span class="hljs-subst">${queue.guild.name}</span>] Error emitted from the queue: <span class="hljs-subst">${error.message}</span>`</span>);
});
player.on(<span class="hljs-string">"connectionError"</span>, <span class="hljs-function">(<span class="hljs-params">queue, error</span>) =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`[<span class="hljs-subst">${queue.guild.name}</span>] Error emitted from the connection: <span class="hljs-subst">${error.message}</span>`</span>);
});

player.on(<span class="hljs-string">"trackStart"</span>, <span class="hljs-function">(<span class="hljs-params">queue, track</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">`🎶 | Started playing: **<span class="hljs-subst">${track.title}</span>** in **<span class="hljs-subst">${queue.connection.channel.name}</span>**!`</span>);
});

player.on(<span class="hljs-string">"trackAdd"</span>, <span class="hljs-function">(<span class="hljs-params">queue, track</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">`🎶 | Track **<span class="hljs-subst">${track.title}</span>** queued!`</span>);
});

player.on(<span class="hljs-string">"botDisconnect"</span>, <span class="hljs-function">(<span class="hljs-params">queue</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">"❌ | I was manually disconnected from the voice channel, clearing queue!"</span>);
});

player.on(<span class="hljs-string">"channelEmpty"</span>, <span class="hljs-function">(<span class="hljs-params">queue</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">"❌ | Nobody is in the voice channel, leaving..."</span>);
});

player.on(<span class="hljs-string">"queueEnd"</span>, <span class="hljs-function">(<span class="hljs-params">queue</span>) =&gt;</span> {
    queue.metadata.send(<span class="hljs-string">"✅ | Queue finished!"</span>);
});

client.on(<span class="hljs-string">"messageCreate"</span>, <span class="hljs-keyword">async</span> (message) =&gt; {
    <span class="hljs-keyword">if</span> (message.author.bot || !message.guild) <span class="hljs-keyword">return</span>;
    <span class="hljs-keyword">if</span> (!client.application?.owner) <span class="hljs-keyword">await</span> client.application?.fetch();

    <span class="hljs-keyword">if</span> (message.content === <span class="hljs-string">"!deploy"</span> &amp;&amp; message.author.id === client.application?.owner?.id) {
        <span class="hljs-keyword">await</span> message.guild.commands.set([
            {
                <span class="hljs-attr">name</span>: <span class="hljs-string">"play"</span>,
                <span class="hljs-attr">description</span>: <span class="hljs-string">"Plays a song from youtube"</span>,
                <span class="hljs-attr">options</span>: [
                    {
                        <span class="hljs-attr">name</span>: <span class="hljs-string">"query"</span>,
                        <span class="hljs-attr">type</span>: <span class="hljs-string">"STRING"</span>,
                        <span class="hljs-attr">description</span>: <span class="hljs-string">"The song you want to play"</span>,
                        <span class="hljs-attr">required</span>: <span class="hljs-literal">true</span>
                    }
                ]
            },
            {
                <span class="hljs-attr">name</span>: <span class="hljs-string">"skip"</span>,
                <span class="hljs-attr">description</span>: <span class="hljs-string">"Skip the current song"</span>
            },
            {
                <span class="hljs-attr">name</span>: <span class="hljs-string">"stop"</span>,
                <span class="hljs-attr">description</span>: <span class="hljs-string">"Stop the player"</span>
            },
        ]);

        <span class="hljs-keyword">await</span> message.reply(<span class="hljs-string">"Deployed!"</span>);
    }
});

client.on(<span class="hljs-string">"interactionCreate"</span>, <span class="hljs-keyword">async</span> (interaction) =&gt; {
    <span class="hljs-keyword">if</span> (!interaction.isCommand() || !interaction.guildId) <span class="hljs-keyword">return</span>;

    <span class="hljs-keyword">if</span> (!(interaction.member <span class="hljs-keyword">instanceof</span> GuildMember) || !interaction.member.voice.channel) {
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.reply({ <span class="hljs-attr">content</span>: <span class="hljs-string">"You are not in a voice channel!"</span>, <span class="hljs-attr">ephemeral</span>: <span class="hljs-literal">true</span> });
    }

    <span class="hljs-keyword">if</span> (interaction.guild.me.voice.channelId &amp;&amp; interaction.member.voice.channelId !== interaction.guild.me.voice.channelId) {
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.reply({ <span class="hljs-attr">content</span>: <span class="hljs-string">"You are not in my voice channel!"</span>, <span class="hljs-attr">ephemeral</span>: <span class="hljs-literal">true</span> });
    }

    <span class="hljs-keyword">if</span> (interaction.commandName === <span class="hljs-string">"play"</span>) {
        <span class="hljs-keyword">await</span> interaction.deferReply();

        <span class="hljs-keyword">const</span> query = interaction.options.get(<span class="hljs-string">"query"</span>).value;
        <span class="hljs-keyword">const</span> searchResult = <span class="hljs-keyword">await</span> player
            .search(query, {
                <span class="hljs-attr">requestedBy</span>: interaction.user,
                <span class="hljs-attr">searchEngine</span>: QueryType.AUTO
            })
            .catch(<span class="hljs-function">() =&gt;</span> {});
        <span class="hljs-keyword">if</span> (!searchResult || !searchResult.tracks.length) <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"No results were found!"</span> });

        <span class="hljs-keyword">const</span> queue = <span class="hljs-keyword">await</span> player.createQueue(interaction.guild, {
            <span class="hljs-attr">metadata</span>: interaction.channel
        });

        <span class="hljs-keyword">try</span> {
            <span class="hljs-keyword">if</span> (!queue.connection) <span class="hljs-keyword">await</span> queue.connect(interaction.member.voice.channel);
        } <span class="hljs-keyword">catch</span> {
            <span class="hljs-keyword">void</span> player.deleteQueue(interaction.guildId);
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"Could not join your voice channel!"</span> });
        }

        <span class="hljs-keyword">await</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">`⏱ | Loading your <span class="hljs-subst">${searchResult.playlist ? <span class="hljs-string">"playlist"</span> : <span class="hljs-string">"track"</span>}</span>...`</span> });
        searchResult.playlist ? queue.addTracks(searchResult.tracks) : queue.addTrack(searchResult.tracks[<span class="hljs-number">0</span>]);
        <span class="hljs-keyword">if</span> (!queue.playing) <span class="hljs-keyword">await</span> queue.play();
    } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (interaction.commandName === <span class="hljs-string">"skip"</span>) {
        <span class="hljs-keyword">await</span> interaction.deferReply();
        <span class="hljs-keyword">const</span> queue = player.getQueue(interaction.guildId);
        <span class="hljs-keyword">if</span> (!queue || !queue.playing) <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"❌ | No music is being played!"</span> });
        <span class="hljs-keyword">const</span> currentTrack = queue.current;
        <span class="hljs-keyword">const</span> success = queue.skip();
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({
            <span class="hljs-attr">content</span>: success ? <span class="hljs-string">`✅ | Skipped **<span class="hljs-subst">${currentTrack}</span>**!`</span> : <span class="hljs-string">"❌ | Something went wrong!"</span>
        });
    } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (interaction.commandName === <span class="hljs-string">"stop"</span>) {
        <span class="hljs-keyword">await</span> interaction.deferReply();
        <span class="hljs-keyword">const</span> queue = player.getQueue(interaction.guildId);
        <span class="hljs-keyword">if</span> (!queue || !queue.playing) <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"❌ | No music is being played!"</span> });
        queue.destroy();
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">void</span> interaction.followUp({ <span class="hljs-attr">content</span>: <span class="hljs-string">"🛑 | Stopped the player!"</span> });
    } <span class="hljs-keyword">else</span> {
        interaction.reply({
            <span class="hljs-attr">content</span>: <span class="hljs-string">"Unknown command!"</span>,
            <span class="hljs-attr">ephemeral</span>: <span class="hljs-literal">true</span>
        });
    }
});

client.login(config.token);
</code></pre>
<h2 id="heading-discord-bot-version-012"><strong>Discord Bot Version 0.12</strong></h2>
<p>Before you get started, you'll need to install some dependencies. The code in this section uses the discord.js version 12 API, so pin that version when installing.</p>
<pre><code>npm install discord.js@12 ffmpeg fluent-ffmpeg @discordjs/opus ytdl-core --save
</code></pre><p>After installing the dependencies, import them in your <code>index.js</code> file.</p>
<pre><code><span class="hljs-keyword">const</span> Discord = <span class="hljs-built_in">require</span>(<span class="hljs-string">'discord.js'</span>);
<span class="hljs-keyword">const</span> {
    prefix,
    token,
} = <span class="hljs-built_in">require</span>(<span class="hljs-string">'./config.json'</span>);
<span class="hljs-keyword">const</span> ytdl = <span class="hljs-built_in">require</span>(<span class="hljs-string">'ytdl-core'</span>);
</code></pre><p>After that, create your client and login using your token.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> Discord.Client();
client.login(token);
</code></pre>
<p>Now let’s add some basic listeners that log to the console when they fire.</p>
<pre><code>client.once(<span class="hljs-string">'ready'</span>, <span class="hljs-function">() =&gt;</span> {
 <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Ready!'</span>);
});
client.once(<span class="hljs-string">'reconnecting'</span>, <span class="hljs-function">() =&gt;</span> {
 <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Reconnecting!'</span>);
});
client.once(<span class="hljs-string">'disconnect'</span>, <span class="hljs-function">() =&gt;</span> {
 <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Disconnect!'</span>);
});
</code></pre><p>After that, you can start your bot using the <code>node</code> command and it should be online on Discord and print “Ready!” in the console.</p>
<pre><code class="lang-bash">node index.js
</code></pre>
<h3 id="heading-how-to-read-messages">How to read messages</h3>
<p>Now that your bot is on your server and able to go online, you can start reading chat messages and responding to them.</p>
<p>To read messages, you only need to write one simple function:</p>
<pre><code class="lang-javascript">client.on(<span class="hljs-string">'message'</span>, <span class="hljs-keyword">async</span> message =&gt; {

});
</code></pre>
<p>Here, you're creating a listener for the message event, getting the message, and saving it into a message object if it's triggered.</p>
<p>Now you need to check whether the message was sent by a bot (including your own) and ignore it if it was.</p>
<pre><code><span class="hljs-keyword">if</span> (message.author.bot) <span class="hljs-keyword">return</span>;
</code></pre><p>In this line, you're checking whether the author of the message is a bot and returning early if it is.</p>
<p>After that, check if the message starts with the prefix you defined earlier and return if it doesn’t.</p>
<pre><code><span class="hljs-keyword">if</span> (!message.content.startsWith(prefix)) <span class="hljs-keyword">return</span>;
</code></pre><p>After that, you can check which command you need to execute. You can do so using some simple if statements:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> serverQueue = queue.get(message.guild.id);

<span class="hljs-keyword">if</span> (message.content.startsWith(<span class="hljs-string">`<span class="hljs-subst">${prefix}</span>play`</span>)) {
    execute(message, serverQueue);
    <span class="hljs-keyword">return</span>;
} <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (message.content.startsWith(<span class="hljs-string">`<span class="hljs-subst">${prefix}</span>skip`</span>)) {
    skip(message, serverQueue);
    <span class="hljs-keyword">return</span>;
} <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (message.content.startsWith(<span class="hljs-string">`<span class="hljs-subst">${prefix}</span>stop`</span>)) {
    stop(message, serverQueue);
    <span class="hljs-keyword">return</span>;
} <span class="hljs-keyword">else</span> {
    message.channel.send(<span class="hljs-string">"You need to enter a valid command!"</span>);
}
</code></pre>
<p>In this code block, you're checking which command to execute and calling the command. If the input command isn’t valid, you're writing an error message into the chat using the <code>send()</code> function.</p>
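<p>One caveat with <code>startsWith()</code> is that a message like <code>!playlist</code> also matches the <code>play</code> branch. As a minimal sketch of an alternative, you could split the message into a command name and its arguments first. The <code>parseCommand</code> helper below is hypothetical and not part of discord.js:</p>
<pre><code class="lang-javascript">// Hypothetical helper: split a chat message into a command name and arguments.
// Returns null when the message does not start with the prefix.
function parseCommand(content, prefix) {
  if (!content.startsWith(prefix)) return null;
  // Drop the prefix, trim stray spaces, and split on any run of whitespace.
  const parts = content.slice(prefix.length).trim().split(/\s+/);
  const name = parts.shift().toLowerCase();
  return { name: name, args: parts };
}

// "play" with one argument (the URL)
console.log(parseCommand("!play https://example.com/video", "!"));
// "playlist" is a different command name, so it no longer matches "play"
console.log(parseCommand("!playlist", "!").name);
</code></pre>
<p>With a helper like this, you could compare <code>name</code> against <code>"play"</code>, <code>"skip"</code>, and <code>"stop"</code> exactly instead of matching on the start of the message.</p>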
<p>Now that you know which command you need to execute, you can start implementing these commands.</p>
<h3 id="heading-how-to-add-songs">How to add songs</h3>
<p>Let’s start by adding the play command. For that, you'll need a song and a guild (a guild represents an isolated collection of users and channels and is often referred to as a server). You'll also need the ytdl library you installed earlier.</p>
<p>First, create a map named <code>queue</code> where you save all the songs you request in the chat.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> queue = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Map</span>();
</code></pre>
<p>After that, create an async function called execute and check if the user is in a voice chat and if the bot has the right permissions. If not, write an error message and return.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">execute</span>(<span class="hljs-params">message, serverQueue</span>) </span>{
  <span class="hljs-keyword">const</span> args = message.content.split(<span class="hljs-string">" "</span>);

  <span class="hljs-keyword">const</span> voiceChannel = message.member.voice.channel;
  <span class="hljs-keyword">if</span> (!voiceChannel)
    <span class="hljs-keyword">return</span> message.channel.send(
      <span class="hljs-string">"You need to be in a voice channel to play music!"</span>
    );
  <span class="hljs-keyword">const</span> permissions = voiceChannel.permissionsFor(message.client.user);
  <span class="hljs-keyword">if</span> (!permissions.has(<span class="hljs-string">"CONNECT"</span>) || !permissions.has(<span class="hljs-string">"SPEAK"</span>)) {
    <span class="hljs-keyword">return</span> message.channel.send(
      <span class="hljs-string">"I need the permissions to join and speak in your voice channel!"</span>
    );
  }
}
</code></pre>
<p>Now you can continue with getting the song info and saving it into a song object. For that, use your <code>ytdl</code> library which gets the song information from the YouTube link.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> songInfo = <span class="hljs-keyword">await</span> ytdl.getInfo(args[<span class="hljs-number">1</span>]);
<span class="hljs-keyword">const</span> song = {
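 <span class="hljs-comment">// ytdl-core 2.x and later nest these fields under videoDetails</span>
 <span class="hljs-attr">title</span>: songInfo.videoDetails.title,
 <span class="hljs-attr">url</span>: songInfo.videoDetails.video_url,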
};
</code></pre>
<p>This will get the information of the song using the <code>ytdl</code> library you installed earlier. Then, save the information you need into a song object.</p>
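<p>Where the title and URL live can depend on the installed <code>ytdl-core</code> version. The sketch below assumes that newer releases nest the metadata under <code>videoDetails</code>, while older ones exposed it at the top level. The <code>toSong</code> helper is hypothetical, and the sample objects only stand in for real <code>getInfo()</code> results:</p>
<pre><code class="lang-javascript">// Hypothetical helper: build the song object from the result of ytdl.getInfo().
// Assumption: newer ytdl-core releases nest the metadata under videoDetails,
// while older ones exposed title and video_url at the top level.
function toSong(songInfo) {
  const details = songInfo.videoDetails || songInfo;
  return { title: details.title, url: details.video_url };
}

// Plain objects standing in for real getInfo() results.
console.log(toSong({ videoDetails: { title: "New shape", video_url: "https://example.com/a" } }));
console.log(toSong({ title: "Old shape", video_url: "https://example.com/b" }));
</code></pre>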
<p>After saving the song info, you just need to create a contract you can add to your queue. </p>
<p>To do so, first check if your serverQueue is already defined which means that music is already playing. If so, add the song to your existing serverQueue and send a success message. If not, create it and try to join the voice channel and start playing music.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">if</span> (!serverQueue) {

}<span class="hljs-keyword">else</span> {
 serverQueue.songs.push(song);
 <span class="hljs-built_in">console</span>.log(serverQueue.songs);
 <span class="hljs-keyword">return</span> message.channel.send(<span class="hljs-string">`<span class="hljs-subst">${song.title}</span> has been added to the queue!`</span>);
}
</code></pre>
<p>Here, you check whether <code>serverQueue</code> is defined and add the song to it if it is. Now you just need to create your contract if the <code>serverQueue</code> is undefined.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Creating the contract for our queue</span>
<span class="hljs-keyword">const</span> queueContruct = {
 <span class="hljs-attr">textChannel</span>: message.channel,
 <span class="hljs-attr">voiceChannel</span>: voiceChannel,
 <span class="hljs-attr">connection</span>: <span class="hljs-literal">null</span>,
 <span class="hljs-attr">songs</span>: [],
 <span class="hljs-attr">volume</span>: <span class="hljs-number">5</span>,
 <span class="hljs-attr">playing</span>: <span class="hljs-literal">true</span>,
};
<span class="hljs-comment">// Setting the queue using our contract</span>
queue.set(message.guild.id, queueContruct);
<span class="hljs-comment">// Pushing the song to our songs array</span>
queueContruct.songs.push(song);

<span class="hljs-keyword">try</span> {
 <span class="hljs-comment">// Here we try to join the voicechat and save our connection into our object.</span>
 <span class="hljs-keyword">var</span> connection = <span class="hljs-keyword">await</span> voiceChannel.join();
 queueContruct.connection = connection;
 <span class="hljs-comment">// Calling the play function to start a song</span>
 play(message.guild, queueContruct.songs[<span class="hljs-number">0</span>]);
} <span class="hljs-keyword">catch</span> (err) {
 <span class="hljs-comment">// Printing the error message if the bot fails to join the voicechat</span>
 <span class="hljs-built_in">console</span>.log(err);
 queue.delete(message.guild.id);
 <span class="hljs-keyword">return</span> message.channel.send(err);
}
</code></pre>
<p>In this code block, you created a contract and added your song to the songs array. After that, you tried to join the voice chat of the user and called your <code>play()</code> function, which you'll implement next.</p>
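<p>The "create the queue on first use, otherwise append" pattern can be tried out on its own, without a Discord connection. This is only a sketch: <code>enqueue</code> is a hypothetical helper, and the queue object keeps just a few of the fields used above.</p>
<pre><code class="lang-javascript">// Hypothetical helper: add a song to a guild's queue, creating the queue on first use.
// Returns true when a new queue was created (the caller would then join and play).
function enqueue(queues, guildId, song) {
  let serverQueue = queues.get(guildId);
  const created = !serverQueue;
  if (created) {
    serverQueue = { connection: null, songs: [], volume: 5, playing: true };
    queues.set(guildId, serverQueue);
  }
  serverQueue.songs.push(song);
  return created;
}

const queues = new Map();
console.log(enqueue(queues, "guild-1", { title: "First" }));  // true
console.log(enqueue(queues, "guild-1", { title: "Second" })); // false
console.log(queues.get("guild-1").songs.length);              // 2
</code></pre>
<p>Keying the map by guild ID is what keeps each server's songs separate from every other server's.</p>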
<h3 id="heading-how-to-play-songs-1">How to play songs</h3>
<p>Now that you can add songs to your queue and create a contract if there isn’t one yet, you can implement the play functionality.</p>
<p>First, create a function called play which takes two parameters (the guild and the song you want to play) and checks if the song is empty. If so, just leave the voice channel and delete the queue.</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">play</span>(<span class="hljs-params">guild, song</span>) </span>{
  <span class="hljs-keyword">const</span> serverQueue = queue.get(guild.id);
  <span class="hljs-keyword">if</span> (!song) {
    serverQueue.voiceChannel.leave();
    queue.delete(guild.id);
    <span class="hljs-keyword">return</span>;
  }
}
</code></pre>
<p>After that, start playing your song using the <code>play()</code> function of the connection and passing the URL of your song.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> dispatcher = serverQueue.connection
    .play(ytdl(song.url))
    .on(<span class="hljs-string">"finish"</span>, <span class="hljs-function">() =&gt;</span> {
        serverQueue.songs.shift();
        play(guild, serverQueue.songs[<span class="hljs-number">0</span>]);
    })
    .on(<span class="hljs-string">"error"</span>, <span class="hljs-function"><span class="hljs-params">error</span> =&gt;</span> <span class="hljs-built_in">console</span>.error(error));
dispatcher.setVolumeLogarithmic(serverQueue.volume / <span class="hljs-number">5</span>);
serverQueue.textChannel.send(<span class="hljs-string">`Start playing: **<span class="hljs-subst">${song.title}</span>**`</span>);
</code></pre>
<p>Here, you created a stream from the URL of your song and passed it to the connection. You also added two listeners that handle the <code>finish</code> and <code>error</code> events.</p>
<p><strong>Note:</strong> This is a recursive function which means that it calls itself over and over again. We're using recursion so it plays the next song when the song is finished.</p>
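<p>To see the pattern in isolation, here is a minimal sketch that swaps the voice connection for a stand-in. <code>fakePlay</code> and <code>playAll</code> are hypothetical names, and the "dispatcher" is just a Node.js <code>EventEmitter</code> that emits <code>finish</code> shortly after it's created:</p>
<pre><code class="lang-javascript">const { EventEmitter } = require("events");

// Stand-in for connection.play(): returns an emitter that fires "finish" later.
function fakePlay(song) {
  const dispatcher = new EventEmitter();
  setImmediate(function () { dispatcher.emit("finish"); });
  return dispatcher;
}

// Same shape as play(): handle the first song, then call itself for the next one.
function playAll(songs, played, done) {
  const song = songs[0];
  if (!song) return done(played); // base case: the queue is empty
  played.push(song);
  fakePlay(song).on("finish", function () {
    songs.shift();
    playAll(songs, played, done);
  });
}

playAll(["a", "b", "c"], [], function (played) {
  console.log(played.join(", ")); // a, b, c
});
</code></pre>
<p>Because each call happens inside the <code>finish</code> callback rather than directly inside the previous call, the call stack doesn't keep growing as the queue gets longer.</p>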
<p>Now you're ready to play a song by typing <code>!play</code> followed by a YouTube URL in the chat.</p>
<h3 id="heading-how-to-skip-songs-1">How to skip songs</h3>
<p>Now you can implement the skipping functionality. For that, you just need to end the dispatcher you created in your <code>play()</code> function so it starts the next song.</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">skip</span>(<span class="hljs-params">message, serverQueue</span>) </span>{
  <span class="hljs-keyword">if</span> (!message.member.voice.channel)
    <span class="hljs-keyword">return</span> message.channel.send(
      <span class="hljs-string">"You have to be in a voice channel to skip the music!"</span>
    );
  <span class="hljs-keyword">if</span> (!serverQueue)
    <span class="hljs-keyword">return</span> message.channel.send(<span class="hljs-string">"There is no song that I could skip!"</span>);
  serverQueue.connection.dispatcher.end();
}
</code></pre>
<p>Here, you're checking if the user that typed the command is in a voice channel and if there is a song to skip.</p>
<h3 id="heading-how-to-stop-songs-1">How to stop songs</h3>
<p>The <code>stop()</code> function is almost the same as <code>skip()</code>, except that you clear the songs array which will make your bot delete the queue and leave the voice chat.</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">stop</span>(<span class="hljs-params">message, serverQueue</span>) </span>{
  <span class="hljs-keyword">if</span> (!message.member.voice.channel)
    <span class="hljs-keyword">return</span> message.channel.send(
      <span class="hljs-string">"You have to be in a voice channel to stop the music!"</span>
    );
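  <span class="hljs-comment">// Guard against calling stop when nothing is queued</span>
  <span class="hljs-keyword">if</span> (!serverQueue)
    <span class="hljs-keyword">return</span> message.channel.send(<span class="hljs-string">"There is no song that I could stop!"</span>);
  serverQueue.songs = [];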
  serverQueue.connection.dispatcher.end();
}
</code></pre>
<h3 id="heading-complete-source-code-for-the-indexjs-1">Complete source code for the index.js:</h3>
<p>Here you can get the complete source code for the music bot:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> Discord = <span class="hljs-built_in">require</span>(<span class="hljs-string">"discord.js"</span>);
<span class="hljs-keyword">const</span> { prefix, token } = <span class="hljs-built_in">require</span>(<span class="hljs-string">"./config.json"</span>);
<span class="hljs-keyword">const</span> ytdl = <span class="hljs-built_in">require</span>(<span class="hljs-string">"ytdl-core"</span>);

<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> Discord.Client();

<span class="hljs-keyword">const</span> queue = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Map</span>();

client.once(<span class="hljs-string">"ready"</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Ready!"</span>);
});

client.once(<span class="hljs-string">"reconnecting"</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Reconnecting!"</span>);
});

client.once(<span class="hljs-string">"disconnect"</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Disconnect!"</span>);
});

client.on(<span class="hljs-string">"message"</span>, <span class="hljs-keyword">async</span> message =&gt; {
  <span class="hljs-keyword">if</span> (message.author.bot) <span class="hljs-keyword">return</span>;
  <span class="hljs-keyword">if</span> (!message.content.startsWith(prefix)) <span class="hljs-keyword">return</span>;

  <span class="hljs-keyword">const</span> serverQueue = queue.get(message.guild.id);

  <span class="hljs-keyword">if</span> (message.content.startsWith(<span class="hljs-string">`<span class="hljs-subst">${prefix}</span>play`</span>)) {
    execute(message, serverQueue);
    <span class="hljs-keyword">return</span>;
  } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (message.content.startsWith(<span class="hljs-string">`<span class="hljs-subst">${prefix}</span>skip`</span>)) {
    skip(message, serverQueue);
    <span class="hljs-keyword">return</span>;
  } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (message.content.startsWith(<span class="hljs-string">`<span class="hljs-subst">${prefix}</span>stop`</span>)) {
    stop(message, serverQueue);
    <span class="hljs-keyword">return</span>;
  } <span class="hljs-keyword">else</span> {
    message.channel.send(<span class="hljs-string">"You need to enter a valid command!"</span>);
  }
});

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">execute</span>(<span class="hljs-params">message, serverQueue</span>) </span>{
  <span class="hljs-keyword">const</span> args = message.content.split(<span class="hljs-string">" "</span>);

  <span class="hljs-keyword">const</span> voiceChannel = message.member.voice.channel;
  <span class="hljs-keyword">if</span> (!voiceChannel)
    <span class="hljs-keyword">return</span> message.channel.send(
      <span class="hljs-string">"You need to be in a voice channel to play music!"</span>
    );
  <span class="hljs-keyword">const</span> permissions = voiceChannel.permissionsFor(message.client.user);
  <span class="hljs-keyword">if</span> (!permissions.has(<span class="hljs-string">"CONNECT"</span>) || !permissions.has(<span class="hljs-string">"SPEAK"</span>)) {
    <span class="hljs-keyword">return</span> message.channel.send(
      <span class="hljs-string">"I need the permissions to join and speak in your voice channel!"</span>
    );
  }

  <span class="hljs-keyword">const</span> songInfo = <span class="hljs-keyword">await</span> ytdl.getInfo(args[<span class="hljs-number">1</span>]);
  <span class="hljs-keyword">const</span> song = {
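    <span class="hljs-comment">// ytdl-core 2.x and later nest these fields under videoDetails</span>
    <span class="hljs-attr">title</span>: songInfo.videoDetails.title,
    <span class="hljs-attr">url</span>: songInfo.videoDetails.video_url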
  };

  <span class="hljs-keyword">if</span> (!serverQueue) {
    <span class="hljs-keyword">const</span> queueContruct = {
      <span class="hljs-attr">textChannel</span>: message.channel,
      <span class="hljs-attr">voiceChannel</span>: voiceChannel,
      <span class="hljs-attr">connection</span>: <span class="hljs-literal">null</span>,
      <span class="hljs-attr">songs</span>: [],
      <span class="hljs-attr">volume</span>: <span class="hljs-number">5</span>,
      <span class="hljs-attr">playing</span>: <span class="hljs-literal">true</span>
    };

    queue.set(message.guild.id, queueContruct);

    queueContruct.songs.push(song);

    <span class="hljs-keyword">try</span> {
      <span class="hljs-keyword">var</span> connection = <span class="hljs-keyword">await</span> voiceChannel.join();
      queueContruct.connection = connection;
      play(message.guild, queueContruct.songs[<span class="hljs-number">0</span>]);
    } <span class="hljs-keyword">catch</span> (err) {
      <span class="hljs-built_in">console</span>.log(err);
      queue.delete(message.guild.id);
      <span class="hljs-keyword">return</span> message.channel.send(err);
    }
  } <span class="hljs-keyword">else</span> {
    serverQueue.songs.push(song);
    <span class="hljs-keyword">return</span> message.channel.send(<span class="hljs-string">`<span class="hljs-subst">${song.title}</span> has been added to the queue!`</span>);
  }
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">skip</span>(<span class="hljs-params">message, serverQueue</span>) </span>{
  <span class="hljs-keyword">if</span> (!message.member.voice.channel)
    <span class="hljs-keyword">return</span> message.channel.send(
      <span class="hljs-string">"You have to be in a voice channel to skip the music!"</span>
    );
  <span class="hljs-keyword">if</span> (!serverQueue)
    <span class="hljs-keyword">return</span> message.channel.send(<span class="hljs-string">"There is no song that I could skip!"</span>);
  serverQueue.connection.dispatcher.end();
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">stop</span>(<span class="hljs-params">message, serverQueue</span>) </span>{
  <span class="hljs-keyword">if</span> (!message.member.voice.channel)
    <span class="hljs-keyword">return</span> message.channel.send(
      <span class="hljs-string">"You have to be in a voice channel to stop the music!"</span>
    );
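  <span class="hljs-comment">// Guard against calling stop when nothing is queued</span>
  <span class="hljs-keyword">if</span> (!serverQueue)
    <span class="hljs-keyword">return</span> message.channel.send(<span class="hljs-string">"There is no song that I could stop!"</span>);
  serverQueue.songs = [];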
  serverQueue.connection.dispatcher.end();
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">play</span>(<span class="hljs-params">guild, song</span>) </span>{
  <span class="hljs-keyword">const</span> serverQueue = queue.get(guild.id);
  <span class="hljs-keyword">if</span> (!song) {
    serverQueue.voiceChannel.leave();
    queue.delete(guild.id);
    <span class="hljs-keyword">return</span>;
  }

  <span class="hljs-keyword">const</span> dispatcher = serverQueue.connection
    .play(ytdl(song.url))
    .on(<span class="hljs-string">"finish"</span>, <span class="hljs-function">() =&gt;</span> {
      serverQueue.songs.shift();
      play(guild, serverQueue.songs[<span class="hljs-number">0</span>]);
    })
    .on(<span class="hljs-string">"error"</span>, <span class="hljs-function"><span class="hljs-params">error</span> =&gt;</span> <span class="hljs-built_in">console</span>.error(error));
  dispatcher.setVolumeLogarithmic(serverQueue.volume / <span class="hljs-number">5</span>);
  serverQueue.textChannel.send(<span class="hljs-string">`Start playing: **<span class="hljs-subst">${song.title}</span>**`</span>);
}

client.login(token);
</code></pre>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>You made it all the way to the end! I hope this article helped you understand the Discord API and how you can use it to create a simple bot.</p>
<p>If you want to see an example of a more advanced Discord bot, you can visit my <a target="_blank" href="https://github.com/TannerGabriel/discord-bot">GitHub repository</a>.</p>
<p>If you found this useful, please consider recommending and sharing it with fellow developers.</p>
<p>If you have any questions or feedback, let me know and I'd be happy to help.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Idempotence in HTTP Methods – Explained with CRUD Examples ]]>
                </title>
                <description>
                    <![CDATA[ Idempotence refers to a program's ability to maintain a particular result even after repeated actions.  For example, let's say you have a button that only opens a door when pressed. This button does not have the ability to close the door, so it stays... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/idempotency-in-http-methods/</link>
                <guid isPermaLink="false">66c3742fad70110156766fe2</guid>
                
                    <category>
                        <![CDATA[ http ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Yemi Ojedapo ]]>
                </dc:creator>
                <pubDate>Fri, 22 Dec 2023 21:19:43 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/12/pexels-robert-lens-10382808.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Idempotence refers to a program's ability to maintain a particular result even after repeated actions. </p>
<p>For example, let's say you have a button that only opens a door when pressed. This button does not have the ability to close the door, so it stays open even when it's pressed repeatedly. It simply remains in the state it was changed to by the first press.</p>
<p>This same logic applies to HTTP methods that are idempotent. Calling an idempotent HTTP method repeatedly has no additional effect on the server's state beyond the first call.</p>
<p>Understanding idempotence matters for API design because it determines how endpoints should behave when a client sends the same request more than once, for example when retrying after a timeout.</p>
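<p>The door analogy can be put into a few lines of code. This is only an illustrative JavaScript sketch with made-up function names, separate from the HTTP examples that follow:</p>
<pre><code class="lang-javascript">// Idempotent: pressing "open" any number of times leaves the door open.
// The previous state is ignored on purpose.
function pressOpen(door) {
  return { isOpen: true };
}

// Not idempotent: every press of a toggle flips the state again.
function pressToggle(door) {
  return { isOpen: !door.isOpen };
}

const closed = { isOpen: false };
console.log(pressOpen(pressOpen(closed)).isOpen);     // true, same as one press
console.log(pressToggle(pressToggle(closed)).isOpen); // false, one press gives true
</code></pre>
<p>An idempotent operation behaves like <code>pressOpen</code>: the result after two calls is the same as after one.</p>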
<p>In this tutorial, I'll explain the concept of idempotence and the role it plays in building robust and functional APIs. You'll also learn about what safe methods are, how they relate to idempotence, and how to implement idempotency in non-idempotent methods. </p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before understanding and implementing idempotence in API design, it's essential to have a solid foundation in the following areas:</p>
<ul>
<li>RESTful Principles</li>
<li>Fundamentals of HTTP methods</li>
<li>API Development </li>
<li>HTTP Status codes</li>
<li>Basics of Web development.</li>
</ul>
<h2 id="heading-idempotence-example">Idempotence Example</h2>
<p>Let's start off with an example of idempotence in action. We'll create a function that uses the DELETE method to delete data from a web page:</p>
<pre><code class="lang-python">
<span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask, jsonify, abort

app = Flask(__name__)

web_page_data = [
   {<span class="hljs-string">"id"</span>: <span class="hljs-number">1</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Row 1 data"</span>},
   {<span class="hljs-string">"id"</span>: <span class="hljs-number">2</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Row 2 data"</span>},
   <span class="hljs-comment"># Add more rows as needed</span>
]

<span class="hljs-meta">@app.route('/delete_row/&lt;int:row_id&gt;', methods=['DELETE'])</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">delete_row</span>(<span class="hljs-params">row_id</span>):</span>
   <span class="hljs-comment"># Find the row to delete</span>
   row_to_delete = next((row <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> web_page_data <span class="hljs-keyword">if</span> row[<span class="hljs-string">"id"</span>] == row_id), <span class="hljs-literal">None</span>)

   <span class="hljs-keyword">if</span> row_to_delete:
       <span class="hljs-comment"># Simulate deletion</span>
       web_page_data.remove(row_to_delete)
       <span class="hljs-keyword">return</span> jsonify({<span class="hljs-string">"message"</span>: <span class="hljs-string">f"Row <span class="hljs-subst">{row_id}</span> deleted successfully."</span>}), <span class="hljs-number">200</span>
   <span class="hljs-keyword">else</span>:
       abort(<span class="hljs-number">404</span>, description=<span class="hljs-string">f"Row <span class="hljs-subst">{row_id}</span> not found."</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
   app.run(debug=<span class="hljs-literal">True</span>)
</code></pre>
<p>This function deletes the row chosen by the user. Because DELETE is idempotent, the row is deleted once, no matter how many times the request is repeated. Subsequent calls return a 404 error because the row is already gone, but that doesn't make the method non-idempotent: idempotence describes the effect on the server's state, not the response the client receives. After one call or ten, the row no longer exists.</p>
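<p>To see why the 404 on later calls doesn't break idempotence, it can help to model the endpoint without a web server. The sketch below is a simplified stand-in for the Flask route above (the <code>rows</code> dictionary and the plain <code>delete_row</code> function are assumptions made for illustration). It compares the stored data after one call and after two:</p>

```python
# In-memory stand-in for the data behind the DELETE endpoint (hypothetical)
rows = {1: "Row 1 data", 2: "Row 2 data"}

def delete_row(row_id):
    """Mimic the DELETE handler: return a status code and mutate `rows`."""
    if row_id in rows:
        del rows[row_id]
        return 200
    return 404

first_status = delete_row(1)
state_after_first = dict(rows)

second_status = delete_row(1)
state_after_second = dict(rows)

print(first_status, second_status)               # the status codes differ
print(state_after_first == state_after_second)   # the stored data is identical
```

<p>The two calls return different status codes, yet the data left on the "server" is the same, which is the property idempotence is about.</p>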
<p>Let’s look at another example with the GET method. The GET method is used to retrieve data from a resource. Let’s create a function that uses the GET method to retrieve a username:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_username</span>():</span>
    url = <span class="hljs-string">'https://api.example.com/get_username'</span>
    <span class="hljs-keyword">try</span>:
        response = requests.get(url)
        <span class="hljs-keyword">if</span> response.status_code == <span class="hljs-number">200</span>:
            <span class="hljs-keyword">return</span> response.json()[<span class="hljs-string">'username'</span>]
        <span class="hljs-keyword">else</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>
    <span class="hljs-keyword">except</span> requests.RequestException <span class="hljs-keyword">as</span> e:
        print(<span class="hljs-string">f"Error occurred: <span class="hljs-subst">{e}</span>"</span>)
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-comment"># Usage</span>
username = get_username()
<span class="hljs-keyword">if</span> username:
    print(<span class="hljs-string">f"The username is: <span class="hljs-subst">{username}</span>"</span>)
<span class="hljs-keyword">else</span>:
    print(<span class="hljs-string">"Failed to retrieve the username."</span>)
</code></pre>
<p>In this example, we define the <code>get_username()</code> function, which sends a GET request to the API endpoint to retrieve the username. If the request is successful, we extract the username from the JSON response and return it. But if any error occurs during the request, we handle it and return <code>None</code>.</p>
<p>Because GET is idempotent, calling <code>get_username()</code> multiple times has the same effect on the server as calling it once: nothing is changed. Each call simply reads the username from the resource. The value returned can still differ between calls if something else updates the username in the meantime.</p>
<h3 id="heading-idempotent-vs-non-idempotent-http-methods">Idempotent vs. Non-Idempotent HTTP Methods:</h3>
<p>HTTP methods determine how data is fetched, modified, or created when interacting with APIs. Idempotency is one of the key properties that influences the consistency and reliability of those methods.</p>
<p>Here's a breakdown of the different methods based on their idempotency.</p>
<h4 id="heading-idempotent-methods">Idempotent methods:</h4>
<ul>
<li>GET</li>
<li>HEAD</li>
<li>PUT</li>
<li>DELETE</li>
<li>OPTIONS</li>
<li>TRACE</li>
</ul>
<h4 id="heading-non-idempotent-methods">Non-idempotent methods:</h4>
<ul>
<li>POST</li>
<li>PATCH</li>
<li>CONNECT</li>
</ul>
<h2 id="heading-safe-methods">Safe Methods</h2>
<p>In our previous example, we used the GET method to retrieve a username and this had no side effect on the server. This is because it is a safe method. </p>
<p>A safe method is one that doesn't modify the server's state or the resource being accessed. In other words, safe methods perform read-only operations, such as retrieving data or a representation of a resource.</p>
<p>When you make a request using a safe method, the server does not perform any operations that modify the resource's state. In our previous example, we retrieved the username from the resource without changing anything on the server.</p>
<p>All safe methods are automatically idempotent, but not all idempotent methods are safe. This is because while idempotent methods produce consistent results when called repeatedly, some of them may still modify the server's state or the resource being accessed.</p>
<p>Like in our first example, the DELETE method is idempotent, because deleting a resource multiple times will have the same effect. But it's not safe, as it changes the server's state by removing the resource.</p>
<p>Here’s a classification of HTTP methods based on their safe status:</p>
<h4 id="heading-safe-methods-1">Safe methods:</h4>
<ul>
<li>GET</li>
<li>OPTIONS</li>
<li>HEAD</li>
<li>TRACE</li>
</ul>
<h4 id="heading-unsafe-methods">Unsafe methods:</h4>
<ul>
<li>DELETE</li>
<li>POST</li>
<li>PUT</li>
<li>PATCH</li>
<li>CONNECT</li>
</ul>
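<p>The two classifications can be combined into a small lookup. This is a minimal sketch that follows the method definitions in the HTTP semantics specification (RFC 9110), which lists GET, HEAD, OPTIONS, and TRACE as safe. The <code>describe</code> helper is a hypothetical name used only for illustration:</p>

```python
# Method properties as defined by the HTTP semantics specification (RFC 9110)
SAFE = {"GET", "HEAD", "OPTIONS", "TRACE"}
IDEMPOTENT = SAFE | {"PUT", "DELETE"}
ALL_METHODS = IDEMPOTENT | {"POST", "PATCH", "CONNECT"}

def describe(method):
    """Return (is_safe, is_idempotent) for an HTTP method name."""
    name = method.upper()
    if name not in ALL_METHODS:
        raise ValueError(f"Unknown HTTP method: {method}")
    return name in SAFE, name in IDEMPOTENT

for method in ("GET", "DELETE", "POST"):
    print(method, describe(method))
```

<p>Building <code>IDEMPOTENT</code> on top of <code>SAFE</code> encodes the rule from this section directly: every safe method is idempotent, but PUT and DELETE are idempotent without being safe.</p>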
<h3 id="heading-why-is-post-not-idempotent">Why is POST not idempotent?</h3>
<p>POST is an HTTP method that sends information to a server. When you make a POST request, you typically submit data to create a new resource or trigger a server-side action. Therefore, making the same request multiple times can result in different outcomes and side effects on the server. This can lead to duplicated data, strained server resources, and reduced performance because of the repeated action.</p>
<p>Unlike idempotent methods like GET, PUT, and DELETE, which have consistent results regardless of repetition, POST requests can cause changes to the server's state with each invocation. </p>
<p>POST requests often create new resources on the server. Repeating the same POST request will generate multiple identical resources, potentially leading to duplication.</p>
<p>Idempotency also depends on how an endpoint is implemented, not only on the method name. DELETE is idempotent when it targets a specific resource, such as the entry with a given ID: after the first call the entry is gone, and repeating the call changes nothing. But if a developer exposes an endpoint that deletes "the last entry" in a collection, each call removes a different entry, so repeated calls have different effects. That endpoint is non-idempotent even though it uses DELETE.</p>
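<p>The difference is easier to see with a toy in-memory store. The following sketch is illustrative only: the <code>orders</code> dictionary and the three helper functions are hypothetical stand-ins for server-side handlers, not part of any framework.</p>

```python
import itertools

orders = {}                      # order_id -> data (hypothetical in-memory store)
_next_id = itertools.count(1)

def post_order(data):
    """POST-style create: the server picks a new ID on every call."""
    order_id = next(_next_id)
    orders[order_id] = data
    return order_id

def put_order(order_id, data):
    """PUT-style replace: the client names the resource, so repeats overwrite."""
    orders[order_id] = data

def delete_last():
    """'Delete the last entry': each call targets a different resource."""
    if orders:
        del orders[max(orders)]

post_order({"item": "book"})
post_order({"item": "book"})     # same request again: a second, duplicate order
count_after_posts = len(orders)

put_order(10, {"item": "pen"})
put_order(10, {"item": "pen"})   # same request again: still one order with ID 10
count_after_puts = len(orders)

delete_last()
delete_last()                    # removes a different order than the first call
remaining_ids = sorted(orders)
print(count_after_posts, count_after_puts, remaining_ids)
```

<p>Repeating the POST doubles the data, repeating the PUT does not, and repeating the "delete last" call keeps removing entries until the collection is empty.</p>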
<h2 id="heading-how-to-achieve-idempotency-with-non-idempotent-methods">How to Achieve Idempotency with Non-Idempotent Methods</h2>
<p>Idempotency isn't only a property inherent to certain methods – it can also be implemented as a feature of a non-idempotent method.  </p>
<p>Here are some techniques to achieve idempotency even with non-idempotent methods.</p>
<h3 id="heading-unique-identifiers">Unique Identifiers</h3>
<p>Adding a unique identifier to every request is one of the most common techniques for implementing idempotency. The server records each identifier it has processed. When a request arrives with an identifier it has already seen, the server knows it's a repeat and doesn't perform the operation again, so no additional side effects occur.</p>
<p>Here's an example of how it works:</p>
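<pre><code class="lang-python"><span class="hljs-keyword">from</span> uuid <span class="hljs-keyword">import</span> uuid4

<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> django.http <span class="hljs-keyword">import</span> HttpResponse

<span class="hljs-comment"># Order is assumed to be a Django model with a unique_id field, defined elsewhere</span>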

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_order</span>(<span class="hljs-params">unique_id, order_data</span>):</span>
    <span class="hljs-keyword">if</span> Order.objects.filter(unique_id=unique_id).exists():
        <span class="hljs-keyword">return</span> HttpResponse(status=<span class="hljs-number">409</span>)  <span class="hljs-comment"># Conflict</span>
    order = Order.objects.create(unique_id=unique_id, **order_data)
    <span class="hljs-keyword">return</span> HttpResponse(status=<span class="hljs-number">201</span>, content_type=<span class="hljs-string">"application/json"</span>)

<span class="hljs-comment"># Example usage</span>
post_data = {<span class="hljs-string">"products"</span>: [...]}
headers = {<span class="hljs-string">"X-Unique-ID"</span>: str(uuid4())}
requests.post(<span class="hljs-string">"https://api.example.com/orders"</span>, data=post_data, headers=headers)
</code></pre>
<p>In this code snippet, we define a function called <code>process_order</code> that creates orders in an API, using unique identifiers to implement idempotency. </p>
<p>Here's a breakdown of the code:</p>
<h4 id="heading-importing-the-unique-identifier-generator">Importing the Unique Identifier Generator:</h4>
<p><code>from uuid import uuid4</code>: The code snippet starts by importing the <code>uuid4</code> function from the <code>uuid</code> module. This function generates unique identifiers, which are used to achieve idempotency in this code.</p>
<h4 id="heading-defining-the-processorder-function">Defining the <code>process_order</code> Function:</h4>
<p><code>def process_order(unique_id, order_data)</code>: This line defines a function named process_order that takes two arguments:</p>
<ul>
<li><code>unique_id</code>: This is a string representing a unique identifier for the request. This ensures no duplicate orders are created with the same identifier.</li>
<li><code>order_data</code>: This is a dictionary containing the actual order data, like product information and customer details.</li>
</ul>
<h4 id="heading-checking-for-existing-orders">Checking for Existing Orders:</h4>
<p><code>if Order.objects.filter(unique_id=unique_id).exists()</code>: This line checks if an order with the same unique_id already exists in the database.</p>
<p><code>Order.objects.filter(unique_id=unique_id).exists()</code> queries the Order model for orders with the matching unique_id and checks if any orders were found in the query result. If an order is found, it means the same request was already processed.</p>
<h4 id="heading-handling-existing-orders">Handling existing orders:</h4>
<p><code>return HttpResponse(status=409)</code>: If an order with the same unique_id already exists, the function immediately returns an HTTP response with status code 409 indicating a conflict. This prevents duplicate orders from being created.</p>
<h4 id="heading-creating-a-new-order-if-unique">Creating a new order (if unique):</h4>
<p><code>order = Order.objects.create(unique_id=unique_id, **order_data)</code>: This line only runs if no existing order is found.</p>
<p><code>Order.objects.create</code> creates a new order record and saves it to the database.</p>
<p><code>unique_id=unique_id</code> sets the unique_id attribute of the new order to the provided unique_id.</p>
<p><code>**order_data</code> unpacks the order_data dictionary into keyword arguments, setting the other attributes such as products and customer information.</p>
<h4 id="heading-sending-a-success-response">Sending a success response:</h4>
<p><code>return HttpResponse(status=201, content_type="application/json")</code>: If the order creation is successful, the function will return an HTTP response with status code 201 which shows a successful creation. It also specifies the response content type as JSON, assuming the order data might be returned in JSON format.</p>
<p><code>post_data = {"products": [...]}</code>: In the example request, this line defines a dictionary containing the actual order data, like a list of products.</p>
<p><code>headers = {"X-Unique-ID": str(uuid4())}</code>: This line creates a dictionary containing a custom header named X-Unique-ID. It generates a unique identifier string using uuid4() and adds it to the header.</p>
<p><code>requests.post("https://api.example.com/orders", data=post_data, headers=headers)</code>: This line sends a POST request to the API endpoint <code>https://api.example.com/orders</code> with the provided <code>post_data</code> and headers.</p>
<h4 id="heading-how-does-this-implement-idempotence">How does this implement idempotence?</h4>
<p>It does so by using a unique identifier (<code>unique_id</code>) for each order.</p>
<p>The function checks whether an order with the same identifier already exists in the database. If one does, it returns a 409 Conflict status without creating anything. Otherwise, it creates a new order and responds with a 201 Created status. Because a repeated request with the same identifier never creates a second order, the operation is idempotent.</p>
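<p>A common variation on this technique is to store the result of the first request and replay it for repeats instead of returning an error, so a client that retries after a timeout still learns what happened. The sketch below illustrates that design under simplifying assumptions: an in-memory dictionary stands in for the database, and this version of <code>process_order</code> returns a plain <code>(status, order)</code> tuple rather than a framework response object.</p>

```python
import uuid

orders = {}      # idempotency key -> stored order (stand-in for a database table)

def process_order(unique_id, order_data):
    """Create the order once; replay the stored result for repeated keys."""
    if unique_id in orders:
        # Duplicate request: do not create anything, return the earlier result
        return 200, orders[unique_id]
    order = {"order_id": len(orders) + 1, **order_data}
    orders[unique_id] = order
    return 201, order

key = str(uuid.uuid4())                 # generated once by the client
payload = {"products": ["keyboard"]}

first = process_order(key, payload)
retry = process_order(key, payload)     # e.g. the client timed out and retried

print(first[0], retry[0], len(orders))
```

<p>The important detail is that the client generates the key once and reuses it for every retry of the same logical request. A fresh key on each attempt would make every retry look like a new order.</p>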
<h3 id="heading-token-based-authorization">Token-based Authorization</h3>
<p>Token-based authorization is a form of authorization that assigns temporary tokens for each non-idempotent action. Once the action is completed, the token is invalidated. If the same request comes again with the same token, the server recognizes it as invalid and refuses the request, thereby preventing duplicate actions.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Generate a unique token for this action</span>
<span class="hljs-keyword">const</span> token = generateToken();

fetch(<span class="hljs-string">"https://api.example.com/create-user"</span>, {
    <span class="hljs-attr">method</span>: <span class="hljs-string">"POST"</span>,
    <span class="hljs-attr">body</span>: <span class="hljs-built_in">JSON</span>.stringify({ username, password }),
    <span class="hljs-attr">headers</span>: {
        <span class="hljs-attr">Authorization</span>: <span class="hljs-string">`Bearer <span class="hljs-subst">${token}</span>`</span>,
        <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
    },
})
    .then(<span class="hljs-function"><span class="hljs-params">response</span> =&gt;</span> {
        <span class="hljs-comment">// Handle successful response</span>
        <span class="hljs-keyword">if</span> (response.ok) {
            <span class="hljs-comment">// Do something with the successful response</span>
        } <span class="hljs-keyword">else</span> {
            <span class="hljs-comment">// Handle non-successful response</span>
        }
    })
    .catch(<span class="hljs-function"><span class="hljs-params">error</span> =&gt;</span> {
        <span class="hljs-comment">// Handle error</span>
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error occurred:"</span>, error);
    })
    .finally(<span class="hljs-function">() =&gt;</span> {
        <span class="hljs-comment">// Invalidate token after successful action or in case of an error</span>
        invalidateToken(token);
    });

<span class="hljs-comment">// Simple implementation for generating a token</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">generateToken</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">Math</span>.random().toString(<span class="hljs-number">36</span>).slice(<span class="hljs-number">2</span>);
}

<span class="hljs-comment">// Simple implementation for invalidating a token</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">invalidateToken</span>(<span class="hljs-params">token</span>) </span>{
    <span class="hljs-comment">// Add your logic to invalidate the token, e.g., remove it from storage</span>
}
</code></pre>
<p>Here's a breakdown of the code:</p>
<h4 id="heading-generating-a-unique-token">Generating a unique token:</h4>
<p><code>const token = generateToken()</code>: This line calls the <code>generateToken()</code> function, defined at the bottom of the snippet, to generate a random token string. This token will be used for authorization and idempotency.</p>
<h4 id="heading-sending-the-post-request">Sending the <code>POST</code> request:</h4>
<p><code>fetch("https://api.example.com/create-user", { ... })</code>: This line uses the fetch API to send a POST request to the API endpoint <code>https://api.example.com/create-user</code>. </p>
<p><code>method: "POST"</code>: This specifies the HTTP method as POST, indicating the intention to create a new user.</p>
<p><code>body: JSON.stringify({ username, password })</code>: This defines the request body with user details like username and password. The data is converted to JSON format before sending.</p>
<p><code>headers: { Authorization: `Bearer ${token}` }</code>: This sets the Authorization header in the request. The header value is the generated token prefixed with "Bearer ". The <code>Content-Type</code> header tells the server that the body is JSON.</p>
<h4 id="heading-handling-the-response">Handling the Response:</h4>
<p><code>.then(response =&gt; { ... })</code>: This block runs once the server responds. The <code>response.ok</code> check separates successful responses (status 200 to 299) from error responses such as 4xx or 5xx. On success, you would handle things like storing user information or redirecting the user.</p>
<p><code>.catch(error =&gt; { ... })</code>: This block runs if the request itself fails, for example because of a network error. You would log the error or handle specific error scenarios here.</p>
<h4 id="heading-invalidating-the-token">Invalidating the Token:</h4>
<p><code>invalidateToken(token)</code>: This line calls the <code>invalidateToken(token)</code> function, which is left as a stub in the snippet, to mark the used token as invalid. This ensures the same token cannot be used for subsequent requests. For the guarantee to hold, the server must also track used tokens and reject them.</p>
<h4 id="heading-how-does-this-implement-idempotence-1">How does this implement Idempotence?</h4>
<p>This code snippet uses token-based authorization to implement idempotency in a POST request that creates a user. A token is generated for the action and sent in the Authorization header. If the same request is accidentally sent again, it carries the same token.</p>
<p>The API server verifies the token and marks it as used once the user has been created. When a repeat request arrives with that token, the server recognizes that the token is no longer valid and refuses the request, so no duplicate user is created.</p>
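<p>The client code above relies on the server to enforce the single-use rule. As a rough illustration of what that server-side bookkeeping might look like, here is a minimal in-memory sketch. The names <code>issue_token</code>, <code>create_user</code>, and <code>issued_tokens</code> are hypothetical, and a real service would keep the tokens in shared, persistent storage.</p>

```python
import secrets

issued_tokens = set()     # tokens the server has handed out and not yet consumed
users = []

def issue_token():
    """Server side: create an unguessable single-use token."""
    token = secrets.token_urlsafe(16)
    issued_tokens.add(token)
    return token

def create_user(token, username):
    """Perform the action only if the token is still valid, then consume it."""
    if token not in issued_tokens:
        return 403                      # unknown or already-used token
    issued_tokens.discard(token)        # invalidate before doing the work
    users.append(username)
    return 201

token = issue_token()
first = create_user(token, "ada")
second = create_user(token, "ada")      # retry with the same token is refused
print(first, second, users)
```

<p>The token is consumed before the user is created, so a second request carrying the same token cannot pass the check, even if it arrives while the first is still being processed in this single-threaded sketch.</p>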
<h3 id="heading-etag-header">ETag Header:</h3>
<p>An ETag (entity tag) header is an HTTP header used for web cache validation and conditional requests. It is commonly used with PUT requests so that a resource is only updated if it hasn't changed since the client last fetched it.</p>
<p>When you fetch a resource, the server sends you its ETag. To update the resource, you include that ETag in the <code>If-Match</code> header of your PUT request along with the updated data. If the resource's current ETag still matches (meaning the resource hasn't changed), the server accepts the update. But if the ETag has changed, the server rejects the update, preventing it from overwriting someone else's changes.</p>
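<pre><code class="lang-python"><span class="hljs-keyword">from</span> django.http <span class="hljs-keyword">import</span> HttpResponse

<span class="hljs-comment"># Article is assumed to be a Django model that exposes an etag attribute</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">update_article</span>(<span class="hljs-params">request, article_id, content</span>):</span>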
    <span class="hljs-comment"># Get existing article and its ETag</span>
    article = Article.objects.get(pk=article_id)
    etag = article.etag

    <span class="hljs-comment"># Check if ETag matches with request header</span>
    <span class="hljs-keyword">if</span> request.headers.get(<span class="hljs-string">"If-Match"</span>) != etag:
        <span class="hljs-keyword">return</span> HttpResponse(status=<span class="hljs-number">409</span>)  <span class="hljs-comment"># Conflict</span>

    <span class="hljs-comment"># Update article content and generate new ETag</span>
    article.content = content
    article.save()
    new_etag = article.etag

    <span class="hljs-comment"># Return success response with updated ETag</span>
    <span class="hljs-keyword">return</span> HttpResponse(status=<span class="hljs-number">200</span>, content_type=<span class="hljs-string">"text/plain"</span>, content=new_etag)
</code></pre>
<p>In this code snippet, we define a function called <code>update_article</code> that allows you to update the content of an existing article based on its ID and new content. It implements idempotency using the ETag header technique.</p>
<p>Here's a step-by-step explanation of how it works:</p>
<h4 id="heading-getting-the-existing-article-and-its-etag">Getting the Existing Article and its ETag:</h4>
<p><code>article = Article.objects.get(pk=article_id):</code> This line fetches the article with the provided article_id from the database using the Article model.</p>
<p><code>etag = article.etag:</code> This line extracts the ETag value from the retrieved article object. The ETag serves as a unique identifier for the article's current state.</p>
<h4 id="heading-checking-for-a-match">Checking for a Match:</h4>
<p><code>if request.headers.get("If-Match") != etag:</code> This line checks whether the ETag provided in the request's <code>If-Match</code> header matches the ETag of the retrieved article.</p>
<p><code>return HttpResponse(status=409)</code>: If the ETag doesn't match, the article might have been updated by another request since the client retrieved it. The function returns a 409 Conflict response, which prevents one client from accidentally overwriting another's changes. Note that the HTTP specification defines 412 Precondition Failed as the standard response to a failed <code>If-Match</code> check, so many APIs return 412 here instead.</p>
<h4 id="heading-updating-the-article-content-and-generating-a-new-etag">Updating the Article Content and generating a new ETag:</h4>
<p><code>article.content = content:</code> This line updates the article's content with the new content received in the request.</p>
<p><code>article.save():</code> This line saves the updated article back to the database.</p>
<p><code>new_etag = article.etag:</code> This line retrieves the new ETag generated for the updated article after saving it.</p>
<h4 id="heading-returning-the-success-response-with-the-new-etag">Returning the Success Response with the new ETag:</h4>
<p><code>return HttpResponse(status=200, content_type="text/plain", content=new_etag)</code>: returns a successful 200 OK response, including the content type ("text/plain") and the updated ETag of the article in the response body.</p>
<h4 id="heading-how-does-this-implement-idempotence-2">How does this implement idempotence?</h4>
<p>After a successful update, the article's ETag changes. If the same update request is sent again with the original ETag, the <code>If-Match</code> check fails and the update is not applied a second time. This prevents duplicate updates and keeps the data consistent. The new ETag is returned in the response so the client can keep track of the article's state for future requests.</p>
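<p>The same flow can be tried without a database or web framework. The sketch below makes a few assumptions for illustration: the article lives in a dictionary, the ETag is derived by hashing the content (one common approach, not the only one), and a mismatch returns 412 Precondition Failed, the status the HTTP specification defines for a failed <code>If-Match</code> check.</p>

```python
import hashlib

article = {"content": "First draft"}

def etag_of(content):
    """Derive an ETag from the content (one common approach; an assumption here)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]

def update_article(if_match, new_content):
    """Apply the update only if the client's ETag matches the current one."""
    current = etag_of(article["content"])
    if if_match != current:
        return 412, current             # Precondition Failed
    article["content"] = new_content
    return 200, etag_of(new_content)

etag = etag_of(article["content"])                 # what the client saw on GET
first = update_article(etag, "Second draft")
repeat = update_article(etag, "Second draft")      # same request, stale ETag
print(first[0], repeat[0], article["content"])
```

<p>The first request succeeds and changes the ETag, so the identical second request no longer matches and is rejected, leaving the article exactly as the first request left it.</p>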
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, we highlighted the difference between safe methods like GET, which retrieve data without side effects, idempotent but unsafe methods like DELETE, and non-idempotent methods like POST, which can have different outcomes with each repetition.</p>
<p>We also explored techniques you can apply to achieve idempotence in non-idempotent methods, emphasizing the importance of designing APIs that prioritize consistency and reliability.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Movie Recommendation System Based on Collaborative Filtering ]]>
                </title>
                <description>
                    <![CDATA[ By Jess Wilk In today’s world of technology, we get more recommendations from Artificial Intelligence models than from our friends.  Surprised? Think of the content you see and the apps you use daily. We get product recommendations on Amazon, clothin... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-movie-recommendation-system-based-on-collaborative-filtering/</link>
                <guid isPermaLink="false">66d45f6573634435aafcefac</guid>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 29 Nov 2023 15:45:18 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/08/pexels-nathan-engel-50858-436413.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Jess Wilk</p>
<p>In today’s world of technology, we get more recommendations from Artificial Intelligence models than from our friends. </p>
<p>Surprised? Think of the content you see and the apps you use daily. We get product recommendations on Amazon, clothing recommendations on Myntra, and movie suggestions on Netflix based on our past preferences, purchases, and so on. </p>
<p>Have you ever wondered what’s under the hood? The answer is machine learning-powered Recommender systems. Recommender systems are machine learning algorithms developed using historical data and social media information to find products personalized to our preferences. </p>
<p>In this article, I’ll walk you through the different types of ML methods for building a recommendation system and focus on the <strong>collaborative filtering method</strong>. We will obtain a sample dataset and create a collaborative filtering recommender system step by step. </p>
<p>Make sure to grab a cup of cappuccino (or whatever is your beverage of choice) and get ready!</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before we embark on this journey, you should have a basic understanding of machine learning concepts and familiarity with Python programming. Knowledge of data processing and experience with libraries like Pandas, NumPy, and Scikit-learn will also be beneficial. </p>
<p>If you're new to these topics, you can check out the <a target="_blank" href="https://hyperskill.org/tracks/28">Introduction to Data Science</a> course on Hyperskill, where I contribute as an expert.</p>
<h2 id="heading-different-types-of-recommendation-systems">Different Types of Recommendation Systems</h2>
<p>You'll probably agree that there is more than one way to decide what to suggest or recommend when a friend asks our opinion. This applies to AI, too! </p>
<p>In machine learning, two primary methods of building recommendation engines are Content-based and Collaborative filtering methods.</p>
<p>When using the content-based filter method, the suggested products or items are based on what you liked or purchased. This method feeds the machine learning model with historical data such as customer search history, purchase records, and items in their wishlists. The model finds other products that share features similar to your past preferences. </p>
<p>Let’s understand this better with an example of a movie recommendation. Let’s say you saw Inception and gave it a five-star rating. Finding movies with similar themes and genres, like Interstellar and The Matrix, and recommending them is called content-based filtering.</p>
<p>Imagine if all the recommendation systems just suggested things based on what you have seen. How would you discover new genres and movies? That’s where the Collaborative filtering method comes in. So what is it?  </p>
<p>Rather than finding similar content, the Collaborative filtering method finds other users and customers similar to you and recommends their choices. The algorithm doesn’t consider the product features as in the case of content-based filtering. </p>
<p>To understand how it works, let’s go back to our example of movie recommendations. The system looks at the movies you've enjoyed and finds other users who liked the same movies. Then, it sees what else these similar users enjoyed and suggests those movies to you. </p>
<p>For example, if you and a friend both love The Shawshank Redemption, and your friend also loves Forrest Gump, the system will recommend Forrest Gump to you, thinking you might share your friend's taste. </p>
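<p>That idea can be sketched in a few lines before bringing in a real dataset. The toy example below is only an illustration of the principle, not the method built later in this article: the <code>likes</code> data is made up, and "similar" simply means sharing at least one liked movie.</p>

```python
# Hypothetical "liked movies" data for three users
likes = {
    "you":    {"The Shawshank Redemption", "Inception"},
    "friend": {"The Shawshank Redemption", "Forrest Gump"},
    "other":  {"Titanic"},
}

def recommend(user):
    """Suggest movies liked by users who share at least one liked movie."""
    seen = likes[user]
    suggestions = set()
    for other_user, their_likes in likes.items():
        if other_user != user and seen.intersection(their_likes):
            suggestions.update(their_likes - seen)
    return sorted(suggestions)

print(recommend("you"))
```

<p>Notice that the function never looks at genres or any other movie features. It relies only on the overlap between users' preferences, which is what separates collaborative filtering from content-based filtering.</p>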
<p>In the upcoming sections, I’ll show you how to build a movie recommendation engine using Python based on collaborative filtering.</p>
<p><img src="https://lh7-us.googleusercontent.com/wJ_Zjqr5YvwCMHqnbazh_QBZU6mXFVbtWfk9JoLvvpB5xj9YyuQ-uLAs3wUBMkqhvYGzo4w2ORz9H8qwDm1U97TlLUpjkQDH-8liZE7OUAadKG9rXH18VsIuWqhVKKEnsXfSaJZH3_Hu7lL-Y_cVNuQ" alt="Image" width="600" height="400" loading="lazy">
<em>Learning how to build a movie recommendation engine using Python based on collaborative filtering</em></p>
<h2 id="heading-how-to-prepare-and-process-the-movies-dataset">How to Prepare and Process the Movies Dataset</h2>
<p>The first step of any machine learning project is collecting and preparing the data. As our goal is to build a movie recommendation engine, I have chosen a movie rating dataset. The dataset is publicly available for free on <a target="_blank" href="https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset/">Kaggle</a>.</p>
<p>The dataset has two main files in the format of CSV:</p>
<ol>
<li><em>Ratings.csv</em>: Contains the rating given by each user to each movie they watched</li>
<li><em>Movies_metadata.csv</em>: Contains information on genre, budget, release date, revenue, and so on for all the movies in the dataset.</li>
</ol>
<p>Let’s first import the Python packages needed to read the CSV files. </p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
</code></pre>
<p>Next, read the <em>Ratings</em> file into a Pandas dataframe and look at the columns.</p>
<pre><code class="lang-python">user_ratings_df = pd.read_csv(<span class="hljs-string">"../input/the-movies-dataset/ratings.csv"</span>)
user_ratings_df.head()
</code></pre>
<p><img src="https://lh7-us.googleusercontent.com/gwVTHmk5vVq5272EqnszziBLxG0jUPFZifPyzvWKicWgN8CRf_Qit01kdDwrtcOrkUSJJJkwRPInDb5evsAuk98c1x9CeSWZFEX6yjio8syzg5H5LhpB2UWFq_BAQCNzR5xPwlZWNfv8dsD6CWqsMmM" alt="Image" width="600" height="400" loading="lazy">
<em>Columns in Pandas dataframe</em></p>
<p>The <strong>userId</strong> column holds the unique ID of each user, and <strong>movieId</strong> holds the unique ID of each movie. The <strong>rating</strong> column contains the rating, out of 5, that the user gave to the movie. The <strong>timestamp</strong> column can be dropped, as we won’t need it for our analysis.</p>
<p>Next, let’s read the movie metadata into a dataframe and keep only the relevant columns: the title and genres of each movie.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Keep the full metadata as movie_names; the recommender function looks titles up in it later</span>
movie_names = pd.read_csv(<span class="hljs-string">"../input/the-movies-dataset/movies_metadata.csv"</span>)
movie_metadata = movie_names[[<span class="hljs-string">'title'</span>, <span class="hljs-string">'genres'</span>]]
movie_metadata.head()
</code></pre>
<p><img src="https://lh7-us.googleusercontent.com/85sTR6KWMLUBRn7vtlCtLK-pfZzuBdgy13w76iCQ6elqnOhFmSsHk2me6Sh35eAV277VkWKpTWIFy6fL3Bl6T6gvyHwXu8eZ0mK18snH-M78u9sb-CvNGXL25LE9j6d_WzLRgzqEOyl-8C7dLth_tBI" alt="Image" width="600" height="400" loading="lazy">
<em>The columns of Movie Title and genre for each MovieID</em></p>
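<p>Next, combine these dataframes on the common column <strong>movieId</strong>. The merge key must be present in both dataframes, so make sure the movie ID column is kept in <code>movie_metadata</code> under that name.</p>
<pre><code class="lang-python">movie_data = user_ratings_df.merge(movie_metadata, on='movieId')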
movie_data.head()
</code></pre>
<p>You can now use this merged dataset for exploratory data analysis: find the movie with the most ratings, the highest average rating, and so on. Try it out to get a better grasp of the data you are dealing with.</p>
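<p>As a starting point for that exploration, here is a minimal sketch that counts and averages the ratings per title. It runs on a tiny hand-made dataframe (the rows and the name <code>sample_ratings</code> are invented for illustration) and assumes the merged data has <code>title</code> and <code>rating</code> columns.</p>
<pre><code class="lang-python">import pandas as pd

# Hypothetical stand-in for the merged dataframe; these rows are made up
sample_ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["Heat", "Toy Story", "Heat", "Toy Story", "Heat"],
    "rating": [4.0, 5.0, 3.0, 4.0, 5.0],
})

# Number of ratings and mean rating per movie, most-rated first
movie_stats = (
    sample_ratings.groupby("title")["rating"]
    .agg(num_ratings="count", avg_rating="mean")
    .sort_values("num_ratings", ascending=False)
)
print(movie_stats)
</code></pre>
<p>Sorting by <code>avg_rating</code> instead surfaces the best-rated titles, though movies with only a handful of ratings can dominate that list.</p>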
<h2 id="heading-how-to-build-the-user-item-matrix">How to Build the User-Item Matrix</h2>
<p>Now that our dataset is ready, let's focus on how collaborative filtering works. The algorithm aims to discover patterns in user preferences and use them to make recommendations. </p>
<p>One common approach is to use a <strong>user-item matrix</strong>. Picture a large spreadsheet where users are listed on one side and movies on the other. Each cell holds the rating a user gave to a particular movie. The system then analyzes this matrix to find patterns and generate recommendations.</p>
<p>This matrix leads us to one of the advantages of collaborative filtering: it's excellent at discovering new and unexpected recommendations. Since it's based on user behavior, it can suggest a movie you might never have considered but will probably like.</p>
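<p>To make the idea concrete, here is a small sketch in plain Python that arranges a handful of ratings into a user-item table. The users, movies, and ratings are invented for illustration, and 0 stands for "not rated".</p>
<pre><code class="lang-python"># Hypothetical (user, movie, rating) triples, invented for illustration
ratings = [
    ("ana", "Heat", 5.0),
    ("ana", "Toy Story", 3.0),
    ("ben", "Heat", 4.0),
    ("cho", "Toy Story", 5.0),
    ("cho", "Up", 4.0),
]

users = sorted({user for user, _, _ in ratings})
movies = sorted({movie for _, movie, _ in ratings})

# Start with 0 everywhere (0 means "not rated"), then fill in the known ratings
user_item = {user: {movie: 0.0 for movie in movies} for user in users}
for user, movie, rating in ratings:
    user_item[user][movie] = rating

# Print one row per user and one column per movie
print("user", *movies, sep="\t")
for user in users:
    print(user, *(user_item[user][movie] for movie in movies), sep="\t")
</code></pre>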
<p>Let’s create a user-movie rating matrix for our dataset. You can do this using the <code>pivot()</code> method of a Pandas dataframe, as shown below. We also use the <strong><code>fillna()</code></strong> method to replace missing values with 0.</p>
<pre><code class="lang-python">user_item_matrix = user_ratings_df.pivot(index=[<span class="hljs-string">'userId'</span>], columns=[<span class="hljs-string">'movieId'</span>], values=<span class="hljs-string">'rating'</span>).fillna(<span class="hljs-number">0</span>)
user_item_matrix
</code></pre>
<p>Here’s our output matrix:</p>
<p><img src="https://lh7-us.googleusercontent.com/pSpOQE0CsFOdRl1Rkf4Udo0FvTz7N7NDEHi82vYkHkZRwXp0cjsfgTW2OubIg1gHOgX27lBTsVExbsJoTO93M9THzmGduM_PulBPTXvv_df6U-bLxUzCXKKDFfVjk5lP8CvphnVglBGwWvNn-neQjEI" alt="Image" width="600" height="400" loading="lazy">
<em>A user-movie rating matrix for our dataset</em></p>
<p>In practice, this matrix is usually sparse: most users rate only a small fraction of the movies, so most cells are empty (filled with 0 here). Storing all of those empty cells can significantly increase the memory and computation needed. When working with a large dataset, compressing the sparse matrix using the <strong>scipy</strong> Python package is recommended.</p>
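<p>The sketch below illustrates the idea on a made-up 3x4 matrix: it measures how many cells are empty and then keeps only the rated cells. It is plain Python for illustration, not SciPy's API. A compressed format such as SciPy's <code>csr_matrix</code> applies the same principle with a more efficient layout.</p>
<pre><code class="lang-python"># Hypothetical 3x4 rating matrix (rows are users, columns are movies); 0 means "not rated"
dense = [
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
]

# Share of cells that hold no rating
total_cells = len(dense) * len(dense[0])
zero_cells = sum(row.count(0.0) for row in dense)
sparsity = zero_cells / total_cells
print(f"Sparsity: {sparsity:.0%}")

# Keep only the rated cells, keyed by (row, column)
compact = {
    (i, j): value
    for i, row in enumerate(dense)
    for j, value in enumerate(row)
    if value != 0.0
}
print(compact)
</code></pre>
<p>Here only 3 of the 12 cells need to be stored. On a real ratings matrix the saving is typically much larger.</p>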
<h2 id="heading-how-to-define-and-train-the-model">How to Define and Train the Model</h2>
<p>You can use multiple machine learning algorithms for collaborative filtering, like <strong>K-nearest neighbors</strong> (KNN) and <strong>SVD</strong>. I’ll be using a KNN model here. </p>
<p>KNN is super straightforward. Picture a giant, colorful board with dots representing different items (like movies). Each dot is close to others that are similar. When you ask KNN for recommendations, it finds the spot of your favorite item on this board and then looks around to see the nearest dots—these are your recommendations. </p>
<p>Now, the <code>metric</code> parameter in KNN is crucial. It's like the ruler the system uses to measure the distance between these dots. The metric used here is cosine similarity.</p>
<h3 id="heading-what-is-cosine-similarity">What is cosine similarity?</h3>
<p>Cosine similarity measures how similar two vectors are (for example, two documents or two columns of ratings) by the angle between them, irrespective of their magnitude. It is widely used in NLP to find words that appear in similar contexts.</p>
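<p>As a rough illustration of "irrespective of size", the sketch below computes cosine similarity by hand for a few invented rating vectors. The function and variable names are hypothetical, and the code assumes that neither vector is all zeros.</p>
<pre><code class="lang-python">import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical rating vectors, invented for illustration
fan = [5.0, 4.0, 1.0]
scaled_fan = [2.5, 2.0, 0.5]   # same pattern at half the magnitude
opposite = [1.0, 1.0, 5.0]     # prefers the third movie instead

print(cosine_similarity(fan, scaled_fan))  # close to 1.0: same direction
print(cosine_similarity(fan, opposite))    # noticeably lower
</code></pre>
<p>Note that scikit-learn's <code>'cosine'</code> metric works with cosine distance, which is 1 minus this similarity, so smaller values mean closer neighbors.</p>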
<p>Follow the snippet below to define a KNN model, the metric, and other parameters. The model is fit on the user-item matrix created in the previous section.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.neighbors <span class="hljs-keyword">import</span> NearestNeighbors

<span class="hljs-comment"># Define a KNN model on cosine similarity</span>
cf_knn_model = NearestNeighbors(metric=<span class="hljs-string">'cosine'</span>, algorithm=<span class="hljs-string">'brute'</span>, n_neighbors=<span class="hljs-number">10</span>, n_jobs=<span class="hljs-number">-1</span>)


<span class="hljs-comment"># Fitting the model on our matrix</span>
cf_knn_model.fit(user_item_matrix)
</code></pre>
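<p>To get a feel for what a brute-force nearest-neighbor search does when it is queried, here is a simplified sketch in plain Python. It is not scikit-learn's implementation: the helper names and the rating vectors are invented for illustration, and each movie is represented by the ratings it received from four hypothetical users.</p>
<pre><code class="lang-python">import math

def cosine_distance(a, b):
    """1 minus cosine similarity; smaller means more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def nearest_movies(target, movie_vectors, k):
    """Return the k titles whose rating vectors are closest to the target's."""
    others = [title for title in movie_vectors if title != target]
    others.sort(key=lambda title: cosine_distance(movie_vectors[target], movie_vectors[title]))
    return others[:k]

# Hypothetical rating vectors: one entry per user, 0 means "not rated"
movie_vectors = {
    "Batman": [5.0, 4.0, 0.0, 1.0],
    "Batman Returns": [4.0, 5.0, 0.0, 1.0],
    "Toy Story": [0.0, 1.0, 5.0, 4.0],
    "Heat": [4.0, 3.0, 1.0, 0.0],
}

print(nearest_movies("Batman", movie_vectors, k=2))
</code></pre>
<p>The search compares the target against every other movie, which is what "brute" refers to. It is simple and exact, but the cost grows with the number of movies.</p>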
<p>Next, let's define a function that returns the desired number of movie recommendations, given a movie title as input. The code below uses the KNN model to find the data points closest to the input movie. The input parameters for the function are:</p>
<ol>
<li><code>movie_name</code>: Input movie name, based on which we find new recommendations</li>
<li><code>matrix</code>: The User-Movie Rating matrix</li>
<li><code>cf_model</code>: The KNN model used to find the nearest neighbors</li>
<li><code>n_recs</code>: Controls the number of final recommendations that we would get as output</li>
</ol>
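<pre><code class="lang-python"><span class="hljs-comment"># Assumption: process is the fuzzy string matcher from the fuzzywuzzy package</span>
<span class="hljs-comment"># (published as thefuzz in newer releases)</span>
<span class="hljs-keyword">from</span> fuzzywuzzy <span class="hljs-keyword">import</span> process

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">movie_recommender_engine</span>(<span class="hljs-params">movie_name, matrix, cf_model, n_recs</span>):</span>
    <span class="hljs-comment"># Fit the model that was passed in on the matrix</span>
    cf_model.fit(matrix)

    <span class="hljs-comment"># Fuzzy-match the title and take the index of the best match in movie_names</span>
    movie_id = process.extractOne(movie_name, movie_names[<span class="hljs-string">'title'</span>])[<span class="hljs-number">2</span>]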

    <span class="hljs-comment"># Find the n_recs nearest neighbours of the input movie and their distances</span>
    distances, indices = cf_model.kneighbors(matrix[movie_id], n_neighbors=n_recs)
    <span class="hljs-comment"># Sort by distance, then reverse the order and drop the closest entry (normally the input movie itself)</span>
    movie_rec_ids = sorted(list(zip(indices.squeeze().tolist(),distances.squeeze().tolist())),key=<span class="hljs-keyword">lambda</span> x: x[<span class="hljs-number">1</span>])[:<span class="hljs-number">0</span>:<span class="hljs-number">-1</span>]

    <span class="hljs-comment"># List to store recommendations</span>
    cf_recs = []
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> movie_rec_ids:
        cf_recs.append({<span class="hljs-string">'Title'</span>:movie_names[<span class="hljs-string">'title'</span>][i[<span class="hljs-number">0</span>]],<span class="hljs-string">'Distance'</span>:i[<span class="hljs-number">1</span>]})

    <span class="hljs-comment"># Build the output dataframe, numbering the recommendations from 1</span>
    df = pd.DataFrame(cf_recs, index = range(<span class="hljs-number">1</span>,n_recs))

    <span class="hljs-keyword">return</span> df
</code></pre>
<h2 id="heading-how-to-get-recommendations-from-the-model">How to Get Recommendations from the Model</h2>
<p>Let's call our defined function to get movie recommendations. For instance, we can obtain a list of the top 10 recommended movies for someone who is a fan of Batman.</p>
<pre><code class="lang-python">n_recs = <span class="hljs-number">10</span>
movie_recommender_engine(<span class="hljs-string">'Batman'</span>, user_item_matrix, cf_knn_model, n_recs)
</code></pre>
<p><img src="https://lh7-us.googleusercontent.com/PRRCkFh6z1KyQkE4lDUCf8acQFlCwV9WBBVfiGeG7Fn77dD9412QDW54tCH7On9HXdIR4dLYvyA0zs7LXHgmeLqXHIXgQ3yaMt6g5GGdiT2BHNo1o2IZ56gfg4jfKY86wG_pRB7vKsPg5JLsme9AMig" alt="Image" width="600" height="400" loading="lazy">
<em>A list of the top 10 recommended movies for someone who is a fan of Batman</em></p>
<p>Hurray! We have got the result we needed.</p>
<h2 id="heading-advantages-and-limitations-of-collaborative-filtering">Advantages and Limitations of Collaborative Filtering</h2>
<p>The advantages of this method include:</p>
<ul>
<li><strong>Personalized Recommendations:</strong> Offers tailored suggestions based on user behavior, leading to highly customized experiences.</li>
<li><strong>Diverse Content Discovery:</strong> Capable of recommending a wide range of items, helping users discover content they might not find on their own. This gives collaborative filtering an edge over content-based filtering.</li>
<li><strong>Community Wisdom:</strong> Leverages the collective preferences of users, often leading to more accurate recommendations than individual or content-based analysis alone.</li>
<li><strong>Dynamic Adaptation:</strong> The model continuously gets updated with user interactions, keeping the recommendations relevant and up-to-date.</li>
</ul>
<p>It’s not all sunshine, though. One big challenge is the <em>cold start</em> problem. This happens when new movies or users are added to the system: with little or no rating data on these new entries, the system struggles to make accurate recommendations. </p>
<p>Another issue is popularity bias. Popular movies get recommended a lot, overshadowing lesser-known gems. There are also scalability issues that come with managing such a large dataset. </p>
<p>While developing collaborative filtering-based engines, computational expenses and data sparsity must be kept in mind for an efficient process. It’s also recommended to take action to ensure data privacy and security.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Using Collaborative Filtering to build a movie recommendation system significantly advances digital content personalization. This system reflects our preferences and exposes us to a broader range of choices based on similar users' tastes. </p>
<p>Despite its challenges, such as the cold start problem and popularity bias, the benefits of personalized recommendations make it a powerful tool in the machine learning industry. As technology advances, these systems will become even more sophisticated, offering refined and enjoyable user experiences in the digital world.</p>
<p>Thank you for reading! I'm Jess, and I'm an expert at Hyperskill. You can check out an <a target="_blank" href="https://hyperskill.org/tracks/28">Introduction to Data Science</a> course on the platform.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Web Storage Explained – How to Use localStorage and sessionStorage in JavaScript Projects ]]>
                </title>
                <description>
                    <![CDATA[ Web Storage is what the JavaScript API browsers provide for storing data locally and securely within a user’s browser. Session and local storage are the two main types of web storage. They are similar to regular properties objects, but they persist (... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/web-storage-localstorage-vs-sessionstorage-in-javascript/</link>
                <guid isPermaLink="false">66ba0e11256f04965e2bd0c7</guid>
                
                    <category>
                        <![CDATA[ api ]]>
                    </category>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ storage ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Oluwatobi Sofela ]]>
                </dc:creator>
                <pubDate>Mon, 09 Oct 2023 16:45:31 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/10/web-storage-explained-local-and-session-storage-objects-in-javascript-codesweetly.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p><strong>Web Storage</strong> is the JavaScript <a target="_blank" href="https://codesweetly.com/application-programming-interface-api-explained">API</a> that browsers provide for storing data locally and securely within a user’s browser.</p>
<p>Session and local storage are the two main types of web storage. They are similar to regular <a target="_blank" href="https://codesweetly.com/javascript-properties-object">properties objects</a>, but they persist (do not disappear) when the webpage reloads.</p>
<p>This article aims to show you exactly how the two storage objects work in JavaScript. We will also use a To-Do list exercise to practice using web storage in a web app project.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><a class="post-section-overview" href="#heading-what-is-the-session-storage-object">What is the Session Storage Object?</a></li>
<li><a class="post-section-overview" href="#heading-what-is-the-local-storage-object">What is the Local Storage Object?</a></li>
<li><a class="post-section-overview" href="#heading-how-to-access-the-session-and-local-storage-objects">How to Access the Session and Local Storage Objects</a></li>
<li><a class="post-section-overview" href="#heading-what-are-web-storages-built-in-interfaces">What are Web Storage’s Built-In Interfaces?</a><ul>
<li><a class="post-section-overview" href="#heading-what-is-web-storages-setitem-method">What is web storage’s <code>setItem()</code> method?</a></li>
<li><a class="post-section-overview" href="#heading-what-is-web-storages-key-method">What is web storage’s <code>key()</code> method?</a></li>
<li><a class="post-section-overview" href="#heading-what-is-web-storages-getitem-method">What is web storage’s <code>getItem()</code> method?</a></li>
<li><a class="post-section-overview" href="#heading-what-is-web-storages-length-property">What is web storage’s <code>length</code> property?</a></li>
<li><a class="post-section-overview" href="#heading-what-is-web-storages-removeitem-method">What is web storage’s <code>removeItem()</code> method?</a></li>
<li><a class="post-section-overview" href="#heading-what-is-web-storages-clear-method">What is web storage’s <code>clear()</code> method?</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-time-to-practice-with-web-storage">Time to Practice with Web Storage 🤸‍♂️🏋️‍♀️</a><ul>
<li><a class="post-section-overview" href="#heading-the-problem">The Problem</a></li>
<li><a class="post-section-overview" href="#heading-your-exercise">Your Exercise</a></li>
<li><a class="post-section-overview" href="#heading-bonus-exercise">Bonus Exercise</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-how-did-you-go-about-solving-the-web-storage-exercise">How Did You Go About Solving the Web Storage Exercise?</a><ul>
<li><a class="post-section-overview" href="#heading-how-to-prevent-the-session-storage-panes-to-do-items-from-disappearing-on-page-reload">How to prevent the Session Storage pane’s To-Do items from disappearing on page reload</a></li>
<li><a class="post-section-overview" href="#heading-how-to-prevent-the-local-storage-panes-to-do-items-from-disappearing-on-page-reload-or-reopen">How to prevent the Local Storage pane’s To-Do items from disappearing on page reload or reopen</a></li>
<li><a class="post-section-overview" href="#heading-how-to-auto-display-the-session-sections-previously-added-tasks-on-page-reload">How to auto-display the Session section’s previously added tasks on page reload</a></li>
<li><a class="post-section-overview" href="#heading-how-to-auto-display-the-local-sections-previously-added-tasks-on-page-reload-or-reopen">How to auto-display the Local section’s previously added tasks on page reload or reopen</a></li>
<li><a class="post-section-overview" href="#heading-how-to-check-the-total-items-in-the-browsers-session-storage">How to check the total items in the browser’s session storage</a></li>
<li><a class="post-section-overview" href="#heading-how-to-display-the-local-storages-zeroth-index-items-name">How to display the local storage’s zeroth index item’s name</a></li>
<li><a class="post-section-overview" href="#heading-how-to-empty-the-browsers-session-storage">How to empty the browser’s session storage</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-how-to-continue-practicing-with-web-storage">How to Continue Practicing with Web Storage 🧗‍♀️🚀</a></li>
<li><a class="post-section-overview" href="#heading-web-storage-vs-cookies-what-is-the-difference">Web Storage vs. Cookies: What is the Difference?</a></li>
<li><a class="post-section-overview" href="#heading-wrapping-up">Wrapping up</a></li>
</ol>
<p>Without further ado, let’s discuss session storage.</p>
<h2 id="heading-what-is-the-session-storage-object">What is the Session Storage Object?</h2>
<p>The session storage object (<code>window.sessionStorage</code>) stores data that persists for only one session of an opened tab.</p>
<p>In other words, whatever gets stored in the <code>window.sessionStorage</code> object will not disappear on a reload of the web page. Instead, the browser will delete the stored data only when users close the browser tab or window.</p>
<p><strong>Note the following:</strong></p>
<ul>
<li>The data stored inside the session storage is per-<a target="_blank" href="https://developer.mozilla.org/en-US/docs/Glossary/Origin">origin</a> and per-instance. In other words, <code>http://freecodecamp.com</code>’s <code>sessionStorage</code> object is different from <code>https://freecodecamp.com</code>’s <code>sessionStorage</code> object because the two origins use different <a target="_blank" href="https://codesweetly.com/web-address-url#scheme">schemes</a> (<code>http</code> and <code>https</code>).</li>
<li>Per-instance means per-window or per-tab. In other words, the <code>sessionStorage</code> object’s lifespan expires once users close the instance (window or tab).</li>
<li>Browsers create a unique page session for each new tab or window. Therefore, users can run multiple instances of an app without interfering with each instance’s session storage. (Note: Cookies do not have good support for running multiple instances of the same app. Such an attempt can cause errors such as <a target="_blank" href="https://html.spec.whatwg.org/multipage/webstorage.html#introduction-15">double entry of bookings</a>.)</li>
<li>Session storage is a property of the global <code>Window</code> object. So <code>sessionStorage.setItem()</code> is equivalent to <code>window.sessionStorage.setItem()</code>.</li>
</ul>
<h2 id="heading-what-is-the-local-storage-object">What is the Local Storage Object?</h2>
<p>The local storage object (<code>window.localStorage</code>) stores data that persists even when users close their browser tab (or window).</p>
<p>In other words, whatever gets stored in the <code>window.localStorage</code> object will not disappear during a reload or reopening of the web page or when users close their browsers. Those data have no expiration time. Browsers never clear them automatically.</p>
<p>The browser will delete the <code>window.localStorage</code> object’s content in the following instances only:</p>
<ol>
<li>When the content gets cleared through JavaScript</li>
<li>When the browser’s cache gets cleared</li>
</ol>
<p><strong>Note the following:</strong></p>
<ul>
<li>Browsers typically allow a similar quota (around 5 MiB per origin) for each of <code>window.localStorage</code> and <code>window.sessionStorage</code>, which is far more than a cookie can hold.</li>
<li>The data stored inside the local storage is per-<a target="_blank" href="https://developer.mozilla.org/en-US/docs/Glossary/Origin">origin</a>. In other words, <code>http://freecodecamp.com</code>’s <code>localStorage</code> object is different from <code>https://freecodecamp.com</code>’s <code>localStorage</code> object because the two origins use different <a target="_blank" href="https://codesweetly.com/web-address-url#scheme">schemes</a> (<code>http</code> and <code>https</code>).</li>
<li>There are inconsistencies with how browsers handle the local storage of documents not served from a web server (for instance, pages with a <code>file:</code> URL scheme). Therefore, the <code>localStorage</code> object may behave differently among browsers when used with non-HTTP URLs, such as <code>file:///document/on/users/local/system.html</code>.</li>
<li>Local storage is a property of the global <code>Window</code> object. Therefore, <code>localStorage.setItem()</code> is equivalent to <code>window.localStorage.setItem()</code>.</li>
</ul>
<h2 id="heading-how-to-access-the-session-and-local-storage-objects">How to Access the Session and Local Storage Objects</h2>
<p>You can access the two web storages by:</p>
<ol>
<li>Using the same technique as you'd use for <a target="_blank" href="https://codesweetly.com/javascript-properties-object#how-to-access-an-objects-value">accessing regular JavaScript objects</a></li>
<li>Using web storage’s built-in interfaces</li>
</ol>
<p>For instance, consider the snippet below:</p>
<pre><code class="lang-js">sessionStorage.bestColor = <span class="hljs-string">"Green"</span>;
sessionStorage[<span class="hljs-string">"bestColor"</span>] = <span class="hljs-string">"Green"</span>;
sessionStorage.setItem(<span class="hljs-string">"bestColor"</span>, <span class="hljs-string">"Green"</span>);
</code></pre>
<p>The three statements above do the same thing—they set <code>bestColor</code>’s value. But the third line is recommended because it uses web storage’s <code>setItem()</code> method.</p>
<p><strong>Tip:</strong> you should prefer using the web storage’s built-in interfaces to avoid <a target="_blank" href="https://2ality.com/2012/01/objects-as-maps.html">the pitfalls of using objects as key/value stores</a>.</p>
<p>Let’s look more closely at web storage’s built-in interfaces below.</p>
<h2 id="heading-what-are-web-storages-built-in-interfaces">What are Web Storage’s Built-In Interfaces?</h2>
<p>The web storage built-in interfaces are the recommended tools for reading and manipulating a browser’s <code>sessionStorage</code> and <code>localStorage</code> objects.</p>
<p>The six (6) built-in interfaces are:</p>
<ul>
<li><code>setItem()</code></li>
<li><code>key()</code></li>
<li><code>getItem()</code></li>
<li><code>length</code></li>
<li><code>removeItem()</code></li>
<li><code>clear()</code></li>
</ul>
<p>Let’s discuss each one now.</p>
<h3 id="heading-what-is-web-storages-setitem-method">What is web storage’s <code>setItem()</code> method?</h3>
<p>The <code>setItem()</code> method stores its <code>key</code> and <code>value</code> arguments inside the specified web storage object.</p>
<h4 id="heading-syntax-of-the-setitem-method">Syntax of the <code>setItem()</code> method</h4>
<p><code>setItem()</code> accepts two required <a target="_blank" href="https://codesweetly.com/javascript-arguments">arguments</a>. Here is the syntax:</p>
<pre><code class="lang-js">webStorageObject.setItem(key, value);
</code></pre>
<ul>
<li><code>webStorageObject</code> represents the storage object (<code>localStorage</code> or <code>sessionStorage</code>) you wish to manipulate.</li>
<li><code>key</code> is the first argument accepted by <code>setItem()</code>. It is a required string argument representing the name of the web storage property you want to create or update.</li>
<li><code>value</code> is the second argument accepted by <code>setItem()</code>. It is a required string argument specifying the value of the <code>key</code> you are creating or updating.</li>
</ul>
<p><strong>Note:</strong></p>
<ul>
<li>The <code>key</code> and <code>value</code> arguments are always strings.</li>
<li>If you provide an integer as a <code>key</code> or <code>value</code>, browsers will convert it to a string automatically.</li>
<li><code>setItem()</code> may throw an exception (typically a <code>QuotaExceededError</code>) if the storage object is full.</li>
</ul>
<h4 id="heading-example-1-how-to-store-data-in-the-session-storage-object">Example 1: How to store data in the session storage object</h4>
<ol>
<li>Invoke <code>sessionStorage</code>’s <code>setItem()</code> method.</li>
<li>Provide the name and value of the data you wish to store.</li>
</ol>
<pre><code class="lang-js"><span class="hljs-comment">// Store color: "Pink" inside the browser's session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"color"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Log the session storage object to the console:</span>
<span class="hljs-built_in">console</span>.log(sessionStorage);

<span class="hljs-comment">// The invocation above will return:</span>
{<span class="hljs-attr">color</span>: <span class="hljs-string">"Pink"</span>}
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/setitem/js-25hgkp"><strong>Try Editing It</strong></a></p>
<p><strong>Note:</strong> Your browser’s session storage may contain additional data if it already uses the storage object to store information.</p>
<h4 id="heading-example-2-how-to-store-data-in-the-local-storage-object">Example 2: How to store data in the local storage object</h4>
<ol>
<li>Invoke <code>localStorage</code>’s <code>setItem()</code> method.</li>
<li>Provide the name and value of the data you wish to store.</li>
</ol>
<pre><code class="lang-js"><span class="hljs-comment">// Store color: "Pink" inside the browser's local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"color"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Log the local storage object to the console:</span>
<span class="hljs-built_in">console</span>.log(<span class="hljs-built_in">localStorage</span>);

<span class="hljs-comment">// The invocation above will return:</span>
{<span class="hljs-attr">color</span>: <span class="hljs-string">"Pink"</span>}
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/setitem/js-2hluvw"><strong>Try Editing It</strong></a></p>
<p><strong>Note:</strong></p>
<ul>
<li>Your browser’s local storage may contain additional data if it already uses the storage object to store information.</li>
<li>It is best to serialize objects before storing them in local or session storage. Otherwise, the computer will store the object as <code>"[object Object]"</code>.</li>
</ul>
<h4 id="heading-example-3-browsers-use-object-object-for-non-serialized-objects-in-the-web-storage">Example 3: Browsers use <code>"[object Object]"</code> for non-serialized objects in the web storage</h4>
<pre><code class="lang-js"><span class="hljs-comment">// Store myBio object inside the browser's session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"myBio"</span>, { <span class="hljs-attr">name</span>: <span class="hljs-string">"Oluwatobi"</span> });

<span class="hljs-comment">// Log the session storage object to the console:</span>
<span class="hljs-built_in">console</span>.log(sessionStorage);

<span class="hljs-comment">// The invocation above will return:</span>
{<span class="hljs-attr">myBio</span>: <span class="hljs-string">"[object Object]"</span>, <span class="hljs-attr">length</span>: <span class="hljs-number">1</span>}
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/setitem/js-n8m7hc"><strong>Try Editing It</strong></a></p>
<p>You can see that the computer stored the object as <code>"[object Object]"</code> because we did not serialize it.</p>
<h4 id="heading-example-4-how-to-store-serialized-objects-in-the-web-storage">Example 4: How to store serialized objects in the web storage</h4>
<pre><code class="lang-js"><span class="hljs-comment">// Store myBio object inside the browser's session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"myBio"</span>, <span class="hljs-built_in">JSON</span>.stringify({ <span class="hljs-attr">name</span>: <span class="hljs-string">"Oluwatobi"</span> }));

<span class="hljs-comment">// Log the session storage object to the console:</span>
<span class="hljs-built_in">console</span>.log(sessionStorage);

<span class="hljs-comment">// The invocation above will return:</span>
{<span class="hljs-attr">myBio</span>: <span class="hljs-string">'{"name":"Oluwatobi"}'</span>, <span class="hljs-attr">length</span>: <span class="hljs-number">1</span>}
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/setitem/js-edfh43"><strong>Try Editing It</strong></a></p>
<p>We used <code>JSON.stringify()</code> to convert the object to JSON before storing it in the web storage.</p>
<p><strong>Tip:</strong> Learn <a target="_blank" href="https://codesweetly.com/json-explained#how-to-convert-a-json-text-to-a-javascript-object">how to convert JSON to JavaScript objects</a>.</p>
<h3 id="heading-what-is-web-storages-key-method">What is web storage’s <code>key()</code> method?</h3>
<p>The <code>key()</code> method retrieves a specified web storage item’s name (key).</p>
<h4 id="heading-syntax-of-the-key-method">Syntax of the <code>key()</code> method</h4>
<p><code>key()</code> accepts one required argument. Here is the syntax:</p>
<pre><code class="lang-js">webStorageObject.key(index);
</code></pre>
<ul>
<li><code>webStorageObject</code> represents the storage object (<code>localStorage</code> or <code>sessionStorage</code>) whose key you wish to get.</li>
<li><code>index</code> is a required argument. It is an <a target="_blank" href="https://codesweetly.com/web-tech-terms-i#integer">integer</a> specifying the <a target="_blank" href="https://codesweetly.com/web-tech-terms-i#index">index</a> of the item whose key you want to get.</li>
</ul>
<h4 id="heading-example-1-how-to-get-the-name-of-an-item-in-the-session-storage-object">Example 1: How to get the name of an item in the session storage object</h4>
<ol>
<li>Invoke <code>sessionStorage</code>’s <code>key()</code> method.</li>
<li>Provide the index of the item whose name you wish to get.</li>
</ol>
<pre><code class="lang-js"><span class="hljs-comment">// Store carColor: "Pink" inside the browser's session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"carColor"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Store pcColor: "Yellow" inside the session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"pcColor"</span>, <span class="hljs-string">"Yellow"</span>);

<span class="hljs-comment">// Store laptopColor: "White" inside the session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"laptopColor"</span>, <span class="hljs-string">"White"</span>);

<span class="hljs-comment">// Get the name of the item at index 1:</span>
sessionStorage.key(<span class="hljs-number">1</span>);
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/key/js-tptqtg"><strong>Try Editing It</strong></a></p>
<p><strong>Important:</strong> The <a target="_blank" href="https://en.wikipedia.org/wiki/User_agent">user-agent</a> defines the order of items in the session storage. In other words, <code>key()</code>’s output may vary based on how the user-agent orders the web storage’s items. So you shouldn't rely on <code>key()</code> to return a constant value.</p>
<h4 id="heading-example-2-how-to-get-the-name-of-an-item-in-the-local-storage-object">Example 2: How to get the name of an item in the local storage object</h4>
<ol>
<li>Invoke <code>localStorage</code>’s <code>key()</code> method.</li>
<li>Provide the index of the item whose name you wish to get.</li>
</ol>
<pre><code class="lang-js"><span class="hljs-comment">// Store carColor: "Pink" inside the browser's local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"carColor"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Store pcColor: "Yellow" inside the local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"pcColor"</span>, <span class="hljs-string">"Yellow"</span>);

<span class="hljs-comment">// Store laptopColor: "White" inside the local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"laptopColor"</span>, <span class="hljs-string">"White"</span>);

<span class="hljs-comment">// Get the name of the item at index 1:</span>
<span class="hljs-built_in">localStorage</span>.key(<span class="hljs-number">1</span>);
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/key/js-tclrbd"><strong>Try Editing It</strong></a></p>
<p><strong>Important:</strong> The user-agent defines the order of items in the local storage. In other words, <code>key()</code>’s output may vary based on how the user-agent orders the web storage’s items. So you shouldn't rely on <code>key()</code> to return a constant value.</p>
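<p>Because the order is not guaranteed, one hedged way to work with <code>key()</code> is to collect every key and sort the result yourself. The sketch below assumes a hypothetical helper named <code>listStorageKeys</code> (it is not part of the Web Storage API) that accepts any storage object:</p>
<pre><code class="lang-js">// Hypothetical helper: collect every key in a storage object
// (localStorage or sessionStorage) using only length and key().
function listStorageKeys(storage) {
  const keys = [];
  for (let index = 0; index !== storage.length; index += 1) {
    keys.push(storage.key(index));
  }
  // Sort so the result does not depend on the user-agent's ordering.
  return keys.sort();
}
</code></pre>
<p>For example, calling <code>listStorageKeys(localStorage)</code> after the three <code>setItem()</code> calls above would return the three key names in alphabetical order, along with any keys the local storage already contained.</p>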
<h3 id="heading-what-is-web-storages-getitem-method">What is web storage’s <code>getItem()</code> method?</h3>
<p>The <code>getItem()</code> method retrieves the value of a specified web storage item.</p>
<h4 id="heading-syntax-of-the-getitem-method">Syntax of the <code>getItem()</code> method</h4>
<p><code>getItem()</code> accepts one required argument. Here is the syntax:</p>
<pre><code class="lang-js">webStorageObject.getItem(key);
</code></pre>
<ul>
<li><code>webStorageObject</code> represents the storage object (<code>localStorage</code> or <code>sessionStorage</code>) whose item you wish to get.</li>
<li><code>key</code> is a required argument. It is a <a target="_blank" href="https://codesweetly.com/javascript-primitive-data-type#string-primitive-data-type">string</a> specifying the name of the web storage <a target="_blank" href="https://codesweetly.com/javascript-properties-object#syntax-of-a-javascript-object">property</a> whose value you want to get.</li>
</ul>
<h4 id="heading-example-1-how-to-get-data-from-the-session-storage-object">Example 1: How to get data from the session storage object</h4>
<ol>
<li>Invoke <code>sessionStorage</code>’s <code>getItem()</code> method.</li>
<li>Provide the name of the data you wish to get.</li>
</ol>
<pre><code class="lang-js"><span class="hljs-comment">// Store color: "Pink" inside the browser's session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"color"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Get color's value from the session storage:</span>
sessionStorage.getItem(<span class="hljs-string">"color"</span>);

<span class="hljs-comment">// The invocation above will return:</span>
<span class="hljs-string">"Pink"</span>
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/getitem/js-xk9auv"><strong>Try Editing It</strong></a></p>
<h4 id="heading-example-2-how-to-get-data-from-the-local-storage-object">Example 2: How to get data from the local storage object</h4>
<ol>
<li>Invoke <code>localStorage</code>’s <code>getItem()</code> method.</li>
<li>Provide the name of the data you wish to get.</li>
</ol>
<pre><code class="lang-js"><span class="hljs-comment">// Store color: "Pink" inside the browser's local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"color"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Get color's value from the local storage:</span>
<span class="hljs-built_in">localStorage</span>.getItem(<span class="hljs-string">"color"</span>);

<span class="hljs-comment">// The invocation above will return:</span>
<span class="hljs-string">"Pink"</span>
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/getitem/js-terw5e"><strong>Try Editing It</strong></a></p>
<p><strong>Note:</strong> The <code>getItem()</code> method returns <code>null</code> if the specified web storage contains no item with the given key.</p>
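<p>Since a missing item yields <code>null</code>, you may want a fallback value. Here is a minimal sketch, assuming a hypothetical helper named <code>getItemOrDefault</code> (not a built-in method):</p>
<pre><code class="lang-js">// Hypothetical helper: read an item, or return a fallback
// when getItem() reports that the key does not exist (null).
function getItemOrDefault(storage, key, fallback) {
  const value = storage.getItem(key);
  return value === null ? fallback : value;
}
</code></pre>
<p>For instance, <code>getItemOrDefault(localStorage, "color", "Pink")</code> would return the stored color if one exists and <code>"Pink"</code> otherwise. The check uses <code>=== null</code> rather than <code>||</code> so that a stored empty string is not mistaken for a missing item.</p>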
<h3 id="heading-what-is-web-storages-length-property">What is web storage’s <code>length</code> property?</h3>
<p>The <code>length</code> property returns the number of <a target="_blank" href="https://codesweetly.com/javascript-properties-object#syntax-of-a-javascript-object">properties</a> in the specified web storage.</p>
<h4 id="heading-syntax-of-the-length-property">Syntax of the <code>length</code> property</h4>
<p>Here is <code>length</code>’s syntax:</p>
<pre><code class="lang-js">webStorageObject.length;
</code></pre>
<p><code>webStorageObject</code> represents the storage object (<code>localStorage</code> or <code>sessionStorage</code>) whose length you wish to verify.</p>
<h4 id="heading-example-1-how-to-verify-the-number-of-items-in-the-session-storage-object">Example 1: How to verify the number of items in the session storage object</h4>
<p>Invoke <code>sessionStorage</code>’s <code>length</code> property.</p>
<pre><code class="lang-js"><span class="hljs-comment">// Store carColor: "Pink" inside the browser's session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"carColor"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Store pcColor: "Yellow" inside the session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"pcColor"</span>, <span class="hljs-string">"Yellow"</span>);

<span class="hljs-comment">// Store laptopColor: "White" inside the session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"laptopColor"</span>, <span class="hljs-string">"White"</span>);

<span class="hljs-comment">// Verify the number of items in the session storage:</span>
sessionStorage.length;

<span class="hljs-comment">// The invocation above may return:</span>
<span class="hljs-number">3</span>
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/length/js-zasgst"><strong>Try Editing It</strong></a></p>
<p><strong>Note:</strong> Your <code>sessionStorage.length</code> invocation may return a value greater than <code>3</code> if your browser’s session storage already contains some stored information.</p>
<h4 id="heading-example-2-how-to-verify-the-number-of-items-in-the-local-storage-object">Example 2: How to verify the number of items in the local storage object</h4>
<p>Invoke <code>localStorage</code>’s <code>length</code> property.</p>
<pre><code class="lang-js"><span class="hljs-comment">// Store carColor: "Pink" inside the browser's local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"carColor"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Store pcColor: "Yellow" inside the local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"pcColor"</span>, <span class="hljs-string">"Yellow"</span>);

<span class="hljs-comment">// Store laptopColor: "White" inside the local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"laptopColor"</span>, <span class="hljs-string">"White"</span>);

<span class="hljs-comment">// Verify the number of items in the local storage:</span>
<span class="hljs-built_in">localStorage</span>.length;

<span class="hljs-comment">// The invocation above may return:</span>
<span class="hljs-number">3</span>
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/length/js-3f6lac"><strong>Try Editing It</strong></a></p>
<p><strong>Note:</strong> Your <code>localStorage.length</code> invocation may return a value greater than <code>3</code> if your browser's local storage already contains some stored information.</p>
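<p>If you only care about specific items, <code>length</code> alone can mislead you because it also counts unrelated entries. As a rough sketch, the hypothetical <code>countExistingKeys</code> helper below (an illustrative name, not a built-in) counts how many keys from a given list are present:</p>
<pre><code class="lang-js">// Hypothetical helper: count how many of the given keys
// currently exist in the storage object.
function countExistingKeys(storage, keys) {
  return keys.filter(function (key) {
    // getItem() returns null for keys that are not stored.
    return storage.getItem(key) !== null;
  }).length;
}
</code></pre>
<p>For example, <code>countExistingKeys(localStorage, ["carColor", "pcColor", "laptopColor"])</code> would return <code>3</code> after the snippet above runs, regardless of what else the local storage contains.</p>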
<h3 id="heading-what-is-web-storages-removeitem-method">What is web storage’s <code>removeItem()</code> method?</h3>
<p>The <code>removeItem()</code> method removes a property from the specified web storage.</p>
<h4 id="heading-syntax-of-the-removeitem-method">Syntax of the <code>removeItem()</code> method</h4>
<p><code>removeItem()</code> accepts one required argument. Here is the syntax:</p>
<pre><code class="lang-js">webStorageObject.removeItem(key);
</code></pre>
<ul>
<li><code>webStorageObject</code> represents the storage object (<code>localStorage</code> or <code>sessionStorage</code>) whose item you wish to remove.</li>
<li><code>key</code> is a required argument. It is a string specifying the name of the web storage property you want to remove.</li>
</ul>
<h4 id="heading-example-1-how-to-remove-data-from-the-session-storage-object">Example 1: How to remove data from the session storage object</h4>
<ol>
<li>Invoke <code>sessionStorage</code>’s <code>removeItem()</code> method.</li>
<li>Provide the name of the data you wish to remove.</li>
</ol>
<pre><code class="lang-js"><span class="hljs-comment">// Store carColor: "Pink" inside the browser's session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"carColor"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Store pcColor: "Yellow" inside the session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"pcColor"</span>, <span class="hljs-string">"Yellow"</span>);

<span class="hljs-comment">// Store laptopColor: "White" inside the session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"laptopColor"</span>, <span class="hljs-string">"White"</span>);

<span class="hljs-comment">// Remove the pcColor item from the session storage:</span>
sessionStorage.removeItem(<span class="hljs-string">"pcColor"</span>);

<span class="hljs-comment">// Confirm whether the pcColor item still exists in the session storage:</span>
sessionStorage.getItem(<span class="hljs-string">"pcColor"</span>);

<span class="hljs-comment">// The invocation above will return:</span>
<span class="hljs-literal">null</span>
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/removeitem/js-1mywnh"><strong>Try Editing It</strong></a></p>
<p><strong>Note:</strong> The <code>removeItem()</code> method does nothing if the session storage contains no item with the given key.</p>
<h4 id="heading-example-2-how-to-remove-data-from-the-local-storage-object">Example 2: How to remove data from the local storage object</h4>
<ol>
<li>Invoke <code>localStorage</code>’s <code>removeItem()</code> method.</li>
<li>Provide the name of the data you wish to remove.</li>
</ol>
<pre><code class="lang-js"><span class="hljs-comment">// Store carColor: "Pink" inside the browser's local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"carColor"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Store pcColor: "Yellow" inside the local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"pcColor"</span>, <span class="hljs-string">"Yellow"</span>);

<span class="hljs-comment">// Store laptopColor: "White" inside the local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"laptopColor"</span>, <span class="hljs-string">"White"</span>);

<span class="hljs-comment">// Remove the pcColor item from the local storage:</span>
<span class="hljs-built_in">localStorage</span>.removeItem(<span class="hljs-string">"pcColor"</span>);

<span class="hljs-comment">// Confirm whether the pcColor item still exists in the local storage:</span>
<span class="hljs-built_in">localStorage</span>.getItem(<span class="hljs-string">"pcColor"</span>);

<span class="hljs-comment">// The invocation above will return:</span>
<span class="hljs-literal">null</span>
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/removeitem/js-8doou3"><strong>Try Editing It</strong></a></p>
<p><strong>Note:</strong> The <code>removeItem()</code> method does nothing if the local storage contains no item with the given key.</p>
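<p>Because <code>removeItem()</code> gives no feedback about whether anything was deleted, you could check for the item first. The following is a minimal sketch, assuming a hypothetical helper named <code>removeItemIfPresent</code>:</p>
<pre><code class="lang-js">// Hypothetical helper: remove an item and report whether it existed.
function removeItemIfPresent(storage, key) {
  const existed = storage.getItem(key) !== null;
  // Safe to call even when the key is missing: removeItem() then does nothing.
  storage.removeItem(key);
  return existed;
}
</code></pre>
<p>With the items stored above, <code>removeItemIfPresent(localStorage, "pcColor")</code> would return <code>true</code> the first time and <code>false</code> on any later call.</p>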
<h3 id="heading-what-is-web-storages-clear-method">What is web storage’s <code>clear()</code> method?</h3>
<p>The <code>clear()</code> method clears (deletes) all the items in the specified web storage.</p>
<h4 id="heading-syntax-of-the-clear-method">Syntax of the <code>clear()</code> method</h4>
<p><code>clear()</code> accepts no argument. Here is the syntax:</p>
<pre><code class="lang-js">webStorageObject.clear();
</code></pre>
<p><code>webStorageObject</code> represents the storage object (<code>localStorage</code> or <code>sessionStorage</code>) whose items you wish to clear.</p>
<h4 id="heading-example-1-how-to-clear-all-items-from-the-session-storage-object">Example 1: How to clear all items from the session storage object</h4>
<p>Invoke <code>sessionStorage</code>’s <code>clear()</code> method.</p>
<pre><code class="lang-js"><span class="hljs-comment">// Store carColor: "Pink" inside the browser's session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"carColor"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Store pcColor: "Yellow" inside the session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"pcColor"</span>, <span class="hljs-string">"Yellow"</span>);

<span class="hljs-comment">// Store laptopColor: "White" inside the session storage object:</span>
sessionStorage.setItem(<span class="hljs-string">"laptopColor"</span>, <span class="hljs-string">"White"</span>);

<span class="hljs-comment">// Clear all items from the session storage:</span>
sessionStorage.clear();

<span class="hljs-comment">// Confirm whether the session storage still contains any item:</span>
<span class="hljs-built_in">console</span>.log(sessionStorage);

<span class="hljs-comment">// The invocation above will log an empty storage object, such as:</span>
{<span class="hljs-attr">length</span>: <span class="hljs-number">0</span>}
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/clear/js-an86yu"><strong>Try Editing It</strong></a></p>
<h4 id="heading-example-2-how-to-clear-all-items-from-the-local-storage-object">Example 2: How to clear all items from the local storage object</h4>
<p>Invoke <code>localStorage</code>’s <code>clear()</code> method.</p>
<pre><code class="lang-js"><span class="hljs-comment">// Store carColor: "Pink" inside the browser's local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"carColor"</span>, <span class="hljs-string">"Pink"</span>);

<span class="hljs-comment">// Store pcColor: "Yellow" inside the local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"pcColor"</span>, <span class="hljs-string">"Yellow"</span>);

<span class="hljs-comment">// Store laptopColor: "White" inside the local storage object:</span>
<span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">"laptopColor"</span>, <span class="hljs-string">"White"</span>);

<span class="hljs-comment">// Clear all items from the local storage:</span>
<span class="hljs-built_in">localStorage</span>.clear();

<span class="hljs-comment">// Confirm whether the local storage still contains any item:</span>
<span class="hljs-built_in">console</span>.log(<span class="hljs-built_in">localStorage</span>);

<span class="hljs-comment">// The invocation above will log an empty storage object, such as:</span>
{<span class="hljs-attr">length</span>: <span class="hljs-number">0</span>}
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/clear/js-w5vyem"><strong>Try Editing It</strong></a></p>
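<p>Keep in mind that <code>clear()</code> deletes every item in the storage object, including ones your own code did not create. If you only want to remove your app’s items, one possible approach is to name your keys with a shared prefix and remove just those. The sketch below assumes a hypothetical <code>removeKeysWithPrefix</code> helper and a made-up prefix such as <code>"myApp:"</code>:</p>
<pre><code class="lang-js">// Hypothetical helper: remove only the items whose keys start with a prefix.
function removeKeysWithPrefix(storage, prefix) {
  // Collect the matching keys first. Removing items while looping
  // with key() would shift the indexes of the remaining items.
  const matchingKeys = [];
  for (let index = 0; index !== storage.length; index += 1) {
    const key = storage.key(index);
    if (key.startsWith(prefix)) {
      matchingKeys.push(key);
    }
  }
  matchingKeys.forEach(function (key) {
    storage.removeItem(key);
  });
  // Report how many items were removed.
  return matchingKeys.length;
}
</code></pre>
<p>For example, <code>removeKeysWithPrefix(localStorage, "myApp:")</code> would delete <code>myApp:carColor</code> and <code>myApp:pcColor</code> while leaving items with other key names untouched.</p>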
<p>Now that we know what web storage is and how to access it, we can practice using it in a JavaScript project.</p>
<h2 id="heading-time-to-practice-with-web-storage">Time to Practice with Web Storage 🤸‍♂️🏋️‍♀️</h2>
<p>Consider the following To-Do List app:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/78MRup0PN7c" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<h3 id="heading-the-problem">The Problem</h3>
<p>The issue with the <a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/to-do-app/js-mgl6ie">To-Do List app</a> is this:</p>
<ul>
<li>Tasks disappear whenever users refresh the webpage.</li>
</ul>
<h3 id="heading-your-exercise">Your Exercise</h3>
<p>Use the appropriate Web Storage APIs to accomplish the following tasks:</p>
<ol>
<li>Prevent the Session pane’s To-Do items from disappearing whenever users reload the page.</li>
<li>Prevent the Local section’s To-Do items from disappearing whenever users reload or close their browser tab (or window).</li>
<li>Auto-display the Session section's previously added tasks on page reload.</li>
<li>Auto-display the Local section's previously added tasks on page reload (or browser reopen).</li>
</ol>
<h3 id="heading-bonus-exercise">Bonus Exercise</h3>
<p>Use your browser’s console to:</p>
<ol>
<li>Check the number of items in your browser’s session storage object.</li>
<li>Display the name of your local storage’s zeroth index item.</li>
<li>Delete all the items in your browser’s session storage.</li>
</ol>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/to-do-app/js-mgl6ie"><strong>Try the Web Storage Exercise</strong></a></p>
<p><strong>Note:</strong> You will benefit much more from this tutorial if you attempt the exercise yourself.</p>
<p>If you get stuck, don’t be discouraged. Instead, review the lesson and give it another try.</p>
<p>Once you’ve given it your best shot (you’ll only cheat yourself if you don’t!), we can discuss how I approached the exercise below.</p>
<h2 id="heading-how-did-you-go-about-solving-the-web-storage-exercise">How Did You Go About Solving the Web Storage Exercise?</h2>
<p>Below are feasible ways to get the exercise done.</p>
<h3 id="heading-how-to-prevent-the-session-storage-panes-to-do-items-from-disappearing-on-page-reload">How to prevent the Session Storage pane’s To-Do items from disappearing on page reload</h3>
<p>Whenever users click the “Add task” button,</p>
<ol>
<li>Get existing session storage’s content, if any. Otherwise, return an empty array.</li>
<li>Merge the existing to-do items with the user’s new input.</li>
<li>Add the new to-do list to the browser’s session storage object.</li>
</ol>
<p><strong>Here’s the code:</strong></p>
<pre><code class="lang-js">sessionAddTaskBtn.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-comment">// Get existing session storage's content, if any. Otherwise, return an empty array:</span>
  <span class="hljs-keyword">const</span> currentTodoArray =
    <span class="hljs-built_in">JSON</span>.parse(sessionStorage.getItem(<span class="hljs-string">'codesweetlyStore'</span>)) || [];

  <span class="hljs-comment">// Merge currentTodoArray with the user's new input:</span>
  <span class="hljs-keyword">const</span> newTodoArray = [
    ...currentTodoArray,
    { <span class="hljs-attr">checked</span>: <span class="hljs-literal">false</span>, <span class="hljs-attr">text</span>: sessionInputEle.value },
  ];

  <span class="hljs-comment">// Add newTodoArray to the session storage object:</span>
  sessionStorage.setItem(<span class="hljs-string">'codesweetlyStore'</span>, <span class="hljs-built_in">JSON</span>.stringify(newTodoArray));

  <span class="hljs-keyword">const</span> todoLiElements = createTodoLiElements(newTodoArray);
  sessionTodosContainer.replaceChildren(...todoLiElements);
  sessionInputEle.value = <span class="hljs-string">''</span>;
});
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/to-do-app/js-txyt66"><strong>Try Editing It</strong></a></p>
<p><strong>Note:</strong> The three dots (<code>...</code>) preceding the <code>currentTodoArray</code> variable represent the <a target="_blank" href="https://codesweetly.com/spread-operator">spread operator</a>. We used it in the <code>newTodoArray</code> array literal to copy <code>currentTodoArray</code>’s items into <code>newTodoArray</code>.</p>
<h3 id="heading-how-to-prevent-the-local-storage-panes-to-do-items-from-disappearing-on-page-reload-or-reopen">How to prevent the Local Storage pane’s To-Do items from disappearing on page reload or reopen</h3>
<p>Whenever users click the “Add task” button,</p>
<ol>
<li>Get existing local storage’s content, if any. Otherwise, return an empty array.</li>
<li>Merge the existing to-do items with the user’s new input.</li>
<li>Add the new to-do list to the browser’s local storage object.</li>
</ol>
<p><strong>Here’s the code:</strong></p>
<pre><code class="lang-js">localAddTaskBtn.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-comment">// Get existing local storage's content, if any. Otherwise, return an empty array:</span>
  <span class="hljs-keyword">const</span> currentTodoArray =
    <span class="hljs-built_in">JSON</span>.parse(<span class="hljs-built_in">localStorage</span>.getItem(<span class="hljs-string">'codesweetlyStore'</span>)) || [];

  <span class="hljs-comment">// Merge currentTodoArray with the user's new input:</span>
  <span class="hljs-keyword">const</span> newTodoArray = [
    ...currentTodoArray,
    { <span class="hljs-attr">checked</span>: <span class="hljs-literal">false</span>, <span class="hljs-attr">text</span>: localInputEle.value },
  ];

  <span class="hljs-comment">// Add newTodoArray to the local storage object:</span>
  <span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">'codesweetlyStore'</span>, <span class="hljs-built_in">JSON</span>.stringify(newTodoArray));

  <span class="hljs-keyword">const</span> todoLiElements = createTodoLiElements(newTodoArray);
  localTodosContainer.replaceChildren(...todoLiElements);
  localInputEle.value = <span class="hljs-string">''</span>;
});
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/to-do-app/js-dpuffp"><strong>Try Editing It</strong></a></p>
<p><strong>Note:</strong> The <code>localTodosContainer.replaceChildren(...todoLiElements)</code> statement tells the browser to replace <code>localTodosContainer</code>’s current child elements with the list of <code>&lt;li&gt;</code>s in the <code>todoLiElements</code> array.</p>
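<p>The two click handlers share the same read, merge, and write steps and differ only in the storage object they use. As a hedged sketch, those steps could be factored into one function. The <code>appendTodo</code> name and its parameters are assumptions made for illustration and are not part of the original app:</p>
<pre><code class="lang-js">// Hypothetical helper: append a to-do item to the array stored under storeKey.
function appendTodo(storage, storeKey, text) {
  // Get the existing content, if any. Otherwise, use an empty array.
  // JSON.parse(null) returns null, so the || fallback applies.
  const currentTodoArray = JSON.parse(storage.getItem(storeKey)) || [];

  // Merge the existing to-do items with the new one:
  const newTodoArray = [...currentTodoArray, { checked: false, text: text }];

  // Web storage only stores strings, so serialize the array first:
  storage.setItem(storeKey, JSON.stringify(newTodoArray));
  return newTodoArray;
}
</code></pre>
<p>Inside a click handler, you could then call <code>appendTodo(sessionStorage, "codesweetlyStore", sessionInputEle.value)</code> or <code>appendTodo(localStorage, "codesweetlyStore", localInputEle.value)</code> and pass the returned array to <code>createTodoLiElements()</code>. Passing the storage object as an argument makes it harder to mix up the two storage types by accident.</p>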
<h3 id="heading-how-to-auto-display-the-session-sections-previously-added-tasks-on-page-reload">How to auto-display the Session section’s previously added tasks on page reload</h3>
<p>Whenever users reload the page,</p>
<ol>
<li>Get existing session storage’s content, if any. Otherwise, return an empty array.</li>
<li>Use the retrieved content to create <code>&lt;li&gt;</code> elements.</li>
<li>Populate the tasks display space with the <code>&lt;li&gt;</code> elements.</li>
</ol>
<p><strong>Here’s the code:</strong></p>
<pre><code class="lang-js"><span class="hljs-built_in">window</span>.addEventListener(<span class="hljs-string">'load'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-comment">// Get existing session storage's content, if any. Otherwise, return an empty array:</span>
  <span class="hljs-keyword">const</span> sessionTodoArray =
    <span class="hljs-built_in">JSON</span>.parse(sessionStorage.getItem(<span class="hljs-string">'codesweetlyStore'</span>)) || [];

  <span class="hljs-comment">// Use the retrieved sessionTodoArray to create &lt;li&gt; elements:</span>
  <span class="hljs-keyword">const</span> todoLiElements = createTodoLiElements(sessionTodoArray);

  <span class="hljs-comment">// Populate the tasks display space with the todoLiElements:</span>
  sessionTodosContainer.replaceChildren(...todoLiElements);
});
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/to-do-app/js-zga551"><strong>Try Editing It</strong></a></p>
<h3 id="heading-how-to-auto-display-the-local-sections-previously-added-tasks-on-page-reload-or-reopen">How to auto-display the Local section’s previously added tasks on page reload or reopen</h3>
<p>Whenever users reload or reopen the page,</p>
<ol>
<li>Get existing local storage’s content, if any. Otherwise, return an empty array.</li>
<li>Use the retrieved content to create <code>&lt;li&gt;</code> elements.</li>
<li>Populate the tasks display space with the <code>&lt;li&gt;</code> elements.</li>
</ol>
<p><strong>Here’s the code:</strong></p>
<pre><code class="lang-js"><span class="hljs-built_in">window</span>.addEventListener(<span class="hljs-string">'load'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-comment">// Get existing local storage's content, if any. Otherwise, return an empty array:</span>
  <span class="hljs-keyword">const</span> localTodoArray =
    <span class="hljs-built_in">JSON</span>.parse(<span class="hljs-built_in">localStorage</span>.getItem(<span class="hljs-string">'codesweetlyStore'</span>)) || [];

  <span class="hljs-comment">// Use the retrieved localTodoArray to create &lt;li&gt; elements:</span>
  <span class="hljs-keyword">const</span> todoLiElements = createTodoLiElements(localTodoArray);

  <span class="hljs-comment">// Populate the tasks display space with the todoLiElements:</span>
  localTodosContainer.replaceChildren(...todoLiElements);
});
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/to-do-app/js-srmnst"><strong>Try Editing It</strong></a></p>
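<p>The load handlers above assume the stored text is always valid JSON. If something else writes to the same key, <code>JSON.parse()</code> throws a <code>SyntaxError</code>. Here is a minimal, more defensive sketch, assuming a hypothetical <code>readTodos</code> helper that is not part of the original app:</p>
<pre><code class="lang-js">// Hypothetical helper: read the to-do array stored under storeKey,
// falling back to an empty array when nothing usable is stored.
function readTodos(storage, storeKey) {
  try {
    const parsed = JSON.parse(storage.getItem(storeKey));
    // A missing item parses to null, which is not an array.
    return Array.isArray(parsed) ? parsed : [];
  } catch (error) {
    // JSON.parse() throws a SyntaxError on malformed text.
    return [];
  }
}
</code></pre>
<p>In either load handler, you could replace the <code>JSON.parse(...) || []</code> expression with <code>readTodos(sessionStorage, "codesweetlyStore")</code> or <code>readTodos(localStorage, "codesweetlyStore")</code>.</p>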
<h3 id="heading-how-to-check-the-total-items-in-the-browsers-session-storage">How to check the total items in the browser’s session storage</h3>
<p>Use session storage’s <code>length</code> property like so:</p>
<pre><code class="lang-js"><span class="hljs-built_in">console</span>.log(sessionStorage.length);
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/to-do-app/js-m4pmhf"><strong>Try Editing It</strong></a></p>
<h3 id="heading-how-to-display-the-local-storages-zeroth-index-items-name">How to display the local storage’s zeroth index item’s name</h3>
<p>Use the local storage’s <code>key()</code> method as follows:</p>
<pre><code class="lang-js"><span class="hljs-built_in">console</span>.log(<span class="hljs-built_in">localStorage</span>.key(<span class="hljs-number">0</span>));
</code></pre>
<p><a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/to-do-app/js-th8xr7"><strong>Try Editing It</strong></a></p>
<h3 id="heading-how-to-empty-the-browsers-session-storage">How to empty the browser’s session storage</h3>
<p>Use the session storage’s <code>clear()</code> method as follows:</p>
<pre><code class="lang-js">sessionStorage.clear();
</code></pre>
<h2 id="heading-how-to-continue-practicing-with-web-storage">How to Continue Practicing with Web Storage 🧗‍♀️🚀</h2>
<p>The to-do app still has a lot of potential. For instance, you can:</p>
<ul>
<li>Convert it to a React TypeScript application.</li>
<li>Make it keyboard accessible.</li>
<li>Allow users to delete or edit individual tasks.</li>
<li>Allow users to star (mark as important) specific tasks.</li>
<li>Let users specify due dates.</li>
</ul>
<p>So, feel free to continue developing what we’ve built in this tutorial so you can better understand the web storage objects.</p>
<p>For instance, here’s my attempt at <a target="_blank" href="https://codesweetly.com/try-it-sdk/javascript/web-storage-apis/to-do-app/js-ax8tvk">making the two panes functional</a>:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/gDiU-ubWPD4" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p>Before we wrap up our discussion, you should know some differences between web storage and cookies. So, let’s talk about that below.</p>
<h2 id="heading-web-storage-vs-cookies-what-is-the-difference">Web Storage vs. Cookies: What is the Difference?</h2>
<p>Web storage and cookies are two main ways to store data locally within a user’s browser. But they work differently. Below are the main distinctions between them.</p>
<h3 id="heading-storage-limit">Storage limit</h3>
<p><strong>Cookies:</strong> Have a maximum <a target="_blank" href="https://docs.devexpress.com/AspNet/11912/common-concepts/cookies-support#browser-limitations">storage limit</a> of about 4 kilobytes per cookie.</p>
<p><strong>Web storage:</strong> Can store a lot more than 4 kilobytes of data. For instance, Safari 8 can store up to 5 MB, while Firefox 34 permits 10 MB.</p>
<h3 id="heading-data-transfer-to-the-server">Data transfer to the server</h3>
<p><strong>Cookies:</strong> Transfer data to the server whenever browsers send HTTP requests to the web server.</p>
<p><strong>Web storage:</strong> Never transfers data to the server.</p>
<p><strong>Note:</strong> It is a waste of users’ bandwidth to send data to the server if such information is needed only by the client (browser), not the server.</p>
<h3 id="heading-weak-integrity-and-confidentiality">Weak integrity and confidentiality</h3>
<p><strong>Cookies:</strong> Suffer from <a target="_blank" href="https://datatracker.ietf.org/doc/html/rfc6265#section-8.6">weak integrity</a> and <a target="_blank" href="https://datatracker.ietf.org/doc/html/rfc6265#section-8.5">weak confidentiality</a> issues.</p>
<p><strong>Web storage:</strong> Does not suffer from weak integrity and confidentiality issues because it stores data per <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Glossary/Origin">origin</a>.</p>
<h3 id="heading-property">Property</h3>
<p><strong>Cookies:</strong> You access cookies through the <code>cookie</code> property of the <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Document"><code>Document</code></a> object.</p>
<p><strong>Web storage:</strong> You access web storage through the <code>localStorage</code> and <code>sessionStorage</code> properties of the <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Window"><code>Window</code></a> object.</p>
<h3 id="heading-expiration">Expiration</h3>
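<p><strong>Cookies:</strong> You can specify a cookie’s expiration date.</p>
<p><strong>Web storage:</strong> You cannot set an expiration date. The browser clears session storage when the page session ends, while local storage persists until it is explicitly cleared.</p>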
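<p>Here is a minimal sketch of setting a cookie’s lifetime with the standard <code>max-age</code> attribute. The <code>buildCookie</code> helper is a hypothetical name used only for this example, and the one-week lifetime is an arbitrary choice.</p>

```javascript
// Hypothetical helper: builds a cookie string that expires after maxAgeSeconds.
function buildCookie(name, value, maxAgeSeconds) {
  const pair = `${encodeURIComponent(name)}=${encodeURIComponent(value)}`;
  return `${pair}; max-age=${maxAgeSeconds}; path=/`;
}

const oneWeekInSeconds = 60 * 60 * 24 * 7;
const cookie = buildCookie("theme", "dark", oneWeekInSeconds);
console.log(cookie); // "theme=dark; max-age=604800; path=/"

// In a browser, assigning the string to document.cookie stores the cookie.
if (typeof document !== "undefined") {
  document.cookie = cookie;
}
```

<p>Web storage has no equivalent attribute, so any time-based expiry there has to be implemented by your own code.</p>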
<h3 id="heading-retrieving-individual-data">Retrieving individual data</h3>
<p><strong>Cookies:</strong> There’s no way to retrieve a single item directly. <code>document.cookie</code> returns all the cookies as one string, so you always have to read and parse the whole string to get any single value.</p>
<p><strong>Web storage:</strong> You can choose the specific data you wish to retrieve.</p>
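<p>To make the difference concrete, here is one possible way to pull a single value out of a cookie string. The <code>getCookie</code> helper is a hypothetical name, and the sketch assumes the values were written with <code>encodeURIComponent</code>.</p>

```javascript
// Hypothetical helper: finds one cookie in a string shaped like document.cookie
// ("name1=value1; name2=value2"). Returns null when the name is absent.
function getCookie(cookieString, name) {
  for (const pair of cookieString.split("; ")) {
    const separatorIndex = pair.indexOf("=");
    if (separatorIndex === -1) continue; // skip entries without "="
    const key = pair.slice(0, separatorIndex);
    if (key === name) {
      return decodeURIComponent(pair.slice(separatorIndex + 1));
    }
  }
  return null;
}

const sample = "theme=dark; lang=en; token=a%3Db";
console.log(getCookie(sample, "lang")); // "en"
console.log(getCookie(sample, "token")); // "a=b"
console.log(getCookie(sample, "missing")); // null
```

<p>In a browser, you would pass <code>document.cookie</code> as the first argument. With web storage, a single <code>getItem()</code> call does the same job.</p>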
<h3 id="heading-the-syntax-for-storing-data">The syntax for storing data</h3>
<p><strong>Cookies:</strong></p>
<pre><code class="lang-js"><span class="hljs-built_in">document</span>.cookie = <span class="hljs-string">"key=value"</span>;
</code></pre>
<p><strong>Web storage:</strong></p>
<pre><code class="lang-js">webStorageObject.setItem(key, value);
</code></pre>
<h3 id="heading-the-syntax-for-reading-data">The syntax for reading data</h3>
<p><strong>Cookies:</strong></p>
<pre><code class="lang-js"><span class="hljs-built_in">document</span>.cookie;
</code></pre>
<p><strong>Web storage:</strong></p>
<pre><code class="lang-js">webStorageObject.getItem(key);
</code></pre>
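<p>Web storage keeps every value as a string, so objects are commonly serialized to JSON before storing them. The sketch below shows one way to do that. The <code>saveJson</code>, <code>loadJson</code>, and <code>createMemoryStorage</code> names are hypothetical, and the in-memory stand-in exists only so the example also runs outside a browser.</p>

```javascript
// Hypothetical helper: serializes a value to JSON before storing it.
function saveJson(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

// Hypothetical helper: returns the parsed value, or null when the key is absent.
function loadJson(storage, key) {
  const raw = storage.getItem(key);
  return raw === null ? null : JSON.parse(raw);
}

// Minimal stand-in that mimics the setItem/getItem/removeItem methods.
function createMemoryStorage() {
  const items = new Map();
  return {
    setItem: (key, value) => items.set(key, String(value)),
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    removeItem: (key) => items.delete(key),
  };
}

// In a browser, pass localStorage or sessionStorage instead.
const storage = createMemoryStorage();
saveJson(storage, "todo", { text: "Buy milk", done: false });
console.log(loadJson(storage, "todo").text); // "Buy milk"
console.log(loadJson(storage, "missing")); // null
```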
<h3 id="heading-the-syntax-for-removing-data">The syntax for removing data</h3>
<p><strong>Cookies:</strong></p>
<pre><code class="lang-js"><span class="hljs-built_in">document</span>.cookie = <span class="hljs-string">"key=; expires=Thu, 01 May 1930 00:00:00 UTC"</span>;
</code></pre>
<p>The snippet above deletes the cookie named <code>key</code> by assigning it an empty value and setting an expiration date in the past.</p>
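<p>An alternative to a past <code>expires</code> date is the <code>max-age</code> attribute, where a value of zero expires the cookie immediately. The <code>buildExpiredCookie</code> helper below is a hypothetical name, and the sketch assumes the cookie was originally set with <code>path=/</code>, because the path has to match for the deletion to take effect.</p>

```javascript
// Hypothetical helper: builds a string that expires the named cookie at once.
function buildExpiredCookie(name) {
  return `${encodeURIComponent(name)}=; max-age=0; path=/`;
}

const expired = buildExpiredCookie("theme");
console.log(expired); // "theme=; max-age=0; path=/"

// In a browser, assigning the string to document.cookie removes the cookie.
if (typeof document !== "undefined") {
  document.cookie = expired;
}
```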
<p><strong>Web storage:</strong></p>
<pre><code class="lang-js">webStorageObject.removeItem(key);
</code></pre>
<h2 id="heading-wrapping-up">Wrapping up</h2>
<p>In this article, we discussed how to use web storage and its built-in interfaces. We also used a to-do list project to practice using the local and session storage objects to store data locally and securely within users’ browsers.</p>
<p>Thanks for reading!</p>
<h3 id="heading-and-heres-a-useful-react-typescript-resource">And here’s a useful resource:</h3>
<p>I wrote a book about <a target="_blank" href="https://amzn.to/3Pa4bI4">Creating NPM Packages</a>!</p>
<p>It is a beginner-friendly book that takes you from zero to creating, testing, and publishing NPM packages like a pro.</p>
<p><a target="_blank" href="https://amzn.to/3Pa4bI4"><img src="https://www.freecodecamp.org/news/content/images/2023/09/creating-npm-package-banner-codesweetly.png" alt="Creating NPM Package Book Now Available at Amazon" width="600" height="400" loading="lazy"></a></p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
