<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Omer Rosenbaum - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Omer Rosenbaum - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Wed, 06 May 2026 16:59:19 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/omerros/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ Stop Staring at a Blank Deck: How I Use Claude Code + Marp to Think Through Presentations ]]>
                </title>
                <description>
                    <![CDATA[ The hard part of building a presentation is figuring out the story. What are you trying to say? What’s the structure? Which sections build on which? Where does the data go, table or bullets? Before th ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-claude-code-and-marp-to-think-through-presentations/</link>
                <guid isPermaLink="false">69bc3429b238fd45a3206764</guid>
                
                    <category>
                        <![CDATA[ writing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ research ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude-code ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Thu, 19 Mar 2026 17:36:41 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/fcbd044d-0add-467c-a9b0-d068584a8197.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>The hard part of building a presentation is figuring out the <em>story</em>. What are you trying to say? What’s the structure? Which sections build on which? Where does the data go, table or bullets? Before the comparison or after?</p>
<p>What <em>would</em> help is having something to <strong>react to</strong>. Starting from zero is hard. Reacting to a draft is fast. “Move this before that” is way easier than “what should I say?”</p>
<p>That’s the workflow I want to show you. I use Claude Code + Marp to think through presentations. Claude helps me brainstorm the story, gives me a first draft to react to, and then I iterate, either through “conversation” or by editing the Markdown directly. The whole thing is a text file. 🎉</p>
<p>(I used a deck to think through this post. You can find it <a href="https://omerr.github.io/claude-skills/presentations/claude-code-marp/">here</a>.)</p>
<h3 id="heading-well-cover">We'll cover:</h3>
<ol>
<li><p><a href="#heading-the-workflow">The Workflow</a></p>
<ul>
<li><p><a href="#heading-brainstorm">Brainstorm</a></p>
</li>
<li><p><a href="#heading-react">React</a></p>
</li>
<li><p><a href="#heading-iterate">Iterate</a></p>
</li>
<li><p><a href="#heading-export">Export</a></p>
</li>
<li><p><a href="#heading-editable-pptx">Editable PPTX</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-get-started-in-5-minutes">Get Started in 5 Minutes</a></p>
<ul>
<li><p><a href="#heading-1-install-marp-cli">1. Install Marp CLI</a></p>
</li>
<li><p><a href="#heading-2-install-the-skill-via-skillssh">2. Install the skill (via skills.sh)</a></p>
</li>
<li><p><a href="#heading-3-run-it">3. Run it</a></p>
</li>
<li><p><a href="#heading-4-iterate">4. Iterate</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-use-case-this-very-article">Use Case: This Very Article</a></p>
</li>
<li><p><a href="#heading-under-the-hood">Under the Hood</a></p>
<ul>
<li><p><a href="#heading-marp-markdown-to-slides">Marp: Markdown to Slides</a></p>
</li>
<li><p><a href="#heading-the-skill-file">The Skill File</a></p>
</li>
<li><p><a href="#heading-section-dividers">Section Dividers</a></p>
</li>
<li><p><a href="#heading-breadcrumb-navigation">Breadcrumb Navigation</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ol>
<h2 id="heading-the-workflow">The Workflow</h2>
<p>The process has four stages: <strong>Brainstorm</strong>, <strong>React</strong>, <strong>Iterate</strong>, <strong>Export</strong>. Let me walk through each one.</p>
<h3 id="heading-brainstorm">Brainstorm</h3>
<p>You kick things off with a slash command (I'll show you how to install it later):</p>
<pre><code class="language-plaintext">/create-marp-deck API rate limiting
</code></pre>
<p>Claude starts by interviewing you: it asks about the goal, the audience, the key points, and any data you want to include. This part forces you to articulate the story before a single slide exists.</p>
<img src="https://cdn-images-1.medium.com/max/1200/0*HHy7-RCRF_oS8WeU.png" alt="Claude Code interviewing the user about their presentation — asking about the goal, audience, and key&nbsp;sections" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-plaintext">~/projects/my-app $ claude
&gt; /create-marp-deck API rate limiting
─────────────────────────────────────────────

Claude: What's this presentation about? What should the audience
        walk away knowing?

User:   How our rate limiting works, which algorithm we chose and
        why, and what they need to know to configure it for their
        services.

Claude: Who's the audience?

User:   Platform engineering team — they know our stack but haven't
        seen the rate limiting internals.

Claude: What are the key points or sections you want to cover?

User:   1. Why we need rate limiting (the incident last month)
        2. Token bucket vs sliding window — our decision
        3. How to configure it per-service
        4. Monitoring dashboard walkthrough
</code></pre>
<p>Think of it as a lightweight brainstorm: you talk through what you’re trying to say, and Claude helps you structure it.</p>
<h3 id="heading-react">React</h3>
<p>Once you’ve aligned on the structure, Claude generates the full Marp Markdown file and exports it. You get a solid first draft you can <em>react to</em> and reshape.</p>
<img src="https://cdn-images-1.medium.com/max/1200/0*pFj4D2py1ATg816C.png" alt="Title slide with dark gradient background showing “API Rate Limiting — A Technical Deep&nbsp;Dive”" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>That title slide came from this Markdown:</p>
<pre><code class="language-plaintext">&lt;!-- _class: lead title-slide --&gt;

# API Rate Limiting
## A Technical Deep Dive

**Team**: Platform Engineering
**Date**: February 2026
</code></pre>
<p>Is it perfect? Probably not. But now you have something concrete, with sections, structure, and a story, that you can push around. That’s so much faster than starting from a blank canvas.</p>
<p>When you go through the slides, you <em>feel</em> if the story is coherent and clear.</p>
<h3 id="heading-iterate">Iterate</h3>
<p>Reviewing the draft will inevitably spark ideas: “oh, I should add a comparison table here,” “this section is too dense, maybe split it into two,” “move this summary up to the top.”</p>
<p>One way to make such edits is to ask Claude Code:</p>
<pre><code class="language-plaintext">"Slide 6 is too dense. Split the algorithm comparison into
two slides, one for token bucket, one for sliding window."
</code></pre>
<img src="https://cdn-images-1.medium.com/max/1200/0*dVWLJ1hecPviej4D.png" alt="Claude Code splitting a slide and adding a callout, with file&nbsp;diffs" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-plaintext">&gt; Slide 6 is too dense. Split the algorithm comparison into two
  slides — one for token bucket, one for sliding window.

Claude: I'll split slide 6 into two separate slides, one per algorithm.

  Edit presentations/api-rate-limiting.md
  ───────────────────────────────────────
  - # Algorithm Comparison
  - | Feature | Token Bucket | Sliding Window |
  + # Token Bucket
  + Tokens refill at a steady rate...
  + ---
  + # Sliding Window
  + Track exact timestamp of every request...

&gt; Add a "Why we chose token bucket" callout to that first slide

Claude: Added a blockquote callout explaining the decision.

  Edit presentations/api-rate-limiting.md
  ───────────────────────────────────────
  + &gt; We chose token bucket because it handles bursty traffic
  + &gt; from our mobile clients without penalizing steady callers
</code></pre>
<p>You can also edit in <strong>VS Code</strong> with the Marp extension for live preview. Open the <code>.md</code> file, hit <code>Ctrl+K V</code> to open the preview to the side (<code>Ctrl+Shift+V</code> opens it in its own tab instead), and you get the source on the left with rendered slides on the right. Claude Code edits the file, VS Code detects the change, and the preview updates automatically. (I keep both open side by side and it just works.)</p>
<img src="https://cdn-images-1.medium.com/max/1200/0*88zY1J4xzeo1vUWS.png" alt="Me editing the deck that I created to help me think through this&nbsp;article" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-export">Export</h3>
<p>When you’re done, you get three files:</p>
<ul>
<li><p><code>.md</code> – the source (version-controlled, diffable)</p>
</li>
<li><p><code>.html</code> – open in any browser, share via Slack</p>
</li>
<li><p><code>.pptx</code> – open in PowerPoint, present anywhere</p>
</li>
</ul>
<pre><code class="language-bash">$ marp --no-stdin deck.md -o deck.html
[  INFO ] Converting 1 markdown...
[  INFO ] deck.md =&gt; deck.html

$ marp --no-stdin --pptx deck.md -o deck.pptx
[  INFO ] Converting 1 markdown...
[  INFO ] deck.md =&gt; deck.pptx

$ ls presentations/
api-rate-limiting.md
api-rate-limiting.html   ✓ open in browser, share via Slack
api-rate-limiting.pptx   ✓ open in PowerPoint, present anywhere
</code></pre>
<img src="https://cdn-images-1.medium.com/max/1200/0*XsnHZELJ9w3vovOz.png" alt="marp CLI exporting to HTML and&nbsp;PPTX" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The skill runs the export commands automatically after generating the deck. A 15-slide deck converts in about 2 seconds.</p>
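<p>If you want to run the same two exports outside the skill, a small wrapper is enough. Here is a minimal Python sketch, assuming <code>marp</code> is on your <code>PATH</code>. The function names <code>build_export_commands</code> and <code>export_deck</code> are made up for illustration and are not part of the skill:</p>
<pre><code class="language-python">import subprocess
from pathlib import Path


def build_export_commands(deck_path):
    """Return the two marp invocations shown above for one deck."""
    deck = Path(deck_path)
    html_out = deck.with_suffix(".html")
    pptx_out = deck.with_suffix(".pptx")
    return [
        ["marp", "--no-stdin", str(deck), "-o", str(html_out)],
        ["marp", "--no-stdin", "--pptx", str(deck), "-o", str(pptx_out)],
    ]


def export_deck(deck_path):
    """Run both exports; raises CalledProcessError if marp fails."""
    for command in build_export_commands(deck_path):
        subprocess.run(command, check=True)
</code></pre>
<p>Calling <code>export_deck("presentations/api-rate-limiting.md")</code> would then write the <code>.html</code> and <code>.pptx</code> files next to the source.</p>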
<h4 id="heading-editable-pptx">Editable PPTX</h4>
<p>The standard PPTX export renders each slide as an image. That's pixel-perfect, but you can’t edit the text in PowerPoint or Google Slides. If you need editable text, Marp has a <code>--pptx-editable</code> flag that uses LibreOffice under the hood to produce real text boxes.</p>
<p>The catch: LibreOffice creates text boxes that are too narrow, so text wraps and overlaps. The skill includes a python-pptx post-processing script that automatically widens the text boxes to fix this. Just ask for “editable PPTX” and the skill handles the rest: the LibreOffice conversion, the text box fix, everything.</p>
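<p>The skill's own script isn't reproduced here. To give a feel for what such a post-processing step could look like, here is a minimal sketch, assuming python-pptx is installed. The names <code>widened_box</code> and <code>widen_text_boxes</code>, the half-inch margin, and the choice to stretch every text box to the right margin are all assumptions of this sketch, and that last simplification would not suit multi-column slides:</p>
<pre><code class="language-python">def widened_box(left, width, slide_width, margin):
    """Return (left, width) for a text box stretched up to the right margin.

    All values are in EMU, the unit python-pptx uses for lengths.
    The box never shrinks and never starts left of the margin.
    """
    new_left = max(left, margin)
    available = slide_width - margin - new_left
    return new_left, max(width, available)


def widen_text_boxes(pptx_in, pptx_out, margin=457200):
    """Widen every text-bearing shape in a deck (457200 EMU is 0.5 inch)."""
    # Imported here so the geometry helper above works without python-pptx.
    from pptx import Presentation

    prs = Presentation(pptx_in)
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            if shape.left is None or shape.width is None:
                continue  # no explicit geometry to adjust
            shape.left, shape.width = widened_box(
                shape.left, shape.width, prs.slide_width, margin
            )
    prs.save(pptx_out)
</code></pre>
<p>Keeping the geometry in its own function means you can check the arithmetic without opening a real <code>.pptx</code> file.</p>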
<h2 id="heading-get-started-in-5-minutes">Get Started in 5&nbsp;Minutes</h2>
<p>OK, are you ready? Here’s everything you need:</p>
<h3 id="heading-1-install-marp-cli">1. Install Marp CLI:</h3>
<ul>
<li><code>npm install -g @marp-team/marp-cli</code></li>
</ul>
<h3 id="heading-2-install-the-skill-via-skillssh">2. Install the skill (via <a href="http://skills.sh">skills.sh</a>):</h3>
<ul>
<li><code>npx skills add Omerr/claude-skills</code></li>
</ul>
<p>This works with Claude Code, Cursor, GitHub Copilot, and other AI agents. You can also install manually (see the <a href="https://github.com/Omerr/claude-skills">repo</a> for details).</p>
<h3 id="heading-3-run-it">3. Run it:</h3>
<ul>
<li><code>/create-marp-deck your topic here</code></li>
</ul>
<h3 id="heading-4-iterate">4. Iterate:</h3>
<p>React to the draft, refine through conversation or VS Code, and export.</p>
<p>That’s it. Four steps. Fork the repo and customize the conventions to match your style.</p>
<h2 id="heading-use-case-this-very-article">Use Case: This Very&nbsp;Article</h2>
<p>Want to see this workflow in practice? You’re looking at it.</p>
<p>I wrote this article by first creating a slide deck using exactly the process I described above. I ran <code>/create-marp-deck</code>, answered the interview questions, got a first draft, and iterated until the story felt right. You can <a href="https://omerr.github.io/claude-skills/presentations/claude-code-marp/">see the deck here</a>.</p>
<p>Why start with slides? Because a deck forces you to be concise and to go through the <em>story</em>. If the story doesn’t flow across 15 slides, it won’t flow across 1,500 words either. The deck became my outline, and once I had a coherent structure there, writing the article was much easier.</p>
<p>So if you’re ever staring at a blank doc thinking “I should write a blog post about X,” try making a deck first. You might be surprised how much faster the writing goes when the story is already figured out. 😎</p>
<h2 id="heading-under-the-hood">Under the&nbsp;Hood</h2>
<p>If you’re curious about what makes this work, read on. If not, you’re all set. 🙌🏻</p>
<h3 id="heading-marp-markdown-to-slides">Marp: Markdown to&nbsp;Slides</h3>
<p><a href="https://marp.app/"><strong>Marp</strong></a> (Markdown Presentation Ecosystem) converts&nbsp;<code>.md</code> files into slides. Your deck starts with frontmatter:</p>
<pre><code class="language-plaintext">---
marp: true
theme: default
paginate: true
size: 16:9
---
</code></pre>
<p>Four lines and you have widescreen, paginated slides. Slide breaks are just <code>---</code> in the Markdown. Your presentation is a text file: version-controlled, diffable, and AI-editable.</p>
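<p>Because the format is this simple, you can also assemble a deck programmatically. Here is a minimal Python sketch that joins slide bodies with the same frontmatter and <code>---</code> separators. The helper name <code>build_deck</code> is hypothetical, and the sketch assumes no slide body contains a line that is just <code>---</code>:</p>
<pre><code class="language-python">FRONTMATTER = {
    "marp": "true",
    "theme": "default",
    "paginate": "true",
    "size": "16:9",
}


def build_deck(slides):
    """Join slide bodies into one Marp Markdown document."""
    header_lines = [f"{key}: {value}" for key, value in FRONTMATTER.items()]
    frontmatter = "\n".join(["---", *header_lines, "---"])
    # A line containing only --- starts a new slide.
    body = "\n\n---\n\n".join(slide.strip() for slide in slides)
    return f"{frontmatter}\n\n{body}\n"
</code></pre>
<p>Writing the result of <code>build_deck(["# Title", "# Second slide"])</code> to a <code>.md</code> file should give you a two-slide deck that Marp can convert.</p>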
<h3 id="heading-the-skill-file">The Skill&nbsp;File</h3>
<p>You <em>could</em> just ask Claude Code to “make me a Marp presentation” every time. But you’d spend half the conversation explaining your preferred format, color palette, and slide structure.</p>
<p>Instead, I created a <strong>Claude Code skill</strong> (see it <a href="https://github.com/Omerr/claude-skills.git">here</a>), a reusable set of instructions that Claude follows whenever you invoke it. It has two parts:</p>
<ol>
<li><p>An <strong>interview phase</strong> that gathers context before generating anything (the 5 questions from the brainstorm step)</p>
</li>
<li><p>A <strong>generation phase</strong> with the full Marp conventions: CSS palette, slide structure, breadcrumb pattern, formatting rules, and export commands</p>
</li>
</ol>
<p>The full skill is about 200 lines. That sounds like a lot, but you write it once and then every deck you create follows the same polished conventions automatically.</p>
<h3 id="heading-section-dividers">Section Dividers</h3>
<p>Each section of the deck gets its own gradient background. So when you’re presenting, the audience intuitively knows when you’ve moved to a new topic:</p>
<img src="https://cdn-images-1.medium.com/max/1200/0*SDRNCJnSw3BDwXUG.png" alt="Section divider slide with blue gradient showing “Part 1: The&nbsp;Problem”" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Applied via CSS classes in the skill:</p>
<pre><code class="language-plaintext">&lt;!-- _class: lead part-problem --&gt;
# Part 1: The Problem
</code></pre>
<h3 id="heading-breadcrumb-navigation">Breadcrumb Navigation</h3>
<p>This is my favorite part of the whole setup.</p>
<p>Every content slide has a breadcrumb header at the top that shows where you are in the deck:</p>
<img src="https://cdn-images-1.medium.com/max/1200/0*PaPBdx60ZYJn9G3K.png" alt="Content slide showing breadcrumb “The Problem > Algorithms > Implementation” with the current section highlighted in&nbsp;blue" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>See that header? “The Problem &gt; <strong>Algorithms</strong> &gt; Implementation”, with “Algorithms” highlighted in blue.</p>
<p>In Marp, this is done with a simple HTML comment:</p>
<pre><code class="language-plaintext">&lt;!-- header: "The Problem &gt; **Algorithms** &gt; Implementation" --&gt;
</code></pre>
<p>The <code>**bold**</code> text renders in blue (via CSS <code>header strong { color: #2563eb; }</code>), while the rest stays gray. You set it once per section and it persists until you change it.</p>
<p>How often have you sat through a presentation wondering “wait, where are we?” 🤔</p>
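<p>If your deck has many sections, typing each breadcrumb by hand gets repetitive. Here is a minimal Python sketch that builds the header comment for a given section. The function name <code>breadcrumb_directive</code> is an assumption of this sketch, not something the skill defines:</p>
<pre><code class="language-python">def breadcrumb_directive(sections, current):
    """Build the Marp header comment with the current section in bold."""
    if current not in sections:
        raise ValueError(f"unknown section: {current}")
    labels = [f"**{name}**" if name == current else name for name in sections]
    trail = " &gt; ".join(labels)
    return f'&lt;!-- header: "{trail}" --&gt;'
</code></pre>
<p>Calling it with <code>["The Problem", "Algorithms", "Implementation"]</code> and <code>"Algorithms"</code> should reproduce the comment shown above, ready to paste at the start of that section.</p>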
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>The hard part of presentations is telling a coherent story. Get yourself a first draft to react to, iterate until it flows, and export. That’s it.</p>
<p>If you want to try it: <code>npm install -g @marp-team/marp-cli</code>, run <code>npx skills add Omerr/claude-skills</code>, and then <code>/create-marp-deck</code>. You'll have a deck in minutes and a workflow you can reuse for every presentation after that.</p>
<h3 id="heading-about-the-author">About the&nbsp;Author</h3>
<p><a href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is the author of the <a href="https://youtube.com/@BriefVid">Brief YouTube Channel</a>. He’s also a cyber training expert and founder of Checkpoint Security Academy. He’s the author of <a href="https://www.freecodecamp.org/news/product-led-research-a-practical-guide-for-randd-leaders-full-book/">Product-Led Research</a>, <a href="https://www.freecodecamp.org/news/gitting-things-done-book/">Gitting Things Done</a> (in English) and <a href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks</a> (in Hebrew). You can find him on <a href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Product-Led Research: A Practical Guide for R&D Leaders [Full Book] ]]>
                </title>
                <description>
                    <![CDATA[ Your team needs to solve a problem, and there's no clear solution path. Multiple approaches might work, but you're not sure which. Success isn't guaranteed. This is Research, not Development. And if you manage it like Development, things aren't going... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/product-led-research-a-practical-guide-for-randd-leaders-full-book/</link>
                <guid isPermaLink="false">69963f99d35b661838993be2</guid>
                
                    <category>
                        <![CDATA[ research ]]>
                    </category>
                
                    <category>
                        <![CDATA[ engineering ]]>
                    </category>
                
                    <category>
                        <![CDATA[ development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Wed, 18 Feb 2026 22:39:21 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769115610494/37ca44ed-763d-42a3-969a-8430f000701e.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Your team needs to solve a problem, and there's no clear solution path. Multiple approaches might work, but you're not sure which. Success isn't guaranteed.</p>
<p>This is Research, not Development. And if you manage it like Development, things aren't going to go well for you.</p>
<p>Maybe you've seen this: A talented engineer spends three weeks trying different approaches, switching between them until the original problem is no longer relevant. Or a researcher declares after two days that "it's impossible" and gives up, though later it turns out the problem was actually solvable. Or, worst and most frustrating of all: Research succeeds technically but doesn't impact the product, and you realize you've been solving the wrong problem.</p>
<p>Here's the thing: <strong>managing Research is fundamentally different from managing Development.</strong></p>
<p>Development has known solution paths, relatively predictable timelines, and measurable progress. You probably learned to manage Development, and know how to do it well. Managing Development is challenging, yet there are established best practices that are well known in the industry.</p>
<p>But Research? Research is inherently uncertain. The techniques that work for Development fail spectacularly for Research. Most organizations either approach Research by deploying techniques devised for managing Development, or entirely give up on managing Research and treat it as mystical work by brilliant people, where the best you can do is not interrupt.</p>
<p>It doesn't have to be this way.</p>
<h2 id="heading-what-youll-learn">What You'll Learn</h2>
<p>By reading this book, you’ll gain practical tools for managing Research that create real product impact.</p>
<p>You’ll learn your <strong>two critical roles as a Research leader</strong>:</p>
<ol>
<li><p>Ensure Research connects to product impact.</p>
</li>
<li><p>Ensure Research is done effectively.</p>
</li>
</ol>
<p>You’ll understand <em>why</em> Research is different from Development. You’ll get concrete tools for managing Research execution, and learn proven heuristics for ensuring product impact.</p>
<p>Throughout the book, you'll see these methods applied to real challenges.</p>
<p>You’ll feel <strong>confident</strong> managing Research. You’ll <strong>understand</strong> when to use which tools. And you’ll know how to keep Research connected to product value while managing its inherent uncertainty.</p>
<h2 id="heading-who-is-this-book-for">Who Is This Book For?</h2>
<p>Any engineering leader who manages Research, or has researchers on their team.</p>
<p>If you're a CTO, VP of Engineering, R&amp;D Director, or Engineering Manager responsible for Research initiatives – this book is for you.</p>
<p>This book is for product-focused leaders who need their Research to ship value, not just produce interesting findings.</p>
<p>You’ll also notice that I use a casual style throughout the book. I believe that learning Research management should be insightful and practical. These are hard problems, and writing in an overly academic style wouldn't serve you well. This book is for <em>you</em>, written to help you succeed.</p>
<h2 id="heading-who-am-i">Who Am I?</h2>
<p>This book is about you, and your journey managing Research. But let me tell you why I think I can contribute to that journey.</p>
<p>I am the CTO and co-founder of Swimm.io, a knowledge management platform for code. At Swimm, I've led multiple Research initiatives — from automatically keeping documentation in sync with code changes, to extracting business rules from legacy COBOL systems. We've faced genuine Research challenges where the path to success wasn't clear, where we didn't know if solutions were even possible.</p>
<p>I've managed Research in product environments where "interesting findings" aren't enough — where Research must ship and create measurable value. I've experienced the failure modes firsthand: Research that succeeded technically but didn't impact the product, teams that got stuck exploring endlessly, and the challenge of managing uncertainty while maintaining velocity.</p>
<p>I've also managed teams of researchers — brilliant people who needed guidance not on technical capability, but on connecting their work to product value and working systematically through uncertainty.</p>
<p>This book combines my experience leading product-focused Research with my background in teaching and making complex ideas practical and actionable.</p>
<h2 id="heading-the-approach-of-this-book">The Approach of This Book</h2>
<p>This is not an academic piece on Research methodology. When writing it, I had three principles in mind:</p>
<p><strong>1. Practical</strong>: In this book, you will learn how to accomplish things in Research management. You will understand frameworks not just for the sake of understanding, but with a practical mindset. I sometimes refer to this as the "practicality principle" – which guides me in deciding whether to include certain topics, and to what extent.</p>
<p><strong>2. Based on proven theory</strong>: While practical, the methods are grounded in Alan Schoenfeld's problem-solving research — a framework developed by studying how people actually solve uncertain problems. You'll see how Schoenfeld's components (knowledge, heuristics, control, beliefs) map directly to practical Research management tools.</p>
<p><strong>3. Real examples</strong>: You will see these methods applied to actual Research challenges. Not toy problems, but real initiatives involving complex problems with genuine uncertainty. These examples show the methods in action, including when they're messy, when approaches fail, and how to adapt.</p>
<h2 id="heading-structure-of-this-book">Structure of This Book</h2>
<p>The book is organized in three parts:</p>
<p><strong>Part 1: Foundations</strong>: Read this to understand what makes Research different and why it needs specialized management. This part is short — just enough to establish the framework that organizes everything else.</p>
<p><strong>Part 2: Research Management Methods</strong>: These are tools that work for <em>any</em> Research.</p>
<p><strong>Part 3: Ensuring Product Impact</strong>: Methods specifically for connecting Research to product value.</p>
<h2 id="heading-why-is-this-book-publicly-available">Why Is This Book Publicly Available?</h2>
<p>In short, I'd like this book to get to as many people as possible.</p>
<p>If you would like to support this book, you are welcome to buy the <a target="_blank" href="https://buymeacoffee.com/omerr/e/505520">E-Book version</a>, <a target="_blank" href="https://amzn.to/46aCxnO">Paperback</a>, or <a target="_blank" href="https://amzn.to/4tD1T7O">Hardcover</a>, or <a target="_blank" href="https://buymeacoffee.com/omerr">buy me a coffee</a>. Thank you!</p>
<h2 id="heading-feedback-is-welcome">Feedback Is Welcome</h2>
<p>I created this book to help you and people like you manage Research effectively and ensure it creates product impact.</p>
<p>From the beginning, I asked for feedback from experienced leaders and researchers to make sure the book achieves these goals. If you found something valuable, felt something was missing, or thought something needed improvement — I would love to hear from you.</p>
<p>Your feedback helps make this book better for everyone. Please reach out at: gitting.things@gmail.com.</p>
<p>Now, let's begin. In Chapter 1, you'll learn exactly what makes Research different from Development, and why managing them differently matters.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<h3 id="heading-part-1-foundations">Part 1: Foundations</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-chapter-1-what-is-research">Chapter 1 - What is Research?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-difference">The Difference</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-schoenfelds-framework-for-problem-solving">Schoenfeld's Framework for Problem Solving</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-this-matters-for-rampd-leaders">Why This Matters for R&amp;D Leaders</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-notes">Notes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-references">References</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-2-research-and-development">Chapter 2 - Research and Development</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-defining-research-vs-development">Defining Research vs. Development</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-quick-test">A Quick Test</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-indicators-youre-doing-research">Key Indicators You're Doing Research</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-common-misconceptions">Common Misconceptions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-your-role-as-research-leader">Your Role as Research Leader</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-part-1-summary">Part 1 - Summary</a></p>
</li>
</ol>
<h3 id="heading-part-2-research-management-methods">Part 2: Research Management Methods</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-chapter-3-why-methodology-matters-a-true-story">Chapter 3 - Why Methodology Matters: A True Story</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-lesson">The Lesson</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-this-part-covers">What This Part Covers</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-4-the-research-tree">Chapter 4 - The Research Tree</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-a-research-tree">What Is a Research Tree?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-your-first-research-tree">Your First Research Tree</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-power-of-stopping-to-think">The Power of Stopping to Think</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-choosing-your-first-path">Choosing Your First Path</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-if-your-first-choice-doesnt-work">What If Your First Choice Doesn't Work?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-choosing-when-everything-seems-equal">Choosing When Everything Seems Equal</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-heuristics-can-be-combined">Heuristics Can Be Combined</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-answers-lead-to-new-questions">How Answers Lead to New Questions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-complete-picture">The Complete Picture</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-color-coding-status">Color-Coding Status</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-additional-tips">Additional Tips</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-research-tree-prevents-common-pitfalls">The Research Tree Prevents Common Pitfalls</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-time-to-practice">Time to Practice</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-using-the-tree-with-your-team-using-the-tree-with-your-team">Using the Tree with Your Team</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tools">Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pro-tips">Pro Tips</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-recap-the-research-tree">Recap - The Research Tree</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-5-time-boxing-research-explorations">Chapter 5 - Time-Boxing Research Explorations</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-problem-research-without-time-limits">The Problem: Research Without Time Limits</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-time-boxing-creating-decision-points">Time-Boxing: Creating Decision Points</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-decision-point">The Decision Point</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-example-detecting-god-objects">Example: Detecting God Objects</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-time-boxes">How to Set Time Boxes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-using-time-boxing-with-your-team">Using Time-Boxing with Your Team</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-integration-with-the-research-tree">Integration with the Research Tree</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-recap">Recap</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-part-2-summary">Part 2 Summary</a></p>
</li>
</ol>
<h3 id="heading-part-3-ensuring-product-impact">Part 3: Ensuring Product Impact</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-chapter-6-how-to-choose-research-initiatives-how-to-choose-research-initiatives">Chapter 6 - How to Choose Research Initiatives</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-1-starting-from-a-concrete-problem">1. Starting From a Concrete Problem</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-2-starting-from-a-technological-opportunity">2. Starting From a Technological Opportunity</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-problem-driven-vs-opportunity-driven-a-comparison">Problem-Driven vs. Opportunity-Driven: A Comparison</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-should-you-pursue-a-research-initiative">Should You Pursue a Research Initiative?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pre-research-checks">Pre-Research Checks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-choose-research-initiatives-summary">How to Choose Research Initiatives - Summary</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">Chapter 7 - Drawing Backwards</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-spiral-game">The Spiral Game</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-apply-drawing-backwards-to-product-led-research">How to Apply Drawing Backwards to Product-led Research</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-case-study-extracting-business-rules-from-cobol">Case Study: Extracting Business Rules from COBOL</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-drawing-backwards-is-so-powerful">Why Drawing Backwards Is So Powerful</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-practical-application-your-research-tree">Practical Application: Your Research Tree</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-summary-drawing-backwards">Summary: Drawing Backwards</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-8-end-to-end-iterations">Chapter 8 - End-to-End Iterations</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-drawing-backwards-end-to-end-a-combined-approach">Drawing Backwards + End-to-End: A Combined Approach</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-five-principles-of-end-to-end-iterations">The Five Principles of End-to-End Iterations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-principle-1-outline-the-end-to-end-process">Principle 1: Outline the End-to-End Process</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-principle-2-get-to-end-to-end-by-simplifying">Principle 2: Get to End-to-End by Simplifying</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-principle-3-ship-it-as-fast-as-you-can">Principle 3: Ship It as Fast as You Can</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-principle-4-gradually-replace-steps-while-carefully-prioritizing">Principle 4: Gradually Replace Steps, While Carefully Prioritizing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-principle-5-get-frequent-feedback-on-results">Principle 5: Get Frequent Feedback on Results</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-integration-with-other-tools">Integration with Other Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-summary-end-to-end-iterations">Summary: End-to-End Iterations</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-part-3-summary">Part 3 Summary</a></p>
</li>
</ol>
<h3 id="heading-book-summary">Book Summary</h3>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-you-learned-about-research">What You Learned About Research</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-do-research-effectively">How to Do Research Effectively</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-ensure-product-impact">How to Ensure Product Impact</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-your-toolkit-for-research-management">Your Toolkit for Research Management</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-my-message-to-you">My Message To You</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-acknowledgements">Acknowledgements</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-contact-me">Contact Me</a></p>
</li>
</ul>
</li>
</ul>
<h2 id="heading-note">Note</h2>
<p>This book is provided for free on freeCodeCamp as described above and according to <a target="_blank" href="https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International</a>.</p>
<p>If you would like to support this book, you are welcome to buy the <a target="_blank" href="https://buymeacoffee.com/omerr/e/505520">E-Book version</a>, <a target="_blank" href="https://amzn.to/46aCxnO">Paperback</a>, or <a target="_blank" href="https://amzn.to/4tD1T7O">Hardcover</a>, or <a target="_blank" href="https://buymeacoffee.com/omerr">buy me a coffee</a>. Thank you!</p>
<h2 id="heading-part-1-foundations-1">Part 1: Foundations</h2>
<h3 id="heading-chapter-1-what-is-research">Chapter 1 - What is Research?</h3>
<p>To manage research effectively, you first need to understand what research is, and what it is not.</p>
<p>From now on, I will use <em>Research</em> (capital R) when referring to our specific concept of research in this book, to distinguish it from general uses of the word.</p>
<p>Consider this scenario: Your team needs to optimize a critical API endpoint. It's slow, users complain, and you know exactly what to do: profile the code, identify bottlenecks, apply standard optimization techniques. This is challenging work, but it's not Research.</p>
<p>Now consider this: Your team needs to automatically extract business rules from 40-year-old COBOL codebases, consisting of millions of lines of code where the original developers are long retired. You're not even sure if extracting these rules automatically is possible. Multiple approaches might work. Or none might. This is Research.</p>
<h4 id="heading-the-difference">The Difference</h4>
<p>The distinction isn't about difficulty or technical sophistication. It's about <strong>uncertainty of approach</strong>.</p>
<p>Throughout this book, I will adopt the following definition: Research is confronting problems where you don't know if solutions exist or which approaches will work.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768912012166/8621e6a4-796a-4974-9d09-d572ff44f870.png" alt="Research definition" width="600" height="400" loading="lazy"></p>
<p><strong>Research</strong> confronts problems where:</p>
<ul>
<li><p>You don't know if a solution exists.</p>
</li>
<li><p>Multiple approaches might work, but you don't know which.</p>
</li>
<li><p>The path to success is not immediately clear.</p>
</li>
<li><p>You may need to invent new techniques.</p>
</li>
</ul>
<p><strong>Development</strong> involves:</p>
<ul>
<li><p>Applying known techniques to build specific features.</p>
</li>
<li><p>Following established approaches, even if complex.</p>
</li>
<li><p>Clear success criteria tied to working software.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768912033086/a93cea78-c671-4ab4-9d08-3767d60b9865.png" alt="Research vs Development" width="600" height="400" loading="lazy"></p>
<p>I've found Alan Schoenfeld's model of problem solving [1] to be a useful framework for defining and analyzing research. Schoenfeld, studying how people solve mathematical problems, identified four** components that determine success when facing genuinely uncertain problems. His framework applies directly to software Research:</p>
<h4 id="heading-schoenfelds-framework-for-problem-solving">Schoenfeld's Framework for Problem Solving</h4>
<p><strong>1. Knowledge Base</strong> — What you know</p>
<p>Are you familiar with relevant tools, algorithms, and techniques?</p>
<p>For COBOL business rule extraction: Do you understand COBOL syntax? Static analysis? Program comprehension techniques?</p>
<p>Without the right knowledge, you will have to spend time acquiring it before making progress, and might miss options that could be obvious to someone with more background.</p>
<p><strong>2. Heuristics</strong> — Strategies for approaching problems</p>
<p>We’ll cover heuristics in much more detail later. For now, here are some examples of effective heuristics:</p>
<ul>
<li><p>"Work backwards from the desired output".</p>
</li>
<li><p>"Break the problem into smaller pieces".</p>
</li>
<li><p>"Try a simpler version first".</p>
</li>
<li><p>"List all assumptions and test each one".</p>
</li>
</ul>
<p>For our COBOL business rule extraction case: "Start by manually extracting rules from one small program to understand what 'success' looks like".</p>
<p><strong>3. Control</strong> — Monitoring and adjusting your approach</p>
<p>Recognizing when your current strategy isn't working. Deciding when to pivot to a different approach. Managing your time and resources effectively.</p>
<p>This is what separates experienced researchers from novices: it's not just what you know, and the heuristics that you may deploy, but when and how you use them. If you choose one approach, reflect on its effectiveness, and decide to try something different when needed, that's an example of control.</p>
<p><strong>4. Beliefs and Attitudes</strong> — Your mindset toward the problem</p>
<p>Schoenfeld found that successful problem solvers held certain beliefs that helped them persist through challenges. Examples include:</p>
<ul>
<li><p>"I can figure this out" vs. "I'm not good at this kind of thing".</p>
</li>
<li><p>"Problems have multiple solutions" vs. "There's one right answer".</p>
</li>
<li><p>"I should write things down and work systematically" vs. "I should solve this in my head".</p>
</li>
</ul>
<p>These beliefs profoundly affect your ability to persist and succeed.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768912119159/af15ffb5-8691-4a26-b48e-675e0101e1d5.png" alt="Schoenfeld's Framework" width="600" height="400" loading="lazy"></p>
<h4 id="heading-why-this-matters-for-rampd-leaders">Why This Matters for R&amp;D Leaders</h4>
<p>You've likely seen this: A capable engineer spends three weeks on a Research task, trying approach after approach, until the original problem is no longer relevant. Or they declare after two days "it's impossible" and give up, though it turns out later that the problem was actually solvable.</p>
<p>The issue isn't (necessarily) capability. It's that Research requires different management, and different skills, than Development:</p>
<ul>
<li><p><strong>Knowledge base</strong>: Did they have the right background, or access to people who did?</p>
</li>
<li><p><strong>Heuristics</strong>: Were they using effective problem-solving strategies, or just "trying things"?</p>
</li>
<li><p><strong>Control</strong>: Did they know when to pivot? When to ask for help? When to break the problem down differently?</p>
</li>
<li><p><strong>Beliefs</strong>: Did they believe the problem was solvable? That systematic approaches work better than random attempts?</p>
</li>
</ul>
<p><strong>The good news</strong>: All four components can be improved. People get better at Research through practice, exposure to effective heuristics, and environments that support good Control and healthy Beliefs.</p>
<p>The rest of this book provides concrete tools, like using a Research Tree, drawing backwards, and time-boxing methods, that put Schoenfeld's framework into action in a Product-led environment. These tools help you and your team apply better heuristics, maintain effective control, and build the beliefs that sustain successful Research.</p>
<p>But first, let's make sure we're clear on when you actually need these tools. <a class="post-section-overview" href="#heading-chapter-2-research-and-development">The next chapter</a> dives deeper into distinguishing Research from Development work.</p>
<h4 id="heading-notes">Notes</h4>
<p>** Actually, Schoenfeld (1992) described five components (which he terms "categories"), but I focused on four of them. For the curious reader – the one I skipped is called "Practices" – the habits and cultural norms of the mathematical environment that shape how a student approaches a problem. I chose to skip it as applying it to Research felt artificial.</p>
<h4 id="heading-references">References</h4>
<p>[1] Schoenfeld, A. H. (1992). Learning to think mathematically: Problem solving, metacognition, and sense-making in mathematics. In D. Grouws (Ed.), Handbook of Research on Mathematics Teaching and Learning (pp. 334-370). New York: Macmillan.</p>
<h3 id="heading-chapter-2-research-and-development">Chapter 2 - Research and Development</h3>
<p>Most R&amp;D departments have plenty of Development work. Everyone agrees on what Development is. But Research? That's murkier.</p>
<p>Some claim every development task involves "research" – you have to test your code, try different things, read documentation. Is this Research?</p>
<p>Let's be precise.</p>
<h4 id="heading-defining-research-vs-development">Defining Research vs. Development</h4>
<p>We established in <a class="post-section-overview" href="#heading-chapter-1-what-is-research">chapter 1</a> the core distinction: Research involves fundamental uncertainty about whether solutions exist and which approaches will work, while Development applies known techniques to build specific features.</p>
<p>With this foundation, let's explore how to identify Research in practice.</p>
<h4 id="heading-a-quick-test">A Quick Test</h4>
<p>You're asked to reverse-engineer a specific compiled function: disassemble it and provide the equivalent code in C. You know assembly, you know C, you have a disassembler. Is this Research?</p>
<p><strong>No.</strong> You know how to proceed. It might take three days of careful work, especially if the function is complex, but it's not Research. You're applying known techniques, and you know how to progress toward a solution.</p>
<p>But if you need to understand how an entire program operates, and one <em>possible</em> approach is reverse engineering its compiled form, and you're not sure if that approach is even feasible time-wise or whether it will yield the insights you need? <strong>That's Research.</strong></p>
<h4 id="heading-key-indicators-youre-doing-research">Key Indicators You're Doing Research</h4>
<p><strong>1. Fundamental Uncertainty About Solution Viability</strong></p>
<p>You're asking "Can this even be done?" rather than "How should we do this?" This isn't about implementation details – rather, it's about whether the approach itself is viable.</p>
<p><strong>2. Multiple Competing Approaches Without Clear Superiority</strong></p>
<p>Research often means exploring several paths simultaneously, knowing that many will fail, to discover which approach (if any) can solve the problem.</p>
<p><strong>3. Need for New Fundamental Techniques</strong></p>
<p>You may need to invent new methods rather than adapting existing ones. Note: Not all Research creates new techniques, but the possibility exists.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768913247156/98b7a1b5-fc3d-46f1-af91-f6a9608eb229.png" alt="Key indicators you're doing Research" width="600" height="400" loading="lazy"></p>
<h4 id="heading-common-misconceptions">Common Misconceptions</h4>
<p><strong>Misconception 1: Technical Complexity = Research</strong></p>
<p>Many challenging Development tasks involve sophisticated algorithms, large-scale systems, or cutting-edge technologies without requiring Research approaches.</p>
<p>Building a distributed system with complex consensus algorithms? Challenging Development. Figuring out whether a distributed system <em>can</em> meet your latency requirements given your unusual constraints? Might be Research.</p>
<p>Technical complexity is not the same as Research.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768913954691/72542ee4-e1cd-47c2-9e6d-9716a9567172.png" alt="Technical complexity is not the same as Research" width="600" height="400" loading="lazy"></p>
<p><strong>Misconception 2: Using Advanced Algorithms = Research</strong></p>
<p>Implementing machine learning pipelines with random forests or neural networks isn't Research – even though the underlying algorithms are sophisticated. The Research happened when those algorithms were first developed. However, if you are using those algorithms when trying to solve a problem where it's unclear if they will work at all, that could be Research.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768914062832/e07be911-526f-4869-80a7-a66c8a1f9a0d.png" alt="Using Advanced Algorithms is not the same as Research" width="600" height="400" loading="lazy"></p>
<p><strong>Misconception 3: Research Can Be Managed Like Development</strong></p>
<p>Perhaps the most damaging misconception. This leads to:</p>
<ul>
<li><p>Demanding precise time estimates for uncertain work.</p>
</li>
<li><p>Expecting steady, measurable progress on fixed schedules.</p>
</li>
<li><p>Evaluating Research with Development metrics.</p>
</li>
</ul>
<p>Research requires different approaches. This is exactly what the rest of this book addresses.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768914002340/4fb06aa4-7ed3-4eb8-b011-8cb0131acd1c.png" alt="Managing Research is different from Managing Development" width="600" height="400" loading="lazy"></p>
<p><strong>Misconception 4: Research Cannot Be Managed</strong></p>
<p>The opposite extreme: treating Research as mystical work done by brilliant people, where the best a manager can do is not interrupt.</p>
<p>I've had the pleasure and privilege to work with many extremely skilled researchers. I can confidently say that this is simply not the case, as even the most talented researcher can benefit from skillful guidance.</p>
<p>Specifically, they benefit from:</p>
<ul>
<li><p>Clear connections between their work and product goals.</p>
</li>
<li><p>Structured approaches to exploring alternatives.</p>
</li>
<li><p>Regular checkpoints to assess direction.</p>
</li>
<li><p>Team collaboration and brainstorming.</p>
</li>
</ul>
<p>Research is not magic, and it <em>can</em> be managed effectively.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768914021783/ebba2bbc-303c-4c70-9e4d-bc716b7bd782.png" alt="Research is not magic, and it can be managed" width="600" height="400" loading="lazy"></p>
<h4 id="heading-your-role-as-research-leader">Your Role as Research Leader</h4>
<p>When leading Product-led Research, your job has two parts:</p>
<p><strong>1. Ensure the Research makes the biggest possible impact on the product</strong></p>
<p>This is your most important responsibility. "Successful" Research that doesn't impact the product is a failed project. Your job is to maintain the connection between Research work and product value – not just at the start, but continuously.</p>
<p>This means:</p>
<ul>
<li><p>Starting with clear product needs, not interesting technical questions.</p>
</li>
<li><p>Regularly validating that the Research still serves the product goal.</p>
</li>
<li><p>Making trade-offs between thorough exploration and shipping impact.</p>
</li>
</ul>
<p>We'll cover this in detail in <a class="post-section-overview" href="#heading-part-3-ensuring-product-impact">Part 3</a>.</p>
<p><strong>2. Ensure the Research is done in the most effective manner</strong></p>
<p>Even brilliant researchers benefit from structured approaches. Your role is to help the team work systematically rather than randomly.</p>
<p>This means:</p>
<ul>
<li><p>Helping identify which questions are worth answering.</p>
</li>
<li><p>Introducing better heuristics when the team is stuck ("Let's work backwards," "Let's time-box this investigation").</p>
</li>
<li><p>Preventing common failure modes like endless exploration or premature commitment.</p>
</li>
</ul>
<p>(We'll cover this in detail in <a class="post-section-overview" href="#heading-part-2-research-management-methods">Part 2</a>.)</p>
<p>Responsibility (1) is defining the right goals. Responsibility (2) is reaching these goals effectively.</p>
<p>The rest of this book provides concrete tools for both responsibilities.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768914050928/99978f75-033e-4aeb-bfa2-837909d1081a.png" alt="Your Role as Research Leader" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-part-1-summary">Part 1 - Summary</h3>
<p><strong>Research</strong> is work where the path to success is not immediately obvious. It requires:</p>
<ul>
<li><p>Knowledge of relevant domains.</p>
</li>
<li><p>Effective problem-solving strategies (heuristics).</p>
</li>
<li><p>Monitoring and adjusting approaches (control).</p>
</li>
<li><p>Healthy beliefs and persistence.</p>
</li>
</ul>
<p><strong>Your role</strong> as a Research leader is to:</p>
<ol>
<li><p>Ensure Research connects to product impact.</p>
</li>
<li><p>Ensure Research is done effectively.</p>
</li>
</ol>
<p><strong>You need different tools</strong> to manage Research versus Development – in order to make sure Research is done effectively. Part 2 provides those tools.</p>
<p>Part 3 will focus on connecting Research to product impact.</p>
<h2 id="heading-part-2-research-management-methods-1">Part 2: Research Management Methods</h2>
<h3 id="heading-chapter-3-why-methodology-matters-a-true-story">Chapter 3 - Why Methodology Matters: A True Story</h3>
<p>It was a late evening, and the classroom was filled with three dozen students. They were all sitting in front of their computers, working in silence. I was leading a cybersecurity training focused on reverse engineering. Each exercise included a single compiled program without its source code, with one goal: "understand how this program works." The output would be either a detailed document, an equivalent program implemented in a high-level language, or both.</p>
<p>This particular evening, the students were furiously working on reverse-engineering a game. The instruction was: "understand the game's rules, and document them thoroughly." The game had a two-dimensional user interface and could be played against the computer. Walking behind the students, I could see their screens with various reverse engineering tools open.</p>
<p>At some point we asked them to stop, turn around and look at the instructor. The instructor then provided a guided solution – this was a technique we used quite frequently, showing the students the "right" way to approach a problem they had spent some time tackling. The instructor opened the game, looked at the screen, opened the "File" menu, clicked on "Help" – and there it was, the entire description of the game rules.</p>
<h4 id="heading-the-lesson">The Lesson</h4>
<p>The room erupted in nervous laughter. Some students looked embarrassed. Others seemed frustrated. But everyone understood the point.</p>
<p>These were capable people. They had the relevant knowledge base, in Schoenfeld's terms, as presented in <a class="post-section-overview" href="#heading-chapter-1-what-is-research">chapter 1</a>. That is, they had the relevant technical skills – they knew both assembly and C, they knew how to use disassemblers, debuggers, and all the sophisticated tools of reverse engineering. Yet they had missed something fundamental: <strong>checking if there was a simpler solution first</strong>.</p>
<p>This happens all the time in Research work.</p>
<p>A team spends weeks diving deep into a complex technical approach, when a simpler path existed that they never explored. Or they try one thing, then another, then another – never systematically evaluating which approaches make sense before starting.</p>
<p><strong>The problem isn't capability. It's approach.</strong></p>
<p>This is exactly why you need structured methods for Research management. Without them, even your most talented people will waste time, miss obvious solutions, and burn out trying random approaches.</p>
<p>As a reminder, in <a class="post-section-overview" href="#heading-chapter-2-research-and-development">chapter 2</a>, we discussed your role as a Research leader:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768916329893/c498be3d-8ef7-4dc4-9046-96f0ed2159da.png" alt="Your Role as Research Leader" width="600" height="400" loading="lazy"></p>
<p>Focusing on responsibility (2), the illustration here shows how the same peak can be reached via difficult climbing (reverse engineering) or by using a hot air balloon (reading the Help menu). Part of your job as a Research leader is to help your team find the easiest path to the goal.</p>
<h4 id="heading-what-this-part-covers">What This Part Covers</h4>
<p>The rest of Part 2 provides concrete tools to prevent these problems. You'll learn:</p>
<ul>
<li><p><strong>The Research Tree</strong> – A visual framework for systematically exploring solution paths.</p>
</li>
<li><p><strong>Time-boxing methods</strong> – How to limit exploration without killing creativity.</p>
</li>
</ul>
<p>These aren't abstract concepts. They're battle-tested techniques that directly address the failure modes you've probably seen: teams spinning their wheels, giving up too early, or getting stuck on the wrong approach.</p>
<p>Let's get started.</p>
<h3 id="heading-chapter-4-the-research-tree">Chapter 4 - The Research Tree</h3>
<p>You've probably seen this: An engineer/researcher starts down one path, hits an obstacle, tries something else, hits another obstacle, then tries a third approach. Three weeks later, they're still stuck – or worse, they've built something that technically works but doesn't solve the actual problem.</p>
<p>The issue isn't persistence. It's that they never mapped out the solution space. They never visualized <strong>which</strong> approaches might work, <strong>which questions</strong> need answering, and <strong>how</strong> everything relates to each other.</p>
<p><strong>The Research Tree solves this problem.</strong></p>
<p>It’s a way to visualize and manage the <strong>Control</strong> component of Schoenfeld's framework from <a class="post-section-overview" href="#heading-chapter-1-what-is-research">chapter 1</a> – monitoring and adjusting your approach. This is where most Research efforts struggle.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768916536009/1b4688f5-42f5-442c-b0cd-4f518a88dc63.png" alt="Reminder: Schoenfeld's Framework" width="600" height="400" loading="lazy"></p>
<h4 id="heading-what-is-a-research-tree">What Is a Research Tree?</h4>
<p>A Research Tree is a living visual representation of your Research journey. It captures three things:</p>
<ol>
<li><p><strong>Possible solution paths</strong> – different approaches you might take.</p>
</li>
<li><p><strong>Open questions</strong> – what you need to learn.</p>
</li>
<li><p><strong>Closed questions</strong> – what you've already discovered.</p>
</li>
</ol>
<p>Unlike a static plan, the Research Tree grows and changes as you learn (which is one reason I like the name "tree" 😇). You start with what you know, then update it continuously as you investigate. Dead ends get marked. New branches appear. Questions get answered and new questions emerge.</p>
<p>Think of it like this: You're exploring a cave system. You don't have a map – you're <em>creating</em> the map as you explore. You mark passages you've tried. You note which ones are dead ends. You write down questions: "Does this passage connect to the main chamber?" "Is there water down this route?" As you explore, you answer some questions and discover new ones you hadn't considered. You write down the answers you found, and the experiments you conducted to find them ("I tried going left, hit a wall after 50 feet").</p>
<p>Research works the same way. The Research Tree is both your map and your log.</p>
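<p>To make the "map and log" idea concrete, here is a minimal sketch of how such a tree could be represented in code. It is only an illustration: the <code>Node</code> class, its methods, and the status names are assumptions made for this example, not a format the book prescribes. The sample data reuses the cave example above.</p>

```python
from dataclasses import dataclass, field

# Status names are illustrative assumptions, not terminology from the book.
OPEN = "open"          # question not yet answered, or path not yet tried
ANSWERED = "answered"  # closed question: the answer is known
DEAD_END = "dead end"  # path that was tried and ruled out


@dataclass
class Node:
    """One node of the tree: a question or a possible path."""
    title: str
    status: str = OPEN
    log: list[str] = field(default_factory=list)        # what was tried and found
    children: list["Node"] = field(default_factory=list)

    def add(self, title):
        """Grow a new branch under this node and return it."""
        child = Node(title)
        self.children.append(child)
        return child

    def close(self, status, note):
        """Record the outcome, so the tree doubles as a log."""
        self.status = status
        self.log.append(note)

    def open_items(self):
        """Titles of every node in this subtree that is still open."""
        found = [self.title] if self.status == OPEN else []
        for child in self.children:
            found.extend(child.open_items())
        return found

    def render(self, depth=0):
        """Return the subtree as indented text lines."""
        lines = ["  " * depth + f"[{self.status}] {self.title}"]
        for note in self.log:
            lines.append("  " * (depth + 1) + f"note: {note}")
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# The cave example from the text above.
cave = Node("Map the cave system")
left = cave.add("Go left")
chamber = cave.add("Does this passage connect to the main chamber?")
water = cave.add("Is there water down this route?")
left.close(DEAD_END, "I tried going left, hit a wall after 50 feet")

print("\n".join(cave.render()))
print("Still open:", cave.open_items())
```

<p>The point of the sketch is that every node carries both a status and a log, so dead ends and answers stay visible instead of being forgotten. Pen and paper serves the same purpose.</p>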
<h4 id="heading-your-first-research-tree">Your First Research Tree</h4>
<p>Let's build one together. Consider a common engineering challenge:</p>
<p><strong>Goal: Reduce API response time from 800ms to under 100ms</strong></p>
<p>(Note: although this is not a real Research task, I chose this example because its simplicity makes it easy to illustrate how a Research Tree is created. It also shows that Research Trees can be useful in a variety of scenarios.)</p>
<p>You start with a fundamental question that needs answering. What's the first thing you need to know?</p>
<p>Take a moment to think about this before reading on.</p>
<p>The first question is usually: <strong>Where is the bottleneck?</strong></p>
<p>Until you answer this, you don't know which approaches make sense. Let's draw this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768916610613/f1f1cfb1-3b03-4ba9-b7b7-6883582ac972.png" alt="Initial Research Tree" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You can use a simple pen and paper, a whiteboard, or a digital tool to draw this out. When creating your very first tree, I highly recommend doing it by hand – the physical act of drawing will help you feel comfortable with the process.</p>
<p>Now, how can you answer this question? What approaches might tell you where the bottleneck is?</p>
<p>You might identify:</p>
<ul>
<li><p>Profile the application with a performance monitoring tool.</p>
</li>
<li><p>Add detailed logging to measure each operation.</p>
</li>
<li><p>Use database query analysis tools.</p>
</li>
</ul>
<p>Let's add these as branches:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768916651252/2c0fa347-f3d8-4c45-88f1-aaa708d4dfbd.png" alt="Research Tree with a few directions" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>(Note: the Brown status means "uncertain, needs investigation" – more on this later.)</p>
<p>Each approach is an investigation you could run to answer the question.</p>
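<p>To make the second approach concrete, here is a minimal Python sketch of "add detailed logging to measure each operation". Everything in it is an assumption for illustration: the stage names, the <code>timed</code> helper, and the <code>time.sleep</code> calls that stand in for real work. The idea is simply to record how long each stage takes and report each stage's share of the total.</p>

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage of a request.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the time spent inside the with-block to the given stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def handle_request():
    # Hypothetical stages; time.sleep stands in for real work.
    with timed("auth"):
        time.sleep(0.01)
    with timed("database"):
        time.sleep(0.07)
    with timed("render"):
        time.sleep(0.02)


def stage_shares():
    """Return each stage's share of total measured time, largest first."""
    total = sum(stage_seconds.values())
    shares = {name: secs / total for name, secs in stage_seconds.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


handle_request()
for name, share in stage_shares().items():
    print(f"{name}: {share:.0%}")
```

<p>Whichever stage dominates the printed shares is your first candidate for the bottleneck. In a real service you would send these numbers to your logging system rather than print them.</p>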
<h4 id="heading-the-power-of-stopping-to-think">The Power of Stopping to Think</h4>
<p>Before we go further, notice what just happened. <strong>You stopped.</strong></p>
<p>Instead of immediately jumping into "Let's add more logging!" or "I bet it's the database, let's check queries," you identified multiple possible approaches. You're looking at three different ways to answer the same question.</p>
<p>This is already valuable. Most engineers would have jumped straight into whichever approach came to mind first. Maybe you've done this yourself: spent two days adding detailed logging, only to discover later that a profiler would have given you the answer in 30 minutes.</p>
<p>By creating this tree, you've avoided that trap. You can see all the approaches before committing to any of them. It doesn't guarantee you will choose the "right" path – you can't always know that in advance – but it minimizes the chance that you overlook it completely.</p>
<p>Remember the reverse engineering students from <a class="post-section-overview" href="#heading-chapter-3-why-methodology-matters-a-true-story">chapter 3</a>? They never created this tree. They jumped straight to the first approach they knew: disassemblers and debuggers. They didn't stop to think: "What are all the ways we could understand this game's rules?" If they had, they would have listed approaches like: reverse engineer the binary, check the Help menu, just play the game, examine config files, watch network traffic. And if they'd evaluated those approaches using the framework you're about to learn, "check the Help menu" would have scored perfectly: fastest feedback (30 seconds), lowest cost (zero), best coverage (complete rules). Instead, they spent hours on complex reverse engineering when a simple menu click would have worked. Don't be those students.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768916696569/d5802e53-a6b2-46ec-846a-dd05043a7787.png" alt="A simple tree for the game makes it clear starting with reversing is the wrong option" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>Now comes the critical question: Which branch do you try first?</strong></p>
<p>Consider our tree again:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768916841081/ae633e0b-891e-47f9-aa34-b37619a20802.png" alt="Research Tree with a few directions" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-choosing-your-first-path">Choosing Your First Path</h4>
<p>You're looking at three approaches: Profile, Logging, DB Analysis. Each is a <strong>heuristic</strong> (in Schoenfeld's terms, as presented in <a class="post-section-overview" href="#heading-chapter-1-what-is-research">chapter 1</a>) – a problem-solving strategy that might work. How do you decide which one to try?</p>
<p>Sometimes, the answer is obvious (I wouldn't really use a framework to check if there's a "Help" menu). Let me show you the framework I use when it's not so clear which path to choose. Ask yourself these questions:</p>
<p><strong>1. Which gives the fastest feedback?</strong></p>
<p>How long until you get an answer?</p>
<ul>
<li><p>Profile: Can run in 10 minutes. Setup is probably 5 minutes.</p>
</li>
<li><p>Logging: Need to add logging code, deploy, wait for traffic – maybe 4 hours.</p>
</li>
<li><p>DB Analysis: Need to enable slow query log, wait for data – maybe 2 hours.</p>
</li>
</ul>
<p><strong>Fastest feedback wins.</strong> You want to learn quickly.</p>
<p><strong>2. Which has the lowest cost?</strong></p>
<p>What do you need to set up or change?</p>
<ul>
<li><p>Profile: Just attach a profiler – no code changes.</p>
</li>
<li><p>Logging: Need to modify code, test, deploy.</p>
</li>
<li><p>DB Analysis: Need database permissions, might need config changes.</p>
</li>
</ul>
<p><strong>Lower cost wins.</strong> Why spend hours adding logging if you can get the answer without changing any code?</p>
<p>(In your environment, it may be different. Perhaps logging is really easy, while profiling is super hard to set up. I am not claiming that profiling is a better heuristic than logging – it depends on your circumstances.)</p>
<p><strong>3. Which answers the most questions?</strong></p>
<p>Some approaches answer not just your immediate question, but related questions, too.</p>
<ul>
<li><p>Profile: Shows you CPU, memory, database, network – a complete picture.</p>
</li>
<li><p>Logging: Only shows what you logged.</p>
</li>
<li><p>DB Analysis: Only shows database queries.</p>
</li>
</ul>
<p><strong>Broader coverage wins.</strong> A profiler might show you that 70% of time is database queries <em>and</em> that 20% is network latency – information you wouldn't get from narrow approaches.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768916912029/ad429b0b-8cda-4af7-86e2-7687396aa401.png" alt="Prioritizing heuristics framework" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>So this is an easy one. <strong>Profiling wins on all three criteria.</strong> That's your first approach to try.</p>
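<p>If you like to see the comparison written down, the three questions can be sketched as a small Python script. The feedback times come from the estimates above (profiling is counted as 5 minutes of setup plus a 10-minute run). The 1-3 scores for cost and coverage are rough assumptions for illustration, not measurements, and <code>criteria_won</code> is a hypothetical helper.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    name: str
    feedback_minutes: int  # time until you get an answer (lower is better)
    cost: int              # setup effort, 1-3 scale, assumed (lower is better)
    coverage: int          # breadth of answers, 1-3 scale, assumed (higher is better)


def criteria_won(approaches):
    """Count, for each approach, how many of the three criteria it is best on."""
    best_feedback = min(a.feedback_minutes for a in approaches)
    best_cost = min(a.cost for a in approaches)
    best_coverage = max(a.coverage for a in approaches)
    return {
        a.name: (a.feedback_minutes == best_feedback)
        + (a.cost == best_cost)
        + (a.coverage == best_coverage)
        for a in approaches
    }


approaches = [
    Approach("Profile", feedback_minutes=15, cost=1, coverage=3),
    Approach("Logging", feedback_minutes=240, cost=3, coverage=1),
    Approach("DB Analysis", feedback_minutes=120, cost=2, coverage=1),
]

print(criteria_won(approaches))
# {'Profile': 3, 'Logging': 0, 'DB Analysis': 0}
```

<p>The numbers matter less than the act of writing them down. If your environment makes logging cheap and profiling painful, the scores change, and so does the winner.</p>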
<p>Update your tree:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768917127720/b5f2a038-ba28-41b6-9a8c-bd4157c31873.png" alt="Start with profiling" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This doesn't mean the other approaches are bad, or that this one will necessarily turn out to be the best. It means that profiling is the best <em>starting point</em> given what you know right now.</p>
<h4 id="heading-what-if-your-first-choice-doesnt-work">What If Your First Choice Doesn't Work?</h4>
<p>Let's say you try profiling and hit a problem: Your profiler can't attach to the production environment due to security restrictions.</p>
<p><strong>This is valuable information.</strong> Mark Profile as Red and move to your next best option:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768917189119/381686ce-c6e5-4ee2-9d09-7eea2d42d305.png" alt="Profiling failed, pivot to logging" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now you try Logging. But notice: You didn't waste days trying Profile in production. You tried it, hit a blocker, immediately pivoted to Logging. The tree helped you move quickly.</p>
<h4 id="heading-choosing-when-everything-seems-equal">Choosing When Everything Seems Equal</h4>
<p>Sometimes multiple approaches look equally good. Profile and DB Analysis might both be fast, low-cost, and have decent coverage. How do you choose?</p>
<p><strong>Pick one and move on.</strong> Don't spend an hour analyzing which approach to try for 30 minutes. The meta-work (deciding) shouldn't take longer than the actual work (trying it).</p>
<p>When approaches seem equal, ask: "Which one am I more familiar with?" or "Which one does the team have experience with?" Use your judgment, make a choice, and start learning.</p>
<p>The worst decision is no decision.</p>
<p>In general, this approach might feel like overkill. Should you really sketch out trees and compare branches before actually doing something?</p>
<p>The surprising answer: it almost always feels like overkill, and it almost always turns out to be worth it. Try it a few times and you will see for yourself.</p>
<h4 id="heading-heuristics-can-be-combined">Heuristics Can Be Combined</h4>
<p>Sometimes you don't have to choose just one. You might run Profile <em>and</em> enable DB Analysis simultaneously. If they don't conflict and you have the time, parallel investigations can be powerful.</p>
<p>But be careful: Don't try to do everything at once. Start with your best option. If that doesn't fully answer your question, then add another approach.</p>
<h4 id="heading-how-answers-lead-to-new-questions">How Answers Lead to New Questions</h4>
<p>After marking "Profile" as red, you moved on to "Logging". You added a few indicative log messages and let the system run for a day. You discover: <strong>70% of response time is database queries</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768917484811/7f82dd85-01b4-4f6b-a793-842b74103b68.png" alt="Database is the bottleneck" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This answer eliminates the need for other approaches (you don't need DB Analysis now – you found the answer). But more importantly, it reveals new questions:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768917538250/5e47f40c-e998-4c8c-afa3-3ad047577504.png" alt="New questions emerge" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>See how the tree grows? One answered question spawns two new questions. Each of these new questions will have its own approaches for answering them.</p>
<p>And you'll apply the same framework to choose which question to answer first: Which gives fastest feedback? Which has lowest cost? Which answers the most questions?</p>
<p>Let's expand "Which queries are slowest?":</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768917712109/31c73b0c-aa1b-4e64-965c-1643fb6e8698.png" alt="Research tree expanding" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Again, you'd evaluate: Which approach gives fastest feedback? Enabling slow query log is probably fastest, as you’d just flip a config flag and wait a few minutes.</p>
<p>You decide to enable the slow query log. After investigating, you discover: <strong>User profile queries are slowest: they make 15 separate database calls (N+1 problem)</strong>.</p>
<p>(What is the N+1 problem in this context? It means that when fetching user profiles, the code first queries for a list of users (1 query), then for each user, it makes an additional query to fetch related data (N queries). If there are 15 users, that's 16 queries total. This is inefficient and slows down response time.)</p>
<p>This answer leads to a new question:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768917756381/566e2fba-e47d-48b7-aa91-531e7a2855a8.png" alt="New question about fixing N+1" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now you have three solution approaches. Again, evaluate them:</p>
<ul>
<li><p>Rewrite queries with JOINs: Fast to implement, proven pattern.</p>
</li>
<li><p>Add Eager Loading: Depends on your ORM, might be quick.</p>
</li>
<li><p>DataLoader Pattern: Requires learning new pattern, takes longer.</p>
</li>
</ul>
<p>Rewriting queries with JOINs probably gives the fastest feedback if your team knows SQL well.</p>
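<p>To see the N+1 pattern and the JOIN rewrite side by side, here is a small self-contained sketch using Python's built-in <code>sqlite3</code> module. The schema (<code>users</code> and <code>profiles</code> tables), the 15 sample users, and the helper names are all hypothetical. Your real queries and ORM will look different.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profiles (user_id INTEGER REFERENCES users(id), bio TEXT);
""")
for i in range(1, 16):  # 15 hypothetical users, one profile each
    conn.execute("INSERT INTO users VALUES (?, ?)", (i, f"user{i}"))
    conn.execute("INSERT INTO profiles VALUES (?, ?)", (i, f"bio of user{i}"))

executed = []  # every SQL statement sent through run()


def run(sql, params=()):
    """Execute a query, record it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def profiles_n_plus_one():
    """1 query for the user list, then 1 more query per user."""
    result = []
    for user_id, name in run("SELECT id, name FROM users ORDER BY id"):
        (bio,) = run("SELECT bio FROM profiles WHERE user_id = ?", (user_id,))[0]
        result.append((name, bio))
    return result


def profiles_joined():
    """The same data fetched with a single JOIN."""
    return run(
        "SELECT u.name, p.bio FROM users u "
        "JOIN profiles p ON p.user_id = u.id ORDER BY u.id"
    )


executed.clear()
slow_rows = profiles_n_plus_one()
n_plus_one_queries = len(executed)

executed.clear()
fast_rows = profiles_joined()
joined_queries = len(executed)

print(n_plus_one_queries, joined_queries, slow_rows == fast_rows)
# 16 1 True
```

<p>Both functions return the same rows, but the first sends 16 queries for 15 users while the second sends one. Counting queries like this is also a cheap way to confirm that a fix actually removed the N+1 pattern.</p>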
<h4 id="heading-the-complete-picture">The Complete Picture</h4>
<p>Let's see how the full tree looks after a few days of investigation:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768917875604/39d6c595-1fb3-4e1d-8be3-e0015fd2e642.png" alt="Completed research tree example" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>(Note: for the "How many queries per request?" node – I skipped the approaches for brevity.)</p>
<p><strong>Reading this tree:</strong></p>
<ol>
<li><p>We started with "Where is the bottleneck?" and chose profiling (fastest feedback).</p>
</li>
<li><p>We found out the profiler can't attach to the production environment due to security restrictions, so we pivoted to logging.</p>
</li>
<li><p>Logging revealed that 70% of response time is database queries.</p>
</li>
<li><p>That answer led to two new questions about specific queries.</p>
</li>
<li><p>For "Which queries are slowest?", we chose slow query log (fastest to enable).</p>
</li>
<li><p>Answering it led to another question about fixing the N+1 problem.</p>
</li>
<li><p>For "How can we fix N+1?", we evaluated the approaches and chose "Rewrite queries with JOINs" (team knows SQL, fastest to implement).</p>
</li>
<li><p>Meanwhile, "Can we reduce query count?" is still open and has its own approaches to investigate.</p>
</li>
</ol>
<h4 id="heading-color-coding-status">Color-Coding Status</h4>
<p>When you're creating your Research Tree, you'll mark both questions and approaches with a particular status. Of course, the specific colors below are just suggestions – the important thing is to use one scheme consistently so you can see the status at a glance.</p>
<p><strong>For Questions:</strong></p>
<ul>
<li><p><strong>Open</strong>: Not yet answered.</p>
</li>
<li><p><strong>Closed</strong>: Answered (show the answer).</p>
</li>
<li><p><strong>Blocking</strong>: Must answer before proceeding with an approach.</p>
</li>
</ul>
<p><strong>For Approaches:</strong></p>
<ul>
<li><p><strong>Green</strong>: Viable, worth pursuing.</p>
</li>
<li><p><strong>Brown</strong>: Uncertain, needs investigation.</p>
</li>
<li><p><strong>Red</strong>: Dead end or not viable.</p>
</li>
</ul>
<p>In our example above:</p>
<ul>
<li><p>"Rewrite with JOINs" is Green because we've identified that it addresses the specific N+1 problem and that the team is confident in implementing it.</p>
</li>
<li><p>"Redesign API" is Red because it would take too long for this project.</p>
</li>
<li><p>Other approaches are Brown because we haven't investigated them yet.</p>
</li>
</ul>
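<p>If you keep your tree in code or in a script, the statuses above map naturally onto a small data model. The following Python sketch is one possible shape, not a prescribed format: the class names, the <code>approaches_with</code> helper, and the choice to default new approaches to Brown are all assumptions made for illustration.</p>

```python
from dataclasses import dataclass, field
from enum import Enum


class QuestionStatus(Enum):
    OPEN = "not yet answered"
    CLOSED = "answered"
    BLOCKING = "must answer before proceeding"


class ApproachStatus(Enum):
    GREEN = "viable, worth pursuing"
    BROWN = "uncertain, needs investigation"
    RED = "dead end or not viable"


@dataclass
class Approach:
    name: str
    status: ApproachStatus = ApproachStatus.BROWN  # unknown until investigated


@dataclass
class Question:
    text: str
    status: QuestionStatus = QuestionStatus.OPEN
    answer: str = ""
    approaches: list = field(default_factory=list)

    def approaches_with(self, status):
        """Names of this question's approaches that have the given status."""
        return [a.name for a in self.approaches if a.status is status]


fix_n_plus_one = Question(
    "How can we fix N+1?",
    approaches=[
        Approach("Rewrite queries with JOINs", ApproachStatus.GREEN),
        Approach("Add Eager Loading"),
        Approach("DataLoader Pattern"),
    ],
)

print(fix_n_plus_one.approaches_with(ApproachStatus.GREEN))
# ['Rewrite queries with JOINs']
print(fix_n_plus_one.approaches_with(ApproachStatus.BROWN))
# ['Add Eager Loading', 'DataLoader Pattern']
```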
<h4 id="heading-additional-tips">Additional Tips</h4>
<p>Keeping the tree clean and simple is important, and obsessing over its looks and details really misses the point. That said, some readers will find benefits by adding a few more details to the tree, specifically:</p>
<ol>
<li><p><strong>Order</strong>: add a number next to a specific branch when tackling it, so it is easy to track which direction you tried first, which one followed and so on.</p>
</li>
<li><p><strong>Pivot Explanations</strong>: if you chose to pivot from one branch to another, write why. This might help when you revise your decisions later, or when reviewing with your team (as described <a class="post-section-overview" href="#heading-using-the-tree-with-your-team">later in this chapter</a>).</p>
</li>
</ol>
<h4 id="heading-the-research-tree-prevents-common-pitfalls">The Research Tree Prevents Common Pitfalls</h4>
<p>The Research Tree with this decision-making framework addresses four critical failure modes:</p>
<p><strong>1. Jumping on First Idea:</strong> Without a tree, people implement the first approach they think of. The tree forces you to identify alternatives and evaluate them systematically before starting.</p>
<p><strong>2. Tunnel Vision:</strong> Even when considering alternatives in the beginning, people tend to lock onto one approach and not pivot from it even when it turns out to be the wrong choice. The tree makes alternatives visible and helps you not only choose the best starting point, but also reevaluate continuously.</p>
<p><strong>3. Inefficient Learning:</strong> Teams might try expensive, slow approaches first when faster, cheaper ones exist. The decision framework helps you learn quickly.</p>
<p><strong>4. Answering Questions You Don't Need To:</strong> Teams waste time investigating interesting but irrelevant questions. The tree shows how questions connect – you only need to answer questions that lead to your goal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768918009905/39f7577b-9d71-4ef8-b206-ec2d2d1027b6.png" alt="The Research Tree prevents common pitfalls" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-time-to-practice">Time to Practice</h4>
<p>Open your favorite drawing tool, or just grab a piece of paper. Think of a Research problem you're currently facing or recently faced.</p>
<p>Now answer these questions:</p>
<ol>
<li><p>What's your goal? Write it at the top.</p>
</li>
<li><p>What's the first question you need to answer? Write it below the goal.</p>
</li>
<li><p>What are 2-4 approaches to answer that question? Draw them as branches.</p>
</li>
<li><p>For each approach, evaluate:</p>
<ul>
<li><p>How fast is the feedback?</p>
</li>
<li><p>What's the cost?</p>
</li>
<li><p>How much does it answer?</p>
</li>
</ul>
</li>
<li><p>Which approach scores best? Mark it "TRY FIRST".</p>
</li>
</ol>
<p>Actually do this. Don't just read and think "I understand." Drawing the tree and evaluating approaches forces you to be explicit, and you'll immediately see gaps in your thinking.</p>
<p>I'll wait.</p>
<p>...</p>
<p>Don't worry about me, I'm enjoying some really great coffee in the meantime.</p>
<p>...</p>
<p>Done? Good. You now have your first Research Tree with a clear starting point.</p>
<h4 id="heading-using-the-tree-with-your-team">Using the Tree with Your Team</h4>
<p>Research Trees become even more powerful when shared with a team. A shared tree gives you, the Lead, a way to see which directions the team is pursuing, and why. Your job here is to make sure the framework is used. Help your team stop and ask: Are we asking the right questions? Are there approaches that we missed? Are we choosing the right approach?</p>
<p><strong>During Planning:</strong></p>
<ul>
<li><p>Draw the tree together as a group.</p>
</li>
<li><p>Brainstorm questions and approaches.</p>
</li>
<li><p>Evaluate approaches using the framework (fastest feedback, lowest cost, best coverage).</p>
</li>
<li><p>Everyone sees <em>why</em> you're trying a particular approach first.</p>
</li>
</ul>
<p><strong>During Execution:</strong></p>
<ul>
<li><p>Update the tree as you learn.</p>
</li>
<li><p>When stuck, revisit the tree to identify alternative approaches.</p>
</li>
<li><p>Make sure you consider whether you are asking all of the important questions, and whether you are considering all relevant approaches.</p>
</li>
<li><p>If you pivoted from a branch, explain your reason and ask if someone can challenge your logic.</p>
</li>
<li><p>Conduct regular tree reviews (weekly or bi-weekly).</p>
</li>
</ul>
<p>Note that the Research Tree is also useful for one-on-one sessions: you can review the tree with individual team members to understand their progress and help them choose next steps. It actually makes the Control component of Schoenfeld's framework much easier to manage – as you see the various questions and approaches laid out visually.</p>
<h4 id="heading-tools">Tools</h4>
<ul>
<li><p>Pen and paper (seriously, this works great).</p>
</li>
<li><p>Whiteboard (for team sessions).</p>
</li>
<li><p>Miro, Mural, or similar digital whiteboards.</p>
</li>
<li><p>Mind mapping software (XMind, MindNode, and so on).</p>
</li>
<li><p>Even a simple text file with indentation.</p>
</li>
</ul>
<p>The tool doesn't matter. What matters is that the tree exists, is visible, and gets updated.</p>
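<p>As an example of the last option, here is a minimal Python sketch that prints a tree as an indented text file. The <code>(label, children)</code> tuple layout, the <code>render</code> helper, and the bracketed status and order notes are assumptions chosen for this illustration. Any notation works as long as you use it consistently.</p>

```python
def render(node, depth=0):
    """Return a (label, children) node and its subtree as indented lines."""
    label, children = node
    lines = ["    " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


tree = (
    "GOAL: Reduce API response time from 800ms to under 100ms",
    [
        ("Q (closed): Where is the bottleneck?", [
            ("A (red, tried 1st): Profile", []),
            ("A (tried 2nd): Logging", []),
            ("A (brown): DB Analysis", []),
            ("ANSWER: 70% of response time is database queries", [
                ("Q (closed): Which queries are slowest?", []),
            ]),
        ]),
    ],
)

print("\n".join(render(tree)))
```

<p>New questions hang off the answer that revealed them, so the file reads top to bottom in the order the investigation unfolded.</p>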
<h4 id="heading-pro-tips">Pro Tips</h4>
<p><strong>Start with the most important question</strong></p>
<p>Don't try to list all possible questions upfront. Start with the one question that, if answered, would most clarify your path forward. Answer it, then see what new questions emerge. More on finding the questions to start from in <a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">chapter 7</a>.</p>
<p><strong>Show how answers lead to new questions</strong></p>
<p>When you close a question, immediately ask: "What new questions does this answer reveal?" Draw those as branches from the answer.</p>
<p><strong>Update questions weekly</strong></p>
<p>In your weekly check-ins, explicitly review: Which questions did we close? What did we learn? Which new questions emerged? Which questions are blocking progress?</p>
<p><strong>Re-evaluate when context changes</strong></p>
<p>If you learn something new that changes the evaluation (maybe a tool you thought was fast turns out to be slow), re-evaluate your approaches. The tree is living – update it.</p>
<h4 id="heading-recap-the-research-tree">Recap - The Research Tree</h4>
<p>The Research Tree is a living visual framework that:</p>
<ul>
<li><p>Shows questions you need to answer to reach your goal.</p>
</li>
<li><p>Maps approaches for answering each question.</p>
</li>
<li><p>Helps you choose the best starting point for each question.</p>
</li>
<li><p>Prevents jumping on the first idea without considering alternatives.</p>
</li>
<li><p>Captures how answers lead to new questions.</p>
</li>
<li><p>Tracks status of questions (open/closed/blocking) and approaches (green/brown/red).</p>
</li>
<li><p>Documents the investigation path so the team understands why decisions were made.</p>
</li>
<li><p>Evolves as you learn – questions get answered, new questions emerge.</p>
</li>
</ul>
<p><strong>Key structure:</strong></p>
<ul>
<li>Goal → Question → Approaches to answer it → Answer → New questions</li>
</ul>
<p><strong>Decision framework for choosing approaches:</strong></p>
<ol>
<li><p>Which gives fastest feedback?</p>
</li>
<li><p>Which has lowest cost?</p>
</li>
<li><p>Which answers the most questions?</p>
</li>
</ol>
<p>In the next chapter, you'll learn how to manage execution using time-boxing and decision points to keep your Research moving forward without getting stuck.</p>
<h3 id="heading-chapter-5-time-boxing-research-explorations">Chapter 5 - Time-Boxing Research Explorations</h3>
<p>In the previous chapter, you learned about the Research Tree, a powerful tool for visualizing and managing Research efforts. The tree helps you systematically explore different solution paths and keep track of open questions. It also helps you decide which path to try first.</p>
<p>But once you've chosen a branch, how long should you pursue it before stepping back to reconsider?</p>
<h4 id="heading-the-problem-research-without-time-limits">The Problem: Research Without Time Limits</h4>
<p>A researcher investigates whether a machine learning model can predict code complexity. Day one goes well. Day two, they need more features. Day three, a different architecture might work better. They switch. Day four, the new architecture needs different preprocessing.</p>
<p>Two weeks later, they're still on this path. When you ask about trying alternatives, they say "I'm close. Just need a few more days."</p>
<p>Three weeks in, they admit this approach isn't viable. Meanwhile, a simpler approach sat unexplored on the Research Tree.</p>
<p>Without a defined checkpoint, there's no natural moment to ask: "Given what I've learned, is this still the best path?" The sunk cost fallacy (continuing an approach because you've already invested time, rather than because it's still the best option) takes over. This is extremely common among Researchers, who tend to be dedicated, brilliant people who get fixated on the problems they try to solve.</p>
<h4 id="heading-time-boxing-creating-decision-points">Time-Boxing: Creating Decision Points</h4>
<p>Let's face it: it's hard, even impossible, to estimate how long a Research task will take. But you should still provide time limits based on how long you're willing to invest before reconsidering.</p>
<p><strong>Time-boxing provides mandatory decision points.</strong></p>
<p>Note that we are not talking about deadlines, but decision points. These are moments where you stop and evaluate: What did I learn? Is this still the most promising path?</p>
<p>As a rule, for every task longer than a day, define a time limit. For example: "I'll spend three days investigating whether we can cluster files into logical folders based on their I/O operations."</p>
<p>Three things can happen:</p>
<ol>
<li><p><strong>Early success:</strong> The researcher figures it out in a few hours. Done early, move to the next question.</p>
</li>
<li><p><strong>Early blocker:</strong> After one hour, they discover they don't have access to filenames. They can immediately reconsider: "Without filenames, is this viable?"</p>
</li>
<li><p><strong>Time box expires:</strong> Three days pass with partial progress. Now comes the mandatory decision point.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768918238122/eab625ca-20d9-4c9c-824b-c057d4248fac.png" alt="For every task that is longer than a day, define a time limit" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-the-decision-point">The Decision Point</h4>
<p>When your time box expires, stop. Pull out your Research Tree (yes, you already love that Tree). Ask three questions:</p>
<p><strong>1. What was your goal in pursuing this direction?</strong></p>
<p>Which question were you trying to answer? Why did you choose this approach over alternatives?</p>
<p><strong>2. What did you learn?</strong></p>
<p>Document your discoveries, even if incomplete:</p>
<ul>
<li><p>"The approach works but needs more sophisticated algorithms than we thought."</p>
</li>
<li><p>"We need data we don't currently have."</p>
</li>
<li><p>"This is harder than expected – would take at least 2 more weeks."</p>
</li>
<li><p>"We're 70% there – just need to handle edge cases."</p>
</li>
</ul>
<p><strong>3. Given what you learned, is this still the most promising path?</strong></p>
<p>Look at your Research Tree. You have other branches. Given what you now know, is continuing the best use of time?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768918412614/502819a3-508c-4f6f-81da-42ffefcff900.png" alt="When the limit expires, stop to reconsider your next steps" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You have three options:</p>
<p>You can <strong>continue with a new time box:</strong> "I've solved the core challenge. One more day for edge cases." Define the new time box. The key: you consciously decided to continue based on what you learned, not inertia.</p>
<p>You can <strong>pivot:</strong> "This would take two more weeks, and I'm not confident it'll work. There's a simpler approach on my tree." Mark this branch Red. Move to a different branch.</p>
<p>Or you can <strong>reconsider the question:</strong> "Files in this codebase don't have clear I/O patterns. Maybe I should try clustering by function dependencies instead." Go back to your tree and identify a different question.</p>
<h4 id="heading-example-detecting-god-objects">Example: Detecting God Objects</h4>
<p>You're detecting God Objects (classes that do too much) using static analysis. Your Research Tree shows three approaches: complexity metrics, method naming patterns, or machine learning on AST features.</p>
<p>You choose complexity metrics (fastest feedback, lowest cost) and set a 2-day time box.</p>
<p><strong>Day 1-2:</strong> You implement <a target="_blank" href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> and <a target="_blank" href="https://en.wikipedia.org/wiki/Source_lines_of_code">SLOC</a> metrics. Results: 65% accuracy, but 40% false positives.</p>
<p><strong>Decision point:</strong> You stop. Looking at your tree, method naming patterns might provide complementary information. You decide to spend 1 day exploring whether combining both approaches improves accuracy.</p>
<p><strong>Day 3:</strong> Combined approach: 78% accuracy, 25% false positives. Promising.</p>
<p><strong>New decision:</strong> Time-box 3 more days to refine and test on larger codebases.</p>
<p>Notice what happened: The 2-day time box forced assessment before perfecting the first approach. You learned combining approaches was better. Without time-boxing, you might have spent a week perfecting complexity metrics alone.</p>
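<p>For a sense of what the "complexity metrics" branch might involve, here is a rough Python sketch built on the standard <code>ast</code> module. It is a simplification under stated assumptions, not the method behind the numbers above: complexity is approximated as one plus the number of branching nodes, the line span of the class stands in for SLOC, and the thresholds and sample classes are hypothetical.</p>

```python
import ast

# Node types counted as decision points; a simplification of cyclomatic complexity.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def approx_complexity(func):
    """Approximate cyclomatic complexity: 1 + number of branching nodes."""
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(func))


def class_metrics(source):
    """Return per-class method count, summed complexity, and line span."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            methods = [
                item for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            metrics[node.name] = {
                "methods": len(methods),
                "complexity": sum(approx_complexity(m) for m in methods),
                "lines": node.end_lineno - node.lineno + 1,
            }
    return metrics


def flag_god_objects(source, max_methods=10, max_complexity=30):
    """Names of classes exceeding either (hypothetical) threshold."""
    return sorted(
        name for name, m in class_metrics(source).items()
        if m["methods"] > max_methods or m["complexity"] > max_complexity
    )


SAMPLE = '''
class Small:
    def greet(self, name):
        return "hi " + name

class Big:
    def a(self, x):
        if x and x.ok:
            return 1
        return 0
    def b(self, items):
        for item in items:
            if item:
                return item
    def c(self):
        return None
'''

print(flag_god_objects(SAMPLE, max_methods=2, max_complexity=5))
# ['Big']
```

<p>A first version like this is enough to get numbers within the 2-day time box. Whether the thresholds are any good is exactly what the decision point is there to assess.</p>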
<h4 id="heading-how-to-set-time-boxes">How to Set Time Boxes</h4>
<p>When choosing a branch on your Research Tree, ask: Is this shorter than a day? A few days? Longer than a week?</p>
<p><strong>Shorter than a day:</strong> Just do it. Don't overthink time-boxing for tasks this short.</p>
<p><strong>A few days (2-7 days):</strong> Time-box for 2-5 days. I usually time-box for slightly less than my estimate – if I think 4 days, I set 3 days. This forces reflection based on learning, not arbitrary completion.</p>
<p><strong>Longer than a week:</strong> Time-box for one week maximum. Even if you're making progress, a week is long enough that stepping back is valuable.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768918460198/8ed7d7c3-b212-4777-927c-09317ed1b97e.png" alt="Time box based on your initial estimate" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Don't overthink the exact duration. The goal is creating natural stopping points. The specific number matters less than having some checkpoint.</p>
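<p>The rules of thumb above fit in a few lines of Python. This is one possible reading of them, with some assumptions spelled out: estimates are in days, "slightly less than my estimate" is taken as one day less, and "one week" is treated as 7 calendar days (use 5 if you count working days). The function name <code>suggest_time_box</code> is hypothetical.</p>

```python
ONE_WEEK_DAYS = 7  # assumption: calendar days; set to 5 for working days


def suggest_time_box(estimate_days):
    """Return a suggested time box in days, or None when no time box is needed."""
    if estimate_days <= 1:
        return None  # short task: just do it
    if estimate_days > 7:
        return ONE_WEEK_DAYS  # long task: step back after one week at most
    # A few days: slightly under the estimate, kept within the 2-5 day range.
    return min(5, max(2, estimate_days - 1))


for estimate in (0.5, 2, 4, 7, 10):
    print(estimate, "->", suggest_time_box(estimate))
```

<p>For a 4-day estimate this suggests 3 days, matching the example above. Treat the output as a starting point for the conversation, not a rule.</p>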
<p><strong>Critical:</strong> Time-boxing is not about pressure. You're not failing if the time box expires without solving the problem. That's expected in Research. What time-boxing does is force moments of reflection that prevent weeks spent on directions you'd have abandoned if you'd stopped to reconsider.</p>
<h4 id="heading-using-time-boxing-with-your-team">Using Time-Boxing with Your Team</h4>
<p>When managing researchers, set time boxes together during planning. Make sure they understand it's a decision point, not a deadline. Schedule the review explicitly.</p>
<p>At the review, look at the Research Tree together. Ask the three questions. Make the decision collaboratively. This prevents researchers from getting stuck without asking for help, and makes pivoting feel like a positive decision rather than failure.</p>
<h4 id="heading-integration-with-the-research-tree">Integration with the Research Tree</h4>
<p>Time-boxing combines naturally with the Research Tree (<a class="post-section-overview" href="#heading-chapter-4-the-research-tree">chapter 4</a>): each branch gets a time box, and when it expires, the tree shows your alternatives.</p>
<p>Time-boxing creates moments to ask "given what I've learned, what should I do next?" The Research Tree helps you answer that question.</p>
<h4 id="heading-recap">Recap</h4>
<p>For tasks longer than a day, set a time limit (2-5 days for medium tasks, max 1 week). When time expires, stop and evaluate using your Research Tree: What was my goal? What did I learn? Is this still the best path?</p>
<p>Time boxes acknowledge that research estimation is hard. They prevent sunk cost fallacy and endless exploration. They're not about pressure or finishing "on time" – they're about forcing reflection instead of momentum-driven continuation.</p>
<p>Combined with the Research Tree, time-boxing gives you control over research execution while respecting its inherent uncertainty.</p>
<h3 id="heading-part-2-summary">Part 2 Summary</h3>
<p><strong>Effective Research</strong> requires structured approaches, not just technical skill:</p>
<ul>
<li><p>The Research Tree visualizes solution paths, open questions, and closed questions.</p>
</li>
<li><p>Time-boxing prevents endless exploration while preserving flexibility.</p>
</li>
<li><p>Systematic evaluation helps you choose the best path forward.</p>
</li>
</ul>
<p>In <a class="post-section-overview" href="#heading-chapter-2-research-and-development">chapter 2</a>, I argued that your role as a Research leader is to:</p>
<ol>
<li><p>Ensure Research connects to product impact.</p>
</li>
<li><p>Ensure Research is done effectively.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768918564138/25040618-afdf-49e9-bf59-44a9518699f5.png" alt="Your Role as Research Leader" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This part covered the second item: how to ensure Research is done effectively.</p>
<p>Within this part, your role is to:</p>
<ol>
<li><p>Ensure the team uses these methods consistently.</p>
</li>
<li><p>Help identify when to pivot, when to persist, and which questions matter.</p>
</li>
</ol>
<p><strong>These tools prevent common failure modes</strong>: jumping on the first idea, tunnel vision, inefficient learning, and wasting time answering irrelevant questions.</p>
<p>Part 3 shows how to connect Research to product impact.</p>
<h2 id="heading-part-3-ensuring-product-impact-1">Part 3: Ensuring Product Impact</h2>
<p>In <a class="post-section-overview" href="#heading-chapter-2-research-and-development">chapter 2</a>, I argued that your role as a Research leader is to:</p>
<ol>
<li><p>Ensure Research connects to product impact.</p>
</li>
<li><p>Ensure Research is done effectively.</p>
</li>
</ol>
<p>Part 2 addressed (2) above – ensuring Research is done effectively – with tools that work in ANY research context. This part provides the complete answer to (1) – ensuring Research connects to product impact:</p>
<ul>
<li><p>First, choose research that matters (<a class="post-section-overview" href="#heading-chapter-6-how-to-choose-research-initiatives">chapter 6</a>).</p>
</li>
<li><p>Then, work backwards from product value (<a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">chapter 7</a>).</p>
</li>
<li><p>Continuously validate with end-to-end iterations (<a class="post-section-overview" href="#heading-chapter-8-end-to-end-iterations">chapter 8</a>).</p>
</li>
</ul>
<h3 id="heading-chapter-6-how-to-choose-research-initiatives">Chapter 6 - How to Choose Research Initiatives</h3>
<p>The very first step in making sure your Research impacts the product is choosing the right thing to research. And, just as important, avoiding Research that won't impact the product.</p>
<p>Research initiatives can start from two different places:</p>
<ol>
<li><p>From a problem the product is facing.</p>
</li>
<li><p>From a technological opportunity.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768918734842/5706e595-2264-4162-99a0-9f2aae7c9578.png" alt="Starting research initiatives - either from a product problem, or a technological opportunity" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-1-starting-from-a-concrete-problem">1. Starting From a Concrete Problem</h4>
<p>The most promising way to find research initiatives that will have a big impact on the product is to start from an acute problem that the product is facing.</p>
<p>At <a target="_blank" href="https://swimm.io">Swimm</a>, we allowed users to write documents about their code, but inevitably, the code changed, and the documentation became outdated. This made writing documentation not worth the effort in the first place. We needed to find a way to make sure the documentation stayed up to date automatically, with a good user experience. This was a clear problem we faced, and we didn't know if it was even possible to solve.</p>
<p>Consider a different example: a medical company that wants to diagnose a disease based on a few blood samples. Currently, they have an algorithm in place, but it's not very accurate. Specifically, it yields too many false positives. They need to find a way to improve their prediction accuracy. This is a clear problem the product is facing, and it doesn't have a clear technological solution.</p>
<p>In both cases, the problem is clear, and its impact on the product or company is clear. At the same time, the <em>solution</em> is not clear, and it's not certain that a solution will be technologically feasible.</p>
<h4 id="heading-2-starting-from-a-technological-opportunity">2. Starting From a Technological Opportunity</h4>
<p>When generative AI became popular, many companies started exploring how to leverage it to improve their products. This is an example of an emerging technology that can enable new product features.</p>
<p>The same can happen with smaller, more specific technologies. They don't have to be new – sometimes they are simply technologies the team has recently become familiar with. For example, if a researcher reads a paper about a new way to parse source code, that researcher might have an idea for a new product feature that can leverage this technology.</p>
<p>While many good ideas come from technological opportunities, it's important to remember that the real impact of Research is determined by the product, not the technology. It's far riskier to pursue a technological opportunity than a concrete problem. If you do start Research based on a technological opportunity, your responsibility is to make sure that the opportunity, if pursued successfully, will indeed have a big impact on the product.</p>
<h4 id="heading-problem-driven-vs-opportunity-driven-a-comparison">Problem-Driven vs. Opportunity-Driven: A Comparison</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Aspect</td><td>Problem-Driven Research</td><td>Opportunity-Driven Research</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Starting Point</strong></td><td>Customer pain point or product limitation</td><td>New technology or technique becomes available</td></tr>
<tr>
<td><strong>Product Impact Clarity</strong></td><td>High – you know exactly what problem you're solving</td><td>Low to Medium – you're searching for problems this technology can solve</td></tr>
<tr>
<td><strong>Risk Level</strong></td><td>Lower – you know there's demand if you succeed</td><td>Higher – solution might not match any important problem</td></tr>
<tr>
<td><strong>Validation</strong></td><td>Problem already validated through user feedback</td><td>Needs validation that the solution matters to users</td></tr>
<tr>
<td><strong>Examples</strong></td><td>Swimm's auto-updating docs; Reducing false positives in medical diagnosis</td><td>"How can we use LLMs in our product?"; "This new parsing technique could enable..."</td></tr>
<tr>
<td><strong>Success Criteria</strong></td><td>Did we solve the problem?</td><td>Did we find a valuable use case AND solve the problem?</td></tr>
</tbody>
</table>
</div><p>Problem-driven research starts with validated demand: you know that the goal is worth pursuing. Opportunity-driven research starts with a hammer looking for nails. Sometimes you find valuable nails, but it's riskier.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768919054708/d526ce83-d052-436f-9bd4-de86cd90e353.png" alt="d526ce83-d052-436f-9bd4-de86cd90e353" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-should-you-pursue-a-research-initiative">Should You Pursue a Research Initiative?</h4>
<p>Say you have a research initiative – an idea you'd like to research. To know whether you should pursue it, you should be able to answer a simple set of questions:</p>
<p><strong>1. Product Impact – If the Research succeeds, how big will the impact be?</strong></p>
<p>This is the most crucial question.</p>
<p>For problem-driven research, this is usually straightforward: "If we solve automatic doc updates, we'll retain 40% more customers who currently churn due to outdated docs."</p>
<p>For opportunity-driven research, you need to work harder: "If we use LLMs for code analysis, we could enable X feature, which would help Y users save Z hours per week, translating to $W in additional revenue."</p>
<p>If you're not convinced a successful Research result would make a huge impact on the product, it's probably not worth pursuing at the moment (until you <em>are</em> convinced). "Huge impact" means:</p>
<ul>
<li><p>Solving a top-3 customer pain point, OR</p>
</li>
<li><p>Enabling a new product capability that significantly expands your market, OR</p>
</li>
<li><p>Reducing a major cost or risk factor</p>
</li>
</ul>
<p>Anything less, and your resources are better spent on Development work with clearer ROI.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768919149212/56a55ba5-9339-4cab-b622-2a1a2b21c363.png" alt="Product Impact - will success create huge value?" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>2. Time to Impact – How long until we see product value?</strong></p>
<p>Time estimation is always hard in software. This is true for Development tasks, and even more so for Research. You learned ways to manage this uncertainty <em>during</em> Research in <a class="post-section-overview" href="#heading-chapter-5-time-boxing-research-explorations">chapter 5</a>, but even at this early pre-Research stage, you should consider the timeline.</p>
<p>Ask yourself two related questions:</p>
<ul>
<li><p><strong>How long for the Research itself?</strong> (days? weeks? months?)</p>
</li>
<li><p><strong>How long from successful Research to product impact?</strong> (immediate integration? requires significant Development? needs market validation?)</p>
</li>
</ul>
<p>The total timeline matters because:</p>
<ul>
<li><p>Research that takes 6 months but delivers immediate product value might be worthwhile.</p>
</li>
<li><p>Research that takes 2 months but requires 8 more months of Development might not be worth it if your product roadmap can't accommodate that.</p>
</li>
<li><p>Research that takes 1 year with uncertain outcomes probably isn't worth pursuing unless the potential impact is transformational.</p>
</li>
</ul>
<p>Of course, the actual timespans vary greatly by context (and specifically by the company you work for).</p>
<p>Consider also whether you can achieve <strong>incremental value</strong>. Can you get <em>some</em> product impact in 3 months even if the full solution takes 9 months? This significantly de-risks longer Research initiatives.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768919218931/3cdf5c7b-fd5c-4472-be89-e3fed8cda147.png" alt="Time to Impact - how long until we see product results?" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>3. Resources – Do you have what you need?</strong></p>
<p>Research requires specific resources beyond just "engineering time":</p>
<p><strong>Knowledge</strong>: Do you have team members familiar with the relevant:</p>
<ul>
<li><p>Technical domain (for example, NLP, compiler design, distributed systems)?</p>
</li>
<li><p>Business domain (for example, medical diagnostics, financial regulations)?</p>
</li>
<li><p>Similar problems solved elsewhere?</p>
</li>
</ul>
<p>If not, can you acquire this knowledge in reasonable time? (Reading papers for a week: reasonable. Earning a PhD: not reasonable.)</p>
<p><strong>Capacity</strong>: Can you dedicate someone (or multiple people) for the expected duration? Research requires sustained focus – splitting someone 10% on Research and 90% on urgent product work rarely succeeds.</p>
<p><strong>Dependencies</strong>: Do you need:</p>
<ul>
<li><p>Access to specific data or systems?</p>
</li>
<li><p>Collaboration from other teams?</p>
</li>
<li><p>External expertise or consulting?</p>
</li>
<li><p>Budget for tools, cloud resources, or datasets?</p>
</li>
</ul>
<p>If critical resources are unavailable or expensive to obtain, the initiative may not be viable.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768919272463/2a6a5d81-a55b-4dc8-bfee-afec9f0175f8.png" alt="Resources - Do we have the knowledge, capacity, and dependencies?" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-pre-research-checks">Pre-Research Checks</h4>
<p>You might not have clear answers to all three questions above. In that case, it makes sense to spend time answering them <em>before</em> actually pursuing the Research. This phase should be limited in time – ideally a few days, at most a week or two.</p>
<p>In this stage you might:</p>
<ol>
<li><p><strong>Interview customers and business stakeholders</strong> to understand the real impact of solving the problem.</p>
</li>
<li><p><strong>Read about the technological aspects</strong> and previous research done in this field to assess feasibility and timeline.</p>
</li>
<li><p><strong>Consult with people</strong> who have faced similar challenges (internal experts, academic contacts, or practitioners in your network).</p>
</li>
<li><p><strong>Run quick feasibility tests</strong> – not full Research, but simple checks like "Can we even access the data we'd need?" or "Does this library exist in our language?"</p>
</li>
</ol>
<p>After pre-research checks, you should have a clear answer: "Yes, we should pursue this because the impact is X, the timeline is roughly Y, and we have (or can get) the resources we need."</p>
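<p>If it helps to make the three questions concrete, here is a minimal sketch of how you might record the answers for one initiative. Everything in it is an assumption made for illustration: the <code>InitiativeAssessment</code> class, its field names, and the wording of the concerns are hypothetical, and it deliberately sets no numeric thresholds, since acceptable timelines vary by company.</p>

```python
from dataclasses import dataclass


@dataclass
class InitiativeAssessment:
    """Answers to the three pre-Research questions (hypothetical structure)."""

    # 1. Product impact: "huge impact" means at least one of these holds.
    solves_top_pain_point: bool
    enables_market_expanding_capability: bool
    reduces_major_cost_or_risk: bool
    # 2. Time to impact: rough estimates, in months.
    research_months: float
    development_months: float
    # 3. Resources.
    has_knowledge: bool
    has_capacity: bool
    dependencies_available: bool

    @property
    def huge_impact(self) -> bool:
        return (
            self.solves_top_pain_point
            or self.enables_market_expanding_capability
            or self.reduces_major_cost_or_risk
        )

    @property
    def total_months(self) -> float:
        # The total timeline is Research time plus the time from
        # successful Research to product impact.
        return self.research_months + self.development_months

    def open_concerns(self) -> list[str]:
        """List what still argues against starting the Research."""
        concerns = []
        if not self.huge_impact:
            concerns.append("Product impact is not clearly huge")
        if not self.has_knowledge:
            concerns.append("Missing knowledge")
        if not self.has_capacity:
            concerns.append("No dedicated capacity")
        if not self.dependencies_available:
            concerns.append("Critical dependencies unavailable")
        return concerns


# Example: 2 months of Research followed by 8 months of Development.
example = InitiativeAssessment(
    solves_top_pain_point=True,
    enables_market_expanding_capability=False,
    reduces_major_cost_or_risk=False,
    research_months=2,
    development_months=8,
    has_knowledge=True,
    has_capacity=True,
    dependencies_available=True,
)
print(example.total_months)     # 10
print(example.open_concerns())  # []
```

<p>An empty list of concerns is not a green light by itself. It only means the pre-research checks found no blocker, and you still need to judge whether the total timeline fits your roadmap.</p>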
<p>If you're not confident about substantial product impact, <em>don't start the Research</em>. This might sound harsh, but it's crucial: unfocused Research that doesn't connect to product needs wastes your most valuable resource – talented engineers' time and attention. One way to enhance your certainty about product impact is described in <a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">chapter 7</a>.</p>
<p>The next chapters assume you understand the product impact of successful Research outcomes, and will help you ensure you actually achieve this impact as quickly as possible.</p>
<h4 id="heading-how-to-choose-research-initiatives-summary">How to Choose Research Initiatives - Summary</h4>
<p><strong>Research initiatives</strong> start from either:</p>
<ul>
<li><p><strong>Problem-driven</strong>: A clear product need (strongly preferred)</p>
</li>
<li><p><strong>Opportunity-driven</strong>: A new technology (higher risk)</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768919679213/0ebc0dd3-4294-4602-9f86-726341b1fc1c.png" alt="Starting research initiatives - either from a product problem, or a technological opportunity" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>Before pursuing Research</strong>, answer three questions:</p>
<ol>
<li><p><strong>Product Impact</strong>: Will success create huge value?</p>
</li>
<li><p><strong>Time to Impact</strong>: How long until we see product results?</p>
</li>
<li><p><strong>Resources</strong>: Do we have the knowledge, capacity, and dependencies?</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768919785248/179011ec-fed2-4123-b467-95739be62151.png" alt="Three questions to answer before pursuing Research" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>Run pre-research checks</strong> (ideally a few days, two weeks at most) to answer these questions if unclear.</p>
<p><strong>Only pursue Research</strong> when you're confident about substantial product impact.</p>
<p>The next chapter shows how to maintain that product connection throughout the Research process.</p>
<h3 id="heading-chapter-7-drawing-backwards">Chapter 7 - Drawing Backwards</h3>
<p>So you've chosen a research initiative, and done so correctly (following <a class="post-section-overview" href="#heading-chapter-6-how-to-choose-research-initiatives">chapter 6</a>). Now, how do you start working on it?</p>
<p>Most teams start by diving into technical challenges: parsing COBOL, building callgraphs, implementing algorithms. But there's a more powerful approach that ensures your Research actually impacts the product: <strong>start from the end and work backwards</strong>.</p>
<p>This heuristic – working backwards from your goal – is one of the most valuable problem-solving strategies you can use. Let me show you why with a simple game.</p>
<h4 id="heading-the-spiral-game">The Spiral Game</h4>
<p>Consider this game:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768919878522/eff5e496-f352-4973-b5d6-2ff8dde9c286.jpeg" alt="A spiral board numbered from 1 to 41, with a pawn on spot 41" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The rules are simple:</p>
<ul>
<li><p>The pawn starts on spot 41.</p>
</li>
<li><p>On each turn, a player moves the pawn backward by 1 to 6 spots. That is, if the pawn is on spot 41, you can move it to any spot from 35 to 40.</p>
</li>
<li><p>The player who moves the pawn to spot 1 wins.</p>
</li>
</ul>
<p>Take a moment: If you go first, how would you play to guarantee a win?</p>
<p>(I do encourage you to take a moment and try this for yourself first.)</p>
<p>Most people start thinking from the current position (spot 41) and try to calculate forward: "If I move 3 spaces, they can move 2, then I can move 4..." This quickly becomes overwhelming – too many possible moves to track.</p>
<p>But if you <strong>work backwards</strong>, the solution becomes clear:</p>
<p><strong>Starting from the end (spot 1):</strong></p>
<ul>
<li><p>To win, you need to be the one who moves the pawn to spot 1.</p>
</li>
<li><p>Suppose your opponent has just moved and left the pawn somewhere on spots 2-7.</p>
</li>
<li><p>For any spot from 2-7, you can move directly to spot 1.</p>
</li>
<li><p><strong>Conclusion</strong>: If the pawn is on spots 2-7 at the start of your turn, you win.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768920391528/1b74e6ac-43f1-4dca-9760-f38cf2b849a0.png" alt="You do *not* want to land on spots 2-7" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>Continue working backwards (from spots 2-7):</strong></p>
<ul>
<li><p>You want your opponent to land on spots 2-7.</p>
</li>
<li><p>From spot 8, no matter what they do (move 1-6), they land on spots 2-7.</p>
</li>
<li><p><strong>Conclusion</strong>: If you get the pawn to spot 8, you guarantee a win.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768920572462/3f4ca3a4-e037-4b63-89dc-830da52d73ed.png" alt="If you land on spot 8, you win" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>Continue working backwards from spot 8:</strong></p>
<p>Notice that you've just created a "new" game: you no longer need to land on spot 1 in order to win. It's enough that you land on spot 8 – as from there, you can win no matter what your opponent does.</p>
<p>So, how do you ensure you can land on spot 8?</p>
<ul>
<li><p>Spots 9-14 all allow moving to spot 8.</p>
</li>
<li><p>So whoever starts their turn with the pawn on a spot between 9 and 14 can move it to 8.</p>
</li>
<li><p>Which means you do <em>not</em> want to land on spots 9-14, but you do want to land on 15 (which will force your opponent, in turn, to land somewhere between 9-14).</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768920735421/e390c664-846d-449e-a3ee-f9907ce2cbcb.png" alt="If you land on spot 15, you win" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Again, you've just created a "new" game: where your goal is to land on spot 15. From there, you already know how to win.</p>
<p>You can keep going like this, drawing backwards – would you want to land on spot 16 or not? How about 17? Until, at some point...</p>
<p><strong>The pattern emerges:</strong></p>
<ul>
<li><p>Safe spots for you: 1, 8, 15, 22, 29, 36</p>
</li>
<li><p>From any of these, your opponent cannot avoid giving you another safe spot</p>
</li>
<li><p>These are all numbers of the form: <code>1 + 7n</code></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768921127375/389c1f9b-f5dc-45b6-9e69-ec986d9bd5a8.png" alt="The winning pattern" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>The winning strategy:</strong></p>
<ul>
<li><p>From spot 41, move 5 spots backward to reach spot 36 (a "safe spot").</p>
</li>
<li><p>No matter what your opponent does, you can always move to the next "safe spot".</p>
</li>
<li><p>Eventually, you reach spot 1 and win.</p>
</li>
</ul>
<p>Notice what happened: By working backwards from the goal, you discovered the systematic solution. Working forward from the start position would have been much, much harder.</p>
<p>You actually deployed a second, powerful heuristic here: considering specific cases (1, 8, 15, 22, 29, 36) – and generalizing (<code>1+7n</code>). This heuristic is also very common in research, though not specifically when aiming to ensure product impact.</p>
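<p>If you'd like to check the pattern rather than take it on trust, the backward reasoning above can be sketched in a few lines of Python. This is only an illustration: the function names <code>safe_spots</code> and <code>first_move</code> are made up for this example, and the code assumes the rules exactly as stated (the pawn starts on spot 41, each move is 1 to 6 spots backward, and reaching spot 1 wins).</p>

```python
from typing import Optional


def safe_spots(start: int = 41, max_step: int = 6) -> list[int]:
    """Return the spots you want to leave the pawn on after your move.

    Works backwards from spot 1: a spot is "safe" if no single move from it
    reaches another safe spot, so whoever must move from it is stuck.
    """
    safe = {1}  # moving the pawn to spot 1 wins the game
    for spot in range(2, start + 1):
        # Spots reachable in one move from `spot` (never below spot 1).
        reachable = range(max(1, spot - max_step), spot)
        # If no safe spot is reachable, the player who has to move from
        # here cannot avoid handing a safe spot to the other player.
        if not any(target in safe for target in reachable):
            safe.add(spot)
    return sorted(safe)


def first_move(start: int = 41, max_step: int = 6) -> Optional[int]:
    """Return how many spots to move from `start` to land on a safe spot."""
    safe = set(safe_spots(start, max_step))
    for step in range(1, max_step + 1):
        if start - step in safe:
            return step
    # `start` is itself a safe spot, so the first player cannot force a win.
    return None


print(safe_spots())   # [1, 8, 15, 22, 29, 36]
print(first_move())   # 5
```

<p>Note that the loop never tries to play the game forward. It only asks, for each spot in turn, "can I reach a spot I already know is safe?", which is the same question you asked yourself when drawing backwards by hand.</p>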
<h4 id="heading-how-to-apply-drawing-backwards-to-product-led-research">How to Apply Drawing Backwards to Product-led Research</h4>
<p>Let's connect this to Product-led Research. When you face a complex Research challenge, the question isn't "What technical problem should I solve first?" but rather:</p>
<p><strong>"If the Research succeeds, what would the result look like?"</strong></p>
<p>This forces you to:</p>
<ol>
<li><p><strong>Connect to product impact</strong>: You must envision the end state that creates value.</p>
</li>
<li><p><strong>Work systematically</strong>: Like the spiral game, you identify the chain of dependencies backward.</p>
</li>
<li><p><strong>Validate assumptions</strong>: Before solving sub-problems, ensure they lead to your goal.</p>
</li>
</ol>
<p>Let me show you how this worked in practice at Swimm.</p>
<h4 id="heading-case-study-extracting-business-rules-from-cobol">Case Study: Extracting Business Rules from COBOL</h4>
<p>At Swimm, we wanted to automatically generate documents from COBOL codebases that included all the extracted business rules.</p>
<p><strong>Quick context on business rules:</strong> Business rules are the constraints, conditions, and actions embedded within software that reflect organizational policies. For example, in money transfer logic:</p>
<ul>
<li><p>A customer cannot transfer more than their available balance (overdraft limits notwithstanding).</p>
</li>
<li><p>High-value transfers require additional verification.</p>
</li>
<li><p>Cross-currency transfers must apply current exchange rates.</p>
</li>
</ul>
<p>Some sources define business rules with three elements: Event, Condition, Action:</p>
<pre><code class="lang-plaintext">ON &lt;Event&gt;
IF &lt;Condition&gt;
THEN &lt;Action&gt;
ELSE &lt;Action&gt;
</code></pre>
<p>Our goal was to extract all business rules from a COBOL codebase. This is challenging in any codebase, and particularly hard in legacy COBOL code. (If you're interested in the technical details, see <a target="_blank" href="https://swimm.io/blog/blackbox-to-blueprint-extracting-business-logic-from-cobol-applications">this post</a>.) For this book, it's sufficient to know that many research efforts over the last few decades have tried different approaches to this challenge.</p>
<p><strong>Where Do You Even Start?</strong></p>
<p>Faced with this problem, you might think:</p>
<ul>
<li><p>"Should I build a callgraph of all functions? That means I need to parse COBOL code..."</p>
</li>
<li><p>"Should I create a COBOL parser first?"</p>
</li>
<li><p>"Maybe I should read academic papers about program comprehension?" (Good practice during the pre-Research checks phase from <a class="post-section-overview" href="#heading-chapter-6-how-to-choose-research-initiatives">chapter 6</a>)</p>
</li>
<li><p>"Perhaps I should track COBOL variables through the codebase?"</p>
</li>
<li><p>"Should I distinguish business conditions ('if requested transfer amount &gt; available balance') from technical conditions ('if variable not initialized, show error')?"</p>
</li>
</ul>
<p>Each of these might require its own research effort. Where do you start?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768921215403/e663ef18-3ba7-4cec-a3e3-9e432fa781e0.png" alt="Where do you start?" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>Drawing Backwards: Start with the End Result</strong></p>
<p>Drawing backwards made us ask:</p>
<p><strong>"If the Research succeeds, what would the result look like?"</strong></p>
<p>When we first asked ourselves this question, we weren't sure. We knew from our clients that they wanted extracted business rules, but we couldn't know what the "ideal" output would look like.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768921373545/b604f6dc-5c41-4076-afa9-77299a43c6e7.png" alt="Start with the end result" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>So we decided, <strong>before tackling any technical challenges</strong>, to manually create documents showing extracted business rules from sample programs. We did this completely manually: no parsing, no algorithms, just understanding COBOL code ourselves and writing documentation.</p>
<p>We did this for various types of applications from different codebases, and learned:</p>
<ul>
<li><p>There's no single "right" way to construct such a document.</p>
</li>
<li><p>The output structure differs from one program to another.</p>
</li>
<li><p>Certain patterns appear consistently across business logic.</p>
</li>
</ul>
<p>By creating these documents manually, we formed a hypothesis: <strong>"This is what the output should look like, which means this output would make the biggest impact on the product. This is our north star."</strong></p>
<p>But was it actually the north star?</p>
<p><strong>Validating the Hypothesis</strong></p>
<p>Once we manually wrote the documents, it was time to verify our hypothesis. With concrete output in hand, we could:</p>
<ol>
<li><p>Discuss internally within the team – get feedback from engineers who understand both COBOL and our product.</p>
</li>
<li><p>Reach out to clients – show them the actual output and ask: "Does this solve your problem?"</p>
</li>
</ol>
<p>We deliberately <strong>refrained from solving hard technological challenges</strong> before knowing where we were aiming.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769096476455/2c4b25df-b9d9-4fc1-9b4a-f0704164726c.png" alt="Hypothesize about your end result" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>Working Backwards Through Sub-Problems</strong></p>
<p>The manual documents also gave us something crucial: a concrete example to analyze backwards.</p>
<p>For instance, we saw that our documents listed many conditions. This made us realize we would (most probably) need to:</p>
<ol>
<li><p><strong>Find conditions</strong> in the code.</p>
</li>
<li><p><strong>Filter for business-related conditions</strong> (as opposed to technical conditions).</p>
</li>
<li><p><strong>Explain the condition</strong> in the document.</p>
</li>
</ol>
<p><strong>Here's where drawing backwards becomes powerful:</strong> We tackled (3) before (2), and (2) before (1).</p>
<p>Why? Because we needed to solve (3) to reach our goal – creating impactful documents.</p>
<p>This means we <strong>mocked the first two steps</strong> – we assumed we already had a way to find conditions and filter business-related ones. So we already had our list of business conditions, and now the challenge was: for each condition, explain it clearly in the output document.</p>
<p>This approach prevented us from spending weeks on tasks (1) and (2), only to discover that our explanation approach (3) didn't actually work. If we can't accomplish (3) assuming (1) and (2) are solved, then perhaps this entire approach isn't viable, and we need to reconsider alternatives.</p>
<p>The logic is clear: While solving (3) ultimately requires solving (1) and (2), if we fail at (3) even with (1) and (2) accomplished, then pursuing (1) and (2) might not be worth the effort at all.</p>
<p><strong>What does mocking a dependency look like in practice?</strong> In our case, mocking steps (1) and (2) meant manually identifying a handful of business conditions from the COBOL code ourselves, rather than building automated systems to find them. We created a small, hand-crafted list of conditions that we <em>knew</em> were business-relevant and existed in our sample programs, and used that as input for testing our explanation approach in step (3).</p>
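<p>In code terms, mocking steps (1) and (2) can be as simple as replacing them with a function that returns a hand-written list. The sketch below is not Swimm's implementation. The names <code>Condition</code>, <code>find_business_conditions</code>, <code>explain_condition</code>, and <code>build_document</code> are hypothetical, the COBOL-like snippets are invented for illustration, and the body of <code>explain_condition</code> is a placeholder for whatever explanation approach you are actually testing.</p>

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """A condition taken from source code (simplified, hypothetical shape)."""
    source: str   # the condition as written in the program
    program: str  # the sample program it came from


# Hand-crafted stand-in for steps (1) and (2). These entries are invented
# examples, written by a person who read the sample programs.
HAND_PICKED_BUSINESS_CONDITIONS = [
    Condition("IF TRANSFER-AMOUNT > AVAILABLE-BALANCE", "TRANSFER.cbl"),
    Condition("IF TRANSFER-AMOUNT > HIGH-VALUE-LIMIT", "TRANSFER.cbl"),
    Condition("IF SOURCE-CURRENCY NOT = TARGET-CURRENCY", "FXRATE.cbl"),
]


def find_business_conditions(codebase_path: str) -> list[Condition]:
    """Steps (1) and (2), mocked: ignore the codebase, return the manual list."""
    return list(HAND_PICKED_BUSINESS_CONDITIONS)


def explain_condition(condition: Condition) -> str:
    """Step (3): the part actually under investigation (placeholder logic)."""
    return f"In {condition.program}, the program checks: {condition.source}"


def build_document(codebase_path: str) -> str:
    """Run the whole chain, with the early steps mocked."""
    conditions = find_business_conditions(codebase_path)
    return "\n".join(f"- {explain_condition(c)}" for c in conditions)


print(build_document("path/to/sample/codebase"))
```

<p>The useful property is that <code>build_document</code> already has its final shape. If step (3) turns out to work, you can later replace the mocked function with real finding and filtering logic without touching the rest of the chain.</p>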
<h4 id="heading-why-drawing-backwards-is-so-powerful">Why Drawing Backwards Is So Powerful</h4>
<p>The advantages of drawing backwards become clear from both the spiral game and the COBOL case study:</p>
<p><strong>1. Forces Connection to Product Impact</strong></p>
<p>Like working backwards from spot 1 in the spiral game, starting with "what does successful output look like?" forces you to think about end-user value. You can't start drawing backwards without a clear goal. This prevents the common failure mode of technically interesting Research that doesn't impact the product.</p>
<p><strong>2. Provides a System for Progressing</strong></p>
<p>When Research seems like a huge, daunting task with endless options, working backwards gives you a systematic approach. Just as the spiral game became trivial once you worked backwards to identify the safe spots (1, 8, 15, 22...), Research becomes more manageable when you work backwards from the desired output to identify the dependencies.</p>
<p><strong>3. Validates Each Step Before Investment</strong></p>
<p>Working backwards lets you validate that each sub-problem actually contributes to your goal before you invest significant effort. In the COBOL example, we could verify that our explanation approach (step 3) worked before spending weeks on finding and filtering conditions (steps 1 and 2).</p>
<h4 id="heading-practical-application-your-research-tree">Practical Application: Your Research Tree</h4>
<p>Drawing backwards integrates naturally with the Research Tree from <a class="post-section-overview" href="#heading-chapter-4-the-research-tree">chapter 4</a>. When you create your tree:</p>
<p><strong>Start with the end:</strong></p>
<ul>
<li><p>Root of tree: "Generate impactful business rule documentation".</p>
</li>
<li><p>First question: "What should successful output look like?"</p>
</li>
<li><p>Approach: Create manual examples.</p>
</li>
</ul>
<p><strong>Then work backwards:</strong></p>
<ul>
<li><p>Once you have output, ask: "What do we need to produce this?"</p>
</li>
<li><p>This reveals the actual sub-questions and their dependencies.</p>
</li>
<li><p>Each branch represents a prerequisite you need to solve.</p>
</li>
</ul>
<p><strong>Validate before going deeper:</strong></p>
<ul>
<li><p>Before pursuing any branch deeply, ask: "If I solve this, does it actually get me closer to the goal?"</p>
</li>
<li><p>Mock out dependencies to test approaches cheaply.</p>
</li>
<li><p>Use time-boxing (from <a class="post-section-overview" href="#heading-chapter-5-time-boxing-research-explorations">chapter 5</a>) to limit exploration of any branch.</p>
</li>
</ul>
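<p>To make this concrete, here is one possible way to write down such a tree for the COBOL example. It is a simplified sketch under my own assumptions: the <code>Node</code> class, its fields, and the <code>outline</code> helper are hypothetical, and a real Research Tree would also track open and closed questions and time boxes.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One question or goal in the tree (hypothetical, simplified shape)."""
    question: str
    mocked: bool = False  # True while a hand-made stand-in replaces this step
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a prerequisite and return it, so chains read top-down."""
        self.children.append(child)
        return child


def outline(node: Node, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, marking mocked prerequisites."""
    tag = " [mocked]" if node.mocked else ""
    lines = ["  " * depth + node.question + tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Start with the end, then work backwards through prerequisites.
root = Node("Generate impactful business rule documentation")
root.add(Node("What should successful output look like?"))
explain = root.add(Node("Explain each condition in the document"))
filtered = explain.add(Node("Filter for business-related conditions", mocked=True))
filtered.add(Node("Find conditions in the code", mocked=True))

print("\n".join(outline(root)))
```

<p>Reading the outline from top to bottom gives you the order in which to work: the deepest, mocked nodes are the ones you postpone until the steps above them have proven themselves.</p>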
<h4 id="heading-summary-drawing-backwards">Summary: Drawing Backwards</h4>
<p><strong>Drawing backwards</strong> is the heuristic of starting from your desired end state and working systematically toward your current position.</p>
<p><strong>In Product-led Research</strong>, drawing backwards means:</p>
<ol>
<li><p>Start by defining what successful output looks like (often manually or semi-manually).</p>
</li>
<li><p>Validate the output with stakeholders before technical work.</p>
</li>
<li><p>Work backwards through dependencies, solving them in reverse order.</p>
</li>
<li><p>Validate that each step contributes to the goal before major investment.</p>
</li>
</ol>
<p>This heuristic ensures that Research connects to product impact, since you start with the product goal. It provides systematic progress even when problems seem overwhelming, and makes you validate each step before you invest heavily.</p>
<p><strong>Integration with other tools:</strong></p>
<ul>
<li><p>Use with Research Tree (<a class="post-section-overview" href="#heading-chapter-4-the-research-tree">chapter 4</a>) to map backwards dependencies.</p>
</li>
<li><p>Apply time-boxing (<a class="post-section-overview" href="#heading-chapter-5-time-boxing-research-explorations">chapter 5</a>) to limit exploration of each branch.</p>
</li>
<li><p>Combine with pre-Research checks (<a class="post-section-overview" href="#heading-chapter-6-how-to-choose-research-initiatives">chapter 6</a>) to validate product impact.</p>
</li>
</ul>
<p>In the next chapter, you’ll learn about two limitations of drawing backwards and how to address them with continuous end-to-end iterations.</p>
<h3 id="heading-chapter-8-end-to-end-iterations">Chapter 8 - End-to-End Iterations</h3>
<p>Your most important role is ensuring that Research impacts the product. One major risk: spending weeks on research questions that seem vital, only to discover they don't actually impact the product.</p>
<p>In <a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">chapter 7</a>, I advocated for drawing backwards – starting from product impact and working your way back to research questions. This is indeed powerful, especially because it forces you to focus on the end result and validate it with users.</p>
<p>But drawing backwards alone has limitations. Two risks emerge:</p>
<p><strong>Risk 1: Infeasibility in practice</strong> Your manually-created "ideal output" might be technically infeasible or impossibly expensive to generate. You won't discover this until you try to build it.</p>
<p><strong>Risk 2: Lack of real-world validation</strong> Since you haven't completed an end-to-end process, you probably haven't run your solution on clients' actual data (assuming you can't access it during the manual phase). Continuing our COBOL example from the previous chapter, what works on carefully-chosen examples might fail on real codebases.</p>
<p>These two risks are why I advocate for <strong>continuous end-to-end iterations</strong>.</p>
<h4 id="heading-drawing-backwards-end-to-end-a-combined-approach">Drawing Backwards + End-to-End: A Combined Approach</h4>
<p>These aren't competing approaches – they're complementary:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Heuristic</td><td>Purpose</td><td>What It Gives You</td><td>Strength</td><td>Limitation</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Drawing Backwards</strong> (<a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">chapter 7</a>)</td><td>Define target and path</td><td>1. Target output<br>2. Hypothesized chain of steps to get there<br>3. Order of dependencies</td><td>Ensures product focus; reveals what you need to build</td><td>Hypotheses may be wrong; doesn't validate feasibility on real data</td></tr>
<tr>
<td><strong>End-to-End Iterations</strong> (this chapter)</td><td>Validate and build incrementally</td><td>1. Proof the chain works<br>2. Learning from real data<br>3. Prioritized improvements</td><td>Validates feasibility; discovers what actually works</td><td>Can lose direction without clear target and chain</td></tr>
</tbody>
</table>
</div><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768921601722/310d3a80-6325-4d0c-849b-cbd39b972a69.png" alt="These are complementary heuristics" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>The recommended flow:</strong></p>
<ol>
<li><p>Use drawing backwards to:</p>
<ul>
<li><p>Define your target output (manually create examples, validate with users).</p>
</li>
<li><p>Identify the chain of intermediary steps needed to produce that output.</p>
</li>
<li><p>Understand the order of dependencies.</p>
</li>
</ul>
</li>
<li><p>Switch to end-to-end iterations to:</p>
<ul>
<li><p>Test whether your hypothesized chain actually works.</p>
</li>
<li><p>Validate on real data (not just your manual examples).</p>
</li>
<li><p>Incrementally build toward the target.</p>
</li>
</ul>
</li>
<li><p>Throughout iterations, keep both the target AND the chain from step 1 as your guide.</p>
</li>
</ol>
<p>Drawing backwards already outlines your end-to-end process (Principle 1 below). End-to-end iterations validate and build that process incrementally, learning what works and what doesn't.</p>
<h4 id="heading-the-five-principles-of-end-to-end-iterations">The Five Principles of End-to-End Iterations</h4>
<p>The end-to-end approach relies on five principles:</p>
<ol>
<li><p>Outline the end-to-end process.</p>
</li>
<li><p>Get to end-to-end by simplifying.</p>
</li>
<li><p>Ship it as fast as you can.</p>
</li>
<li><p>Gradually replace steps.</p>
</li>
<li><p>Get frequent feedback on results.</p>
</li>
</ol>
<p>Let me explain each principle using the COBOL business rules example from <a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">chapter 7</a>. Since this book isn't about COBOL, I'll keep explanations short while focusing on the methodology.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768921744585/677b7220-73f7-4ef3-846c-9b32a14e1f34.png" alt="The five principles of end-to-end iterations" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-principle-1-outline-the-end-to-end-process">Principle 1: Outline the End-to-End Process</h4>
<p>Start by outlining the entire process from input to output. More accurately, outline the <em>hypothesized</em> end-to-end process – you can't know for certain until you test with users.</p>
<p><strong>If you followed drawing backwards from</strong> <a class="post-section-overview" href="#heading-chapter-7-drawing-backwards"><strong>chapter 7</strong></a><strong>, you already have this.</strong> Drawing backwards naturally produces this chain: you started with the target output, asked "what do I need to produce this?", then "what do I need for that?", working your way back to the starting input. That chain IS your end-to-end process outline.</p>
<p>For the COBOL business rules example: We used drawing backwards to identify that our document needed business rule sections, which meant we needed to explain conditions, which meant we needed to filter business conditions, which meant we needed to find conditions, which meant we needed to parse COBOL. Working backwards gave us this chain:</p>
<ol>
<li><p>Start with a COBOL program.</p>
</li>
<li><p>Parse the COBOL program into an Abstract Syntax Tree (AST).</p>
</li>
<li><p>Traverse the AST to find conditions.</p>
</li>
<li><p>Filter the conditions to keep only the business-related ones (as opposed to technical conditions).</p>
</li>
<li><p>For every business rule, create a document section explaining the condition.</p>
</li>
<li><p>Sort document sections according to business rule dependencies.</p>
</li>
</ol>
<p>This is your first draft: the hypothesized process that drawing backwards revealed.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768921827862/4fd0559d-2b08-49ec-8f03-5c4dbad80d17.png" alt="First end-to-end process draft" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>How to outline:</strong></p>
<ul>
<li><p>Draw boxes on a whiteboard.</p>
</li>
<li><p>Use a flowchart if the process isn't linear.</p>
</li>
<li><p>Keep it visible throughout the research.</p>
</li>
<li><p>Update it as you learn.</p>
</li>
</ul>
<p>The outline serves as your map – it shows you where you are and what you're building toward.</p>
<h4 id="heading-principle-2-get-to-end-to-end-by-simplifying">Principle 2: Get to End-to-End by Simplifying</h4>
<p>Your goal: make the process work end-to-end. Start with input (a COBOL program), reach the output (a document with business rules), while passing through the intermediate stages.</p>
<p>This sounds like too much. Don't you need to complete the whole research to achieve this?</p>
<p><strong>Definitely not.</strong> The trick is to find the easiest way to get end-to-end by taking shortcuts and making assumptions that are definitely too generous for production. You should fight your inner engineer who wants to "do it right" from the start. Your goal is to get an end-to-end process working, because a working chain is far more valuable than a single perfected step.</p>
<p>This means that some of the steps can be completed manually, or with very simple implementations that you know won't work in production. Remember that this is an intermediate milestone, not the final product.</p>
<p>For our COBOL example:</p>
<ul>
<li><p>Start with a single, known COBOL program.</p>
</li>
<li><p><strong>Skip parsing</strong> – just manually write a list of conditions for the next stage.</p>
</li>
<li><p>The filtering function returns <code>true</code> if business-related, <code>false</code> otherwise.</p>
<ul>
<li><p>First implementation: a simple mapping between input conditions (that you know of) and whether to return <code>true</code> or <code>false</code>.</p>
</li>
<li><p>Alternative: always return <code>true</code> – yes, you'll get non-relevant rules, but that's a problem for <em>later</em>.</p>
</li>
</ul>
</li>
<li><p>The document generation might also be manual for the first pass.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768922067666/202242c9-fa00-4dcf-9b23-d0b1bce26da8.png" alt="With shortcuts, we get to an end-to-end process quickly" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong>End result:</strong> A rough-looking document, generated by a combination of manual work and code that works solely on a single program. This is far from shippable, but it's an extremely important milestone for ensuring research impacts the product.</p>
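<p>The shortcuts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the COBOL conditions, the lookup table, and the names <code>is_business_related</code>, <code>explain</code>, and <code>run_pipeline</code> are all invented for this example, and parsing is skipped entirely in favor of a hand-written list.</p>

```python
# Shortcut for "parse" and "find conditions": hand-write the conditions of
# one known program. These COBOL snippets are invented for illustration.
MANUAL_CONDITIONS = [
    "IF CUSTOMER-AGE > 65",
    "IF FILE-STATUS NOT = '00'",
    "IF ORDER-TOTAL > 1000",
]

# Shortcut for "filter": a lookup table instead of a real classifier.
KNOWN_BUSINESS_CONDITIONS = {
    "IF CUSTOMER-AGE > 65": True,
    "IF FILE-STATUS NOT = '00'": False,
    "IF ORDER-TOTAL > 1000": True,
}


def is_business_related(condition: str) -> bool:
    """Return True for conditions hand-labelled as business rules."""
    # Unknown conditions default to True: noisy, but nothing is dropped.
    return KNOWN_BUSINESS_CONDITIONS.get(condition, True)


def explain(condition: str) -> str:
    """Placeholder for the document section; a real version would explain the rule."""
    return f"Business rule: {condition}"


def run_pipeline(conditions: list[str]) -> list[str]:
    """Run the simplified chain: conditions -> filter -> document sections."""
    return [explain(c) for c in conditions if is_business_related(c)]


if __name__ == "__main__":
    for section in run_pipeline(MANUAL_CONDITIONS):
        print(section)
```

<p>Every function here is a stand-in, which is the point: the chain runs from input to output, and each stand-in can be replaced later without touching the others.</p>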
<p>You might argue that this is overkill: why spend time on manual steps when you'll need to automate them eventually? Even if you don't feel that way, I can tell you from experience that many researchers and engineers do.</p>
<p>From my own experience, I learned (the hard way) that this pays off. Getting to a working end-to-end process ensures that you:</p>
<ul>
<li><p>Validate that the entire flow works.</p>
</li>
<li><p>Identify bottlenecks and blockers early.</p>
</li>
<li><p>Have something concrete to show and get feedback on.</p>
</li>
<li><p>Avoid spending months on one component that turns out to be unnecessary.</p>
</li>
</ul>
<h4 id="heading-principle-3-ship-it-as-fast-as-you-can">Principle 3: Ship It as Fast as You Can</h4>
<p>By the end of Principle 2, you have a working end-to-end process for a single case. You definitely can't ship it yet – it only works on one specific program, certainly not the client's program.</p>
<p><strong>Next milestone: make it shippable.</strong></p>
<p>Does "shippable" mean it works on <em>any</em> program? That's too hard and takes too long. <strong>You need shortcuts.</strong></p>
<p>This is where creativity matters. For our COBOL example:</p>
<p><strong>Information gathering:</strong></p>
<ul>
<li><p>If running on a client's program, get as much information as possible.</p>
</li>
<li><p>Perhaps assume it's fewer than 1,000 lines of code.</p>
</li>
<li><p>Perhaps you know which COBOL dialect the client uses, so you don't need to support others (fun fact: <a target="_blank" href="https://www.cs.vu.nl/grammarware/500/500.pdf">COBOL has more than 300 dialects</a>. Well, it's fun for you, not for those who need to support them).</p>
</li>
</ul>
<p><strong>Algorithmic shortcuts:</strong></p>
<ul>
<li><p>Use regular expressions to find conditions instead of a full parser.</p>
</li>
<li><p>Yes, you'll miss some conditions – that's a problem for <em>later</em>.</p>
</li>
</ul>
<p><strong>UX shortcuts:</strong></p>
<ul>
<li><p>Skip the document generation. Just print rules to the console.</p>
</li>
<li><p>Run from the command line without a GUI.</p>
</li>
<li><p>Manual configuration file instead of user interface.</p>
</li>
</ul>
<p><strong>Your goal is clear: ship it.</strong></p>
<p>It doesn't need to be perfect. It needs to be enough to learn from this iteration. If you don't ship, you can only learn from your intuition – a very bad idea.</p>
<p>Note that, unlike in Principle 2, here you don't generate any of the output manually. You need working software that can run on real data.</p>
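<p>As a rough sketch of the algorithmic and UX shortcuts above, a regex-based condition finder that prints to the console could look like the following. It assumes that a condition is whatever follows <code>IF</code> on a single line, so it will miss multi-line and nested conditions. The function names and the file names in the comment are hypothetical.</p>

```python
import re
import sys

# Assumption: a condition is whatever follows IF on a single line.
# Multi-line and nested conditions are missed; that is the accepted shortcut.
IF_LINE = re.compile(r"^\s*IF\s+(.+?)(?:\s+THEN)?\s*$", re.IGNORECASE)


def find_conditions(source: str) -> list[str]:
    """Return the text of every single-line IF condition in the source."""
    conditions = []
    for line in source.splitlines():
        match = IF_LINE.match(line)
        if match:
            conditions.append(match.group(1))
    return conditions


def print_conditions(path: str) -> None:
    """UX shortcut: no document, just print each condition to the console."""
    with open(path, encoding="utf-8") as program:
        for condition in find_conditions(program.read()):
            print(condition)


# Runs only when a file path is passed, e.g.: python find_rules.py PAYROLL.cbl
if __name__ == "__main__" and len(sys.argv) == 2:
    print_conditions(sys.argv[1])
```

<p>Unlike the manual list from Principle 2, this runs unattended on any file you point it at, which is what makes it a candidate for shipping.</p>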
<p><strong>What "shippable" means:</strong></p>
<ul>
<li><p>Works on the client's actual data (even if imperfectly).</p>
</li>
<li><p>Produces output you can get real feedback on.</p>
</li>
<li><p>Doesn't require your manual intervention for each run.</p>
</li>
</ul>
<p><strong>What "shippable" doesn't mean:</strong></p>
<ul>
<li><p>Perfect accuracy.</p>
</li>
<li><p>Handles all edge cases.</p>
</li>
<li><p>Production-quality code.</p>
</li>
<li><p>Beautiful user interface.</p>
</li>
</ul>
<p>You can't ship just <em>anything</em>, though. In our example, if you only have a solution that works on one specific test program you created, generating an irrelevant document for the client wastes time and teaches you nothing.</p>
<p>You can't learn from iteration without shipping. And you can't ship without a working end-to-end process. I know it sounds obvious, but many teams miss this in practice as they get into the rabbit hole of solving one step "the right way" before validating the entire flow.</p>
<h4 id="heading-principle-4-gradually-replace-steps-while-carefully-prioritizing">Principle 4: Gradually Replace Steps, While Carefully Prioritizing</h4>
<p>Now you have a working end-to-end process. You can start replacing various steps' implementations with better ones:</p>
<ul>
<li><p>Replace manual steps with automated ones.</p>
</li>
<li><p>Remove shortcuts and add more robust implementations.</p>
</li>
<li><p>Improve accuracy and coverage.</p>
</li>
</ul>
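<p>One way to make steps easy to replace is to pass each step into the pipeline as a function. The sketch below is an assumption about how you might structure this, not a prescribed design; the sample program and all names are made up for illustration.</p>

```python
from typing import Callable

# A step that turns program source into a list of condition lines.
FindConditions = Callable[[str], list[str]]

# Invented sample program for illustration.
SAMPLE = """\
       IF CUSTOMER-AGE > 65
           MOVE 'Y' TO SENIOR-FLAG
       END-IF
"""


def find_conditions_manual(source: str) -> list[str]:
    """First version: a hand-written answer for the one known program."""
    return ["IF CUSTOMER-AGE > 65"]


def find_conditions_by_keyword(source: str) -> list[str]:
    """Replacement: keep every line that starts with the IF keyword."""
    stripped = (line.strip() for line in source.splitlines())
    return [line for line in stripped if line.upper().startswith("IF ")]


def run(source: str, find_conditions: FindConditions) -> list[str]:
    """The rest of the chain stays the same whichever finder is plugged in."""
    return [f"Rule: {condition}" for condition in find_conditions(source)]


if __name__ == "__main__":
    print(run(SAMPLE, find_conditions_manual))
    print(run(SAMPLE, find_conditions_by_keyword))
```

<p>Swapping the manual step for the automated one changes a single argument, so you can ship again without rewriting the rest of the chain.</p>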
<p>After shipping, you'll have many things you think you <em>must</em> replace immediately. But remember: the goal is making research impact the product. To do that, you need careful prioritization.</p>
<p><strong>The Prioritization Framework</strong></p>
<p>Prioritize changes based on three criteria:</p>
<p><strong>1. Learned Necessity</strong> Did you learn that something doesn't work in the current implementation and <strong>must</strong> be fixed for the product to be viable?</p>
<p><em>Example: "Regular expressions miss nested conditions, and 60% of the client's business rules are in nested conditions. We must use a real parser."</em></p>
<p><strong>2. Learning Potential</strong> Will changing this implementation help you learn more about product impact in the next iteration?</p>
<p><em>Example: "If we improve the filtering accuracy from 40% to 80%, we'll learn whether the document format is actually useful when it contains mostly-correct content."</em></p>
<p><strong>3. Effort Estimation</strong> How much time will this change take?</p>
<p><em>Example: "Building a full parser: 3 weeks. Improving regex to handle nested conditions: 2 days. The latter gives us 80% of the value for 10% of the effort."</em></p>
<p>Continuing with our COBOL example, you may consider and prioritize these changes following the first iteration:</p>
<pre><code class="lang-plaintext">Changes after first iteration:

- Improve regex to handle nested conditions [Learned: Critical, 2 days] → DO FIRST
- Add GUI for document generation [Nice-to-have, 1 week] → DEFER
- Improve filtering accuracy [Learning: Critical, 3 days] → DO SECOND  
- Support additional COBOL dialects [No evidence of need, very long time...] → DEFER
- Better document formatting [Client mentioned it, 1 week] → DEFER (validate content first)
</code></pre>
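<p>If it helps to make the framework explicit, here is one possible way to encode the three criteria. The ordering rule (proven necessities first, then cheaper changes) is my simplification for this sketch. The day counts mirror the list above, with "1 week" taken as 5 working days and the dialect estimate used as a placeholder for "very long".</p>

```python
from dataclasses import dataclass


@dataclass
class Change:
    name: str
    learned_necessity: bool   # an iteration showed this must be fixed
    learning_potential: bool  # doing it teaches us about product impact
    effort_days: float        # rough estimate, in working days


def prioritize(changes: list[Change]) -> tuple[list[Change], list[Change]]:
    """Split changes into (do_now, deferred); do_now is ordered."""
    do_now = [c for c in changes if c.learned_necessity or c.learning_potential]
    deferred = [c for c in changes if not (c.learned_necessity or c.learning_potential)]
    # Proven necessities come first; within each group, cheaper changes first.
    do_now.sort(key=lambda c: (not c.learned_necessity, c.effort_days))
    return do_now, deferred


# Illustrative estimates; 60 days is a placeholder for "very long".
CANDIDATES = [
    Change("Handle nested conditions", True, False, 2),
    Change("Add GUI", False, False, 5),
    Change("Improve filtering accuracy", False, True, 3),
    Change("Support more COBOL dialects", False, False, 60),
    Change("Better document formatting", False, False, 5),
]

do_now, deferred = prioritize(CANDIDATES)
print([c.name for c in do_now])
```

<p>In practice you'd make this call on a whiteboard, not in code. The value of writing it down is that it forces you to state <em>why</em> each change is on the list.</p>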
<p>Iterate fast: change something, ship again, get feedback. Don't solve every issue you find, even issues clients mention. Ask: "What's the most important thing to change to learn something in the very next iteration?"</p>
<h4 id="heading-principle-5-get-frequent-feedback-on-results">Principle 5: Get Frequent Feedback on Results</h4>
<p>The trick is to be obsessed with getting as much feedback as you can, on each and every iteration.</p>
<p><strong>On each iteration:</strong></p>
<ol>
<li><p><strong>Get feedback on the end result</strong></p>
<ul>
<li><p>Show the actual output to users.</p>
</li>
<li><p>When applicable, don't just ask "does this work?" – also watch them try to use it.</p>
</li>
<li><p>Identify what works, what doesn't, what's missing.</p>
</li>
</ul>
</li>
<li><p><strong>Understand the next questions to answer</strong></p>
<ul>
<li><p>What did you learn about product impact?</p>
</li>
<li><p>What assumptions were validated or invalidated?</p>
</li>
<li><p>What new questions emerged?</p>
</li>
</ul>
</li>
<li><p><strong>Plan the next iteration accordingly</strong></p>
<ul>
<li><p>Use the prioritization framework above.</p>
</li>
<li><p>Focus on learning, not building.</p>
</li>
</ul>
</li>
<li><p><strong>Implement minimal changes to answer questions</strong></p>
<ul>
<li><p>Don't fix everything.</p>
</li>
<li><p>Make the smallest changes that will answer your next most important question.</p>
</li>
<li><p>Keep the cycle fast (days to weeks, not months).</p>
</li>
</ul>
</li>
</ol>
<p><strong>Example iteration cycle:</strong></p>
<pre><code class="lang-plaintext">Iteration 1:
- Shipped: Regex-based condition finder, always-true filter, console output.
- Learned: Document structure works, but too many false positives (noise).
- Question: Is filtering accuracy the blocker to usefulness?
- Next: Improve filtering to 80% accuracy, ship again.

Iteration 2:
- Shipped: Same regex finder, smarter filtering (80% accurate), console output.
- Learned: With better filtering, users found the output useful!
- Question: Do we need a parser, or is regex sufficient?
- Next: Test on larger programs to see where regex breaks down.

Iteration 3:
- Shipped: Same pipeline, tested on 10 real programs.
- Learned: Regex fails on 4/10 programs (nested conditions).
- Question: Parser worth the investment now?
- Next: Build parser for nested conditions, ship again.
</code></pre>
<p>Notice: Each cycle is fast, focused on one question, and based on real learning.</p>
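<p>A figure like "80% accurate" has to come from somewhere. A minimal sketch, assuming you hand-label a small sample of conditions from a real program, is shown below. The labelled conditions, the keyword-based filter, and the function names are all invented for illustration.</p>

```python
from typing import Callable

# Hand-labelled sample: True means "this is a business rule".
LABELLED = {
    "CUSTOMER-AGE > 65": True,
    "ORDER-TOTAL > 1000": True,
    "FILE-STATUS NOT = '00'": False,
    "WS-EOF = 'Y'": False,
    "SQLCODE NOT = 0": False,
}


def filtering_accuracy(is_business: Callable[[str], bool],
                       labelled: dict[str, bool]) -> float:
    """Fraction of labelled conditions that the filter classifies correctly."""
    if not labelled:
        raise ValueError("need at least one labelled condition")
    correct = sum(
        1 for condition, expected in labelled.items()
        if is_business(condition) == expected
    )
    return correct / len(labelled)


def always_true(condition: str) -> bool:
    """The iteration-1 shortcut: treat everything as a business rule."""
    return True


def keyword_filter(condition: str) -> bool:
    """A slightly smarter filter: reject conditions on known technical fields."""
    technical_markers = ("FILE-STATUS", "SQLCODE")
    return not any(marker in condition for marker in technical_markers)


if __name__ == "__main__":
    print(filtering_accuracy(always_true, LABELLED))
    print(filtering_accuracy(keyword_filter, LABELLED))
```

<p>Even a handful of labelled examples turns "the filter feels better" into a number you can compare between iterations.</p>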
<p>In real life, you may want to tackle a few questions per iteration. If two problems are already clear, fix both before shipping again so that you gain new feedback rather than hearing the same complaints.</p>
<p>Also, when working with real clients, they might not be as receptive to trying things so many times – so you’ll need to consider that aspect as well. Regardless, the key remains the same: keep cycles short and focused on learning.</p>
<h4 id="heading-integration-with-other-tools">Integration with Other Tools</h4>
<p>End-to-end iterations work best when combined with other research management tools:</p>
<p><strong>Research Tree (</strong><a class="post-section-overview" href="#heading-chapter-4-the-research-tree"><strong>chapter 4</strong></a><strong>):</strong></p>
<ul>
<li><p>The outlined process becomes a branch in your tree.</p>
</li>
<li><p>Each iteration tests different approaches on branches.</p>
</li>
<li><p>Failed iterations mark branches red, successful ones mark them green.</p>
</li>
</ul>
<p><strong>Time-boxing (</strong><a class="post-section-overview" href="#heading-chapter-5-time-boxing-research-explorations"><strong>chapter 5</strong></a><strong>):</strong></p>
<ul>
<li><p>Time-box each iteration.</p>
</li>
<li><p>If you can't ship in the time box, you're building too much.</p>
</li>
</ul>
<p><strong>Drawing Backwards (</strong><a class="post-section-overview" href="#heading-chapter-7-drawing-backwards"><strong>chapter 7</strong></a><strong>):</strong></p>
<ul>
<li><p>Drawing backwards (<a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">chapter 7</a>) defines both:</p>
<ul>
<li><p>The target output.</p>
</li>
<li><p>The hypothesized chain of intermediary steps to reach it.</p>
</li>
</ul>
</li>
<li><p>End-to-end iterations test whether that chain actually works on real data.</p>
</li>
<li><p>The target from drawing backwards acts as your north star throughout iterations.</p>
</li>
<li><p>Each iteration validates or refines the steps that drawing backwards identified.</p>
</li>
<li><p>Use both: drawing backwards reveals what to build, while end-to-end iterations prove it works and let you test it incrementally.</p>
</li>
</ul>
<h4 id="heading-summary-end-to-end-iterations">Summary: End-to-End Iterations</h4>
<p><strong>End-to-end iterations</strong> ensure Research impacts the product by continuously validating feasibility and learning from real data.</p>
<p><strong>The five principles:</strong></p>
<ol>
<li><p><strong>Outline the process</strong>: Drawing backwards already gives you this: the chain from input to output.</p>
</li>
<li><p><strong>Simplify to get end-to-end</strong>: Use shortcuts and manual steps to make the whole chain work.</p>
</li>
<li><p><strong>Ship fast</strong>: Real data teaches what theory can't.</p>
</li>
<li><p><strong>Prioritize carefully</strong>: Use the three-criteria framework:</p>
<ul>
<li><p>Learned necessity (is it broken?)</p>
</li>
<li><p>Learning potential (what will you learn?)</p>
</li>
<li><p>Effort estimation (how long will it take?)</p>
</li>
</ul>
</li>
<li><p><strong>Get frequent feedback</strong>: Fast cycles (days to weeks) focused on learning.</p>
</li>
</ol>
<h3 id="heading-part-3-summary">Part 3 Summary</h3>
<p>Back in <a class="post-section-overview" href="#heading-chapter-2-research-and-development">chapter 2</a>, you learned that your role as a Research leader is to:</p>
<ol>
<li><p>Ensure Research connects to product impact.</p>
</li>
<li><p>Ensure Research is done effectively.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768922191304/a1d5bec0-59c4-4414-8a7b-73186057591c.png" alt="Your Role as Research Leader" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><a class="post-section-overview" href="#heading-part-2-research-management-methods">Part 2</a> handled ensuring Research is done effectively.</p>
<p><a class="post-section-overview" href="#heading-part-3-ensuring-product-impact">Part 3</a> handled ensuring Research connects to product impact – your most important responsibility. This part provided a complete methodology for ensuring product impact through three complementary stages:</p>
<h4 id="heading-the-three-stages-of-product-led-research">The Three Stages of Product-Led Research</h4>
<p><strong>Stage 1: Choose research that matters</strong> (<a class="post-section-overview" href="#heading-chapter-6-how-to-choose-research-initiatives">chapter 6</a>)</p>
<p>Before starting any Research, answer three critical questions:</p>
<ol>
<li><p><strong>Product impact</strong>: Will success create huge value?</p>
</li>
<li><p><strong>Time to impact</strong>: How long until we see product results?</p>
</li>
<li><p><strong>Resources</strong>: Do we have the knowledge, capacity, and dependencies?</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768922572960/eaf4d34f-b9cc-4350-8dee-db766bdcbb34.png" alt="Three questions to answer before pursuing Research" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You learned to distinguish between problem-driven research (starting from validated customer pain points, which is strongly preferred) and opportunity-driven research (starting from new technologies).</p>
<p>Run focused pre-research checks to answer these questions. Only pursue Research when confident about substantial product impact.</p>
<p><strong>Stage 2: Start from product value and work backwards</strong> (<a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">chapter 7</a>)</p>
<p>Once you've chosen what to research, the drawing backwards heuristic ensures that you start right. Instead of diving into technical challenges, start from the end:</p>
<ol>
<li><p><strong>Manually create the desired output</strong>: What should successful Research produce? Create it by hand before solving any technical problems.</p>
</li>
<li><p><strong>Validate with stakeholders or customers</strong>: Show them the output and confirm it solves their problem.</p>
</li>
<li><p><strong>Work backwards through dependencies</strong>: From that validated output, identify what you need to produce it, then what you need for that, working your way back to your starting point.</p>
</li>
<li><p><strong>Solve in reverse order</strong>: Tackle the final step first (with earlier steps mocked), validating that each step contributes to the goal before investing heavily in earlier dependencies.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768922685763/25ac6ca8-088a-48cf-a8d9-1fa877b5536b.png" alt="Hypothesize about your end result" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The spiral game example showed why this works: working backwards from the goal reveals systematic solutions that working forward obscures.</p>
<p>Drawing backwards forces product connection because you literally start with product output. It integrates with the Research Tree (<a class="post-section-overview" href="#heading-chapter-4-the-research-tree">chapter 4</a>): while drawing backwards identifies <em>what</em> questions matter, the Tree helps you explore approaches for answering them.</p>
<p><strong>Stage 3: Validate and build iteratively</strong> (<a class="post-section-overview" href="#heading-chapter-8-end-to-end-iterations">chapter 8</a>)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768922924949/8d294792-5b29-4e42-b309-d613240d91d7.png" alt="The five principles of end-to-end iterations" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Drawing backwards alone has two limitations:</p>
<ul>
<li><p>Your manually-created "ideal output" might be technically infeasible to generate.</p>
</li>
<li><p>You haven't validated on real user data.</p>
</li>
</ul>
<p>End-to-end iterations address both limitations. The two tools aren't competing approaches – they complement one another.</p>
<h4 id="heading-how-the-three-stages-work-together">How the Three Stages Work Together</h4>
<p>These chapters form a complete methodology for product-led Research:</p>
<p><strong>Choose</strong> → <strong>Start from the end</strong> → <strong>Validate iteratively</strong></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-chapter-6-how-to-choose-research-initiatives"><strong>Chapter 6</strong></a> ensures you choose research that COULD have huge impact (strategic decision).</p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-7-drawing-backwards"><strong>Chapter 7</strong></a> ensures you start from product value (planning backward from validated output).</p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-8-end-to-end-iterations"><strong>Chapter 8</strong></a> ensures you continuously validate with real users.</p>
</li>
</ul>
<p>Each stage prevents a different undesired but painfully common outcome:</p>
<ul>
<li><p><a class="post-section-overview" href="#heading-chapter-6-how-to-choose-research-initiatives"><strong>Chapter 6</strong></a> prevents pursuing research that won't matter (even if successful).</p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-7-drawing-backwards"><strong>Chapter 7</strong></a> prevents building technically correct solutions that don't create product value.</p>
</li>
<li><p><a class="post-section-overview" href="#heading-chapter-8-end-to-end-iterations"><strong>Chapter 8</strong></a> prevents building infeasible solutions or solutions that fail on real data.</p>
</li>
</ul>
<p>You now have the complete answer to "How do I ensure Research impacts the product?":</p>
<ol>
<li><p><strong>Choose wisely</strong>: Only pursue research with clear, huge product impact.</p>
</li>
<li><p><strong>Start from product value</strong>: Manually create and validate desired output before technical work.</p>
</li>
<li><p><strong>Work backwards</strong>: Identify dependencies from output back to starting point.</p>
</li>
<li><p><strong>Build end-to-end fast</strong>: Get entire chain working with shortcuts and manual steps.</p>
</li>
<li><p><strong>Ship to real users</strong>: Validate on actual data, not just examples.</p>
</li>
<li><p><strong>Iterate based on learning</strong>: Improve the chain incrementally, prioritizing what teaches you most.</p>
</li>
</ol>
<p>This methodology keeps product impact central at every stage: choosing, planning, and executing. It prevents the most expensive failure: "successful" Research that doesn't affect the product.</p>
<h2 id="heading-book-summary-1">Book Summary</h2>
<p>You picked up this book because managing Research is different from managing Development – and you needed concrete tools to handle that difference.</p>
<h3 id="heading-what-you-learned-about-research">What You Learned About Research</h3>
<p>In <a class="post-section-overview" href="#heading-chapter-1-what-is-research">chapter 1</a>, you learned that Research isn't about difficulty or technical sophistication. It's about <strong>uncertainty of approach</strong> – that is, confronting problems where you don't know if a solution exists, where multiple approaches might work but you're not sure which, and where the path to success isn't immediately clear.</p>
<p>You saw how Alan Schoenfeld's problem-solving framework breaks down the Research process into four components:</p>
<ul>
<li><p><strong>Knowledge base</strong> – what you know.</p>
</li>
<li><p><strong>Heuristics</strong> – strategies for approaching problems.</p>
</li>
<li><p><strong>Control</strong> – monitoring and adjusting your approach.</p>
</li>
<li><p><strong>Beliefs</strong> – your mindset toward the problem.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768923077868/e6e88335-e990-40bc-b64b-dda8a07b989b.png" alt="Schoenfeld's Framework" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The good news: all four can be improved with the right management and methods.</p>
<p>In <a class="post-section-overview" href="#heading-chapter-2-research-and-development">chapter 2</a>, you explored the distinction between Research and Development more deeply. You learned that your role as a Research leader has two parts:</p>
<ol>
<li><p><strong>Ensure Research connects to product impact</strong>: "Successful" Research that doesn't affect the product is a failed project. This is the most important part in Product-led companies.</p>
</li>
<li><p><strong>Ensure Research is done effectively</strong>: Even brilliant researchers benefit from structured approaches.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768923207501/5a8ba41f-247b-4831-9ccd-0f92185c7cf1.png" alt="Your Role as Research Leader" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This two-part framework organized everything that followed.</p>
<h3 id="heading-how-to-do-research-effectively">How to Do Research Effectively</h3>
<p><strong>Part 2</strong> gave you concrete methods for effective Research execution – tools that work in <em>any</em> research context (whether it's Product-led or not).</p>
<p>In <a class="post-section-overview" href="#heading-chapter-3-why-methodology-matters-a-true-story">chapter 3</a>, I shared a personal story from a reverse engineering classroom. Students with sophisticated technical skills missed the obvious solution (checking the Help menu) because they lacked a structured methodology. The story showed that the problem often isn't capability but approach, which is why you need the methods that follow.</p>
<p><a class="post-section-overview" href="#heading-chapter-4-the-research-tree">Chapter 4</a> introduced the <strong>Research Tree</strong> method – a living visual framework for systematically exploring solution paths. You learned:</p>
<ul>
<li><p>How to map questions you need to answer and approaches for answering them.</p>
</li>
<li><p>A decision framework for choosing which approach to try first: fastest feedback, lowest cost, best coverage.</p>
</li>
<li><p>How using the tree helps avoid common failure modes: jumping on the first idea, tunnel vision, inefficient learning, answering questions you don't need to, and lost context.</p>
</li>
</ul>
<p>The Research Tree implements Schoenfeld's "control" component: it helps you monitor and adjust your approach systematically rather than trying things at random. It is helpful for any researcher, and as a Research leader, it also allows you to guide your team effectively.</p>
<p>In <a class="post-section-overview" href="#heading-chapter-5-time-boxing-research-explorations">chapter 5</a>, you learned how to manage exploration without killing creativity. Since Research estimation is inherently difficult, time-boxing provides structure by setting time limits for specific research directions. After the allocated time, you stop to reconsider: What did you learn? Is this still the most promising path? This tool acknowledges uncertainty while preventing endless exploration, or diving too deep into rabbit holes that don't necessarily help your Product goals.</p>
<h3 id="heading-how-to-ensure-product-impact">How to Ensure Product Impact</h3>
<p><strong>Part 3</strong> focused on your most important responsibility: ensuring Research creates product value.</p>
<p><a class="post-section-overview" href="#heading-chapter-6-how-to-choose-research-initiatives">Chapter 6</a> showed you how to choose what directions to research – and more importantly, what <em>not</em> to pursue. You learned the distinction between:</p>
<ul>
<li><p><strong>Problem-driven research</strong> – starting from customer pain points (strongly preferred, lower risk).</p>
</li>
<li><p><strong>Opportunity-driven research</strong> – starting from new technologies (higher risk, needs validation).</p>
</li>
</ul>
<p>Before pursuing any Research initiative, you need to answer three questions:</p>
<ol>
<li><p><strong>Product impact</strong>: Will success create huge value?</p>
</li>
<li><p><strong>Time to impact</strong>: How long until we see product results?</p>
</li>
<li><p><strong>Resources</strong>: Do you and your team have the knowledge, capacity, and dependencies needed?</p>
</li>
</ol>
<p>You saw how to run focused pre-research checks to answer these questions.</p>
<p><a class="post-section-overview" href="#heading-chapter-7-drawing-backwards">Chapter 7</a> introduced a powerful heuristic for ensuring product connection: <strong>start from the end and work backwards</strong>. Through the spiral game example, you saw how working backwards reveals systematic solutions that working forward obscures.</p>
<p>In the COBOL business rules case study, you saw a practical application:</p>
<ol>
<li><p>Start by manually creating the desired output (before solving any technical challenges).</p>
</li>
<li><p>Validate that output with stakeholders.</p>
</li>
<li><p>Work backwards through dependencies, solving them in reverse order.</p>
</li>
<li><p>Validate that each step contributes to the goal before major investment.</p>
</li>
</ol>
<p>Drawing backwards forces connection to product impact because you must start with the product goal.</p>
<p><a class="post-section-overview" href="#heading-chapter-8-end-to-end-iterations">Chapter 8</a> showed you how to address two limitations of the drawing backwards heuristic through <strong>continuous end-to-end iterations</strong>. Your manually-created "ideal output" might be infeasible to generate, and you haven't validated it on real data. End-to-end iterations solve both problems.</p>
<p>You learned five principles to run effective end-to-end iterations:</p>
<ol>
<li><p><strong>Outline the end-to-end process</strong>: Drawing backwards already gives you this chain.</p>
</li>
<li><p><strong>Get to end-to-end by simplifying</strong>: Use shortcuts and manual steps to make the whole chain work.</p>
</li>
<li><p><strong>Ship it as fast as you can</strong>: Real data teaches what theory can't.</p>
</li>
<li><p><strong>Gradually replace steps</strong>: Prioritize based on learned necessity, learning potential, and effort.</p>
</li>
<li><p><strong>Get frequent feedback</strong>: Fast cycles focused on learning.</p>
</li>
</ol>
<p>Drawing backwards reveals what to build and in what order, while end-to-end iterations prove it works and build it incrementally.</p>
<h3 id="heading-your-toolkit-for-research-management">Your Toolkit for Research Management</h3>
<p>You now have a complete toolkit:</p>
<p><strong>From Part 2 (works for <em>any</em> research):</strong></p>
<ul>
<li><p><strong>Research Tree</strong> – maps solution space, chooses approaches systematically.</p>
</li>
<li><p><strong>Time-boxing</strong> – manages exploration with structure.</p>
</li>
</ul>
<p><strong>From Part 3 (specifically for product impact):</strong></p>
<ul>
<li><p><strong>Choosing frameworks</strong> – decides what deserves Research effort.</p>
</li>
<li><p><strong>Drawing backwards</strong> – forces product connection from the start.</p>
</li>
<li><p><strong>End-to-end iterations</strong> – validates feasibility and learns from real users.</p>
</li>
</ul>
<p>These methods work together. Drawing backwards identifies your goal and the chain of steps. The Research Tree maps approaches for each step. Time-boxing keeps you from sinking into a rabbit hole by forcing you to stop and reconsider based on what you've learned, while acknowledging that Research tasks can't be estimated precisely. End-to-end iterations validate and build incrementally.</p>
<h3 id="heading-my-message-to-you">My Message To You</h3>
<p>Research is fundamentally uncertain work. Applying traditional Development management practices fails because it assumes known solution paths, predictable timelines, and steady progress.</p>
<p>But Research doesn't have to be mystical or random. With the right frameworks, you can manage it systematically while maintaining focus on what matters: creating product value. You learned to ensure that Research is done effectively (Part 2) and that it connects to product impact (Part 3). You have concrete tools, real examples, and a clear framework for both responsibilities.</p>
<p>Now go turn your team's uncertain Research into systematic progress toward measurable product impact. I am confident that you can lead Research teams to success, and I would be happy to hear about your experiences applying these methods.</p>
<p>If you liked this book, please share it with more people.</p>
<h3 id="heading-acknowledgements">Acknowledgements</h3>
<p>I am extremely lucky to have such wonderful people supporting me in this journey.</p>
<p>Abbey Rennemeyer has been a wonderful editor. Abbey has edited my posts for freeCodeCamp over the past few years, as well as my previous book Gitting Things Done, so I knew she was the perfect fit for this book as well. Her insights on both the content and the writing style have greatly improved this book.</p>
<p>Quincy Larson founded the amazing community at freeCodeCamp. I thank him for starting this incredible community, his ongoing support, and for his friendship.</p>
<p>Estefania Cassingena Navone designed the cover of this book. I am grateful for her professional work and her patience with my perfectionism and requests.</p>
<p>Beta readers contributed their time and thought to read through an unfinished version of this book, and their feedback helped me get it to its current shape. Specifically, I would like to thank Jason S. Shapiro and Omer Gull for their insights.</p>
<p>Dr. David Ginat first introduced me to Alan Schoenfeld's problem-solving research during my time at Tel Aviv University. His teaching inspired me to apply these ideas in practical contexts, including Research management.</p>
<p>I was privileged to work with many brilliant researchers and engineering leaders over the years, too many to name here.</p>
<p>To readers of my previous book, <a target="_blank" href="https://www.freecodecamp.org/news/gitting-things-done-book/">Gitting Things Done</a>, who were kind enough to provide feedback and support – you are awesome. Receiving your emails and comments made me feel like there is a real reason to keep writing.</p>
<h3 id="heading-if-you-wish-to-support-this-book">If You Wish to Support This Book</h3>
<p>If you would like to support this book, you are welcome to buy the <a target="_blank" href="https://buymeacoffee.com/omerr/e/505520">E-Book version</a>, <a target="_blank" href="https://amzn.to/46aCxnO">Paperback</a>, or <a target="_blank" href="https://amzn.to/4tD1T7O">Hardcover</a>, or <a target="_blank" href="https://buymeacoffee.com/omerr">buy me a coffee</a>. Thank you!</p>
<h3 id="heading-contact-me">Contact Me</h3>
<p>If you liked something about this book, or felt that something was missing or needed improvement – I would love to hear from you. Please reach out at: <code>gitting.things@gmail.com</code>.</p>
<p>Thank you for learning and allowing me to be a part of your journey.</p>
<p>- Omer Rosenbaum</p>
<h1 id="heading-about-the-author"><strong>About the Author</strong></h1>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is an established technologist and writer. He's the creator of the <a target="_blank" href="https://youtube.com/@BriefVid">Brief YouTube Channel</a>, and the author of the books <a target="_blank" href="https://www.freecodecamp.org/news/gitting-things-done-book/">Gitting Things Done</a> and <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf"><strong>Computer Networks (in Hebrew)</strong></a>. He's also a cyber training expert and founder of Checkpoint Security Academy.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How IPv4 Works – A Handbook for Developers ]]>
                </title>
                <description>
                    <![CDATA[ The Internet Protocol version 4 (IPv4) is one of the core protocols of standards-based internetworking methods in the Internet and other packet-switched networks. IPv4 is still the most widely deployed Internet protocol. Google’s IPv6 Statistics show... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-ipv4-works-a-handbook-for-developers/</link>
                <guid isPermaLink="false">68124752fa6fa6a9a91d9994</guid>
                
                    <category>
                        <![CDATA[ IPv4 ]]>
                    </category>
                
                    <category>
                        <![CDATA[ computer networks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ networking ]]>
                    </category>
                
                    <category>
                        <![CDATA[ network ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ip address ]]>
                    </category>
                
                    <category>
                        <![CDATA[ IP ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Wed, 30 Apr 2025 15:52:50 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746028336196/79d97781-a9b8-4be3-86a1-47322e9640ff.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>The Internet Protocol version 4 (IPv4) is one of the core protocols of standards-based internetworking methods in the Internet and other packet-switched networks. IPv4 is still the most widely deployed Internet protocol. <a target="_blank" href="https://www.google.com/intl/en/ipv6/statistics.html">Google’s IPv6 Statistics</a> show 44.29% of traffic to Google services on April 24, 2025 is over IPv6, implying 55.71% goes over IPv4.</p>
<p>This handbook will take you through every aspect of IPv4, from understanding IP addresses to examining packet headers and fragmentation. You'll learn:</p>
<ul>
<li><p>How IP addresses work and their different formats</p>
</li>
<li><p>Network addressing schemes from fixed-length to CIDR</p>
</li>
<li><p>Special IPv4 addresses and their uses</p>
</li>
<li><p>The structure and purpose of every field in the IPv4 header</p>
</li>
<li><p>How IPv4 handles packet fragmentation across different networks</p>
</li>
</ul>
<p>Whether you're a network engineer, software developer, or IT professional, understanding IPv4 is crucial for working with modern computer networks.</p>
<h3 id="heading-what-well-cover">What we’ll cover:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-background">Background</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-ip-addresses">Understanding IP Addresses</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-network-id-and-host-id">Network ID and Host ID</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-determine-network-vs-host-portions">How to Determine Network vs. Host Portions</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-fixed-length-approach">Fixed-Length Approach</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-are-the-disadvantages-here">What are the disadvantages here? 🤔</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-classful-addressing">Classful Addressing</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-ip-address-assignment">IP Address Assignment</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-are-the-disadvantages-here-1">What are the disadvantages here? 🤔</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-cidr-classless-interdomain-routing">CIDR: Classless Interdomain Routing</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-real-world-example">Real-world Example</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-subnet-masks">Subnet Masks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-interim-summary-ipv4-addresses">Interim Summary – IPv4 Addresses</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-test-yourself">Test Yourself</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-converting-between-prefix-notation-and-subnet-masks">Converting Between Prefix Notation and Subnet Masks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-working-backwards-with-subnet-masks">Working Backwards with Subnet Masks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-non-byte-aligned-prefixes">Non-Byte-Aligned Prefixes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-determining-network-membership">Determining Network Membership</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-special-ipv4-addresses">Special IPv4 Addresses</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-this-host-address-0000">The "This Host" Address: 0.0.0.0</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-this-network-addresses">"This Network" Addresses</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-broadcast-addresses">Broadcast Addresses</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-loopback-addresses-1270008">Loopback Addresses: 127.0.0.0/8</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-summary-of-special-ipv4-addresses">Summary of Special IPv4 Addresses</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-ipv4-header">IPv4 Header</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-header-structure">The Header Structure</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ipv4-header-interim-summary">IPv4 Header – Interim Summary</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-ipv4-fragmentation">IPv4 Fragmentation</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-fragmentation-is-needed">Why Fragmentation Is Needed</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-fragmentation-works-in-ip">How Fragmentation Works in IP</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-identification-field">Identification Field</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-fragment-offset">Fragment Offset</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-more-fragments-and-dont-fragment-flags">More Fragments and Don't Fragment Flags</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-fragmentation-example">Fragmentation Example</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ipv4-fragmentation-summary">IPv4 Fragmentation – Summary</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-summary-ipv4">Summary – IPv4</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-addressing-and-network-structure">Addressing and Network Structure</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ipv4-header-structure">IPv4 Header Structure</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-fragmentation">Fragmentation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-words">Final Words</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-about-the-author">About the Author</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-additional-references">Additional References</a></p>
</li>
</ol>
<h2 id="heading-quick-notes-before-we-start">Quick notes before we start</h2>
<ol>
<li><p>You can find more content about computer networks on my YouTube channel: <a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg">Computer Networks Playlist</a></p>
</li>
<li><p>I am working on a book about Computer Networks! Are you interested in reading the initial versions and providing feedback? Send me an email: <a target="_blank" href="mailto:gitting.things@gmail.com">gitting.things@gmail.com</a></p>
</li>
</ol>
<h2 id="heading-background">Background</h2>
<p>IP stands for "Internet Protocol", so IPv4 is Internet Protocol version 4. It was described by the IETF in RFC 791, published in September 1981, and first deployed for production in 1982 on SATNET (the Atlantic Packet Satellite Network), an early satellite network that formed an initial segment of the Internet.</p>
<p>IPv4 is connectionless and operates in a best-effort delivery model. This means it doesn't guarantee delivery, correct ordering of packets, or the validity of the data. It's designed to be fast and flexible.</p>
<h2 id="heading-understanding-ip-addresses">Understanding IP Addresses</h2>
<p>IP addresses are hierarchical, logical addresses that power most internet connections today. Each consists of <code>4</code> bytes, or <code>32</code> bits. They're usually written in dotted decimal notation, for example:</p>
<p><a target="_blank" href="https://www.youtube.com/watch?v=zlDkqP3lMmU"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744039300370/348d757a-c6b0-4930-8e3a-ee753c45f3fa.png" alt="An example IPv4 address (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></a></p>
<p>Test yourself – Does the following address represent a valid IP address?</p>
<p><a target="_blank" href="https://www.youtube.com/watch?v=zlDkqP3lMmU"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744039900249/587d8b94-1ac3-478c-87d9-4b0fd97023b2.png" alt="Is this a valid IPv4 address? (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></a></p>
<p>No. Because the dots separate individual bytes, each value must be between <code>0</code> and <code>255</code>. The number <code>392</code> is bigger than <code>255</code>, so it cannot be represented in a single byte.</p>
<p><a target="_blank" href="https://www.youtube.com/watch?v=zlDkqP3lMmU"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744040039746/71392606-7ac8-441d-ac36-2cf05bb8d67f.png" alt="This is not a valid IPv4 address (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></a></p>
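<p>The byte-range rule above can be sketched as a small check. This is a minimal illustration, not a full parser: the function name <code>is_valid_ipv4</code> and the sample address containing <code>392</code> are assumptions made up for this example.</p>

```python
def is_valid_ipv4(text):
    """Return True if text is four dot-separated decimal values that each fit in one byte."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must consist of plain ASCII digits only.
        if not (part.isascii() and part.isdigit()):
            return False
        # A single byte can only hold values from 0 to 255.
        if int(part) not in range(256):
            return False
    return True


print(is_valid_ipv4("201.22.3.15"))  # True
print(is_valid_ipv4("10.392.0.1"))   # False: 392 does not fit in one byte
```

<p>This sketch accepts leading zeros (such as <code>010</code>), which some tools reject, so treat it as a teaching aid rather than a strict validator.</p>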
<h2 id="heading-network-id-and-host-id">Network ID and Host ID</h2>
<p>IP addresses have two parts: a <strong>network identifier</strong> (or network ID) that belongs to all hosts in the network and a <strong>host identifier</strong> (or host ID) that identifies the specific host in this network.</p>
<p>The network identifier will be the same for all hosts in the network, and is also called a "prefix". For example, consider a network identifier of <code>201.22.3</code>. Given that this is the network prefix, the following addresses:</p>
<pre><code class="lang-plaintext">201.22.3.15
201.22.3.91
</code></pre>
<p>Are part of the same network, as they share the same prefix. The first address belongs to host number <code>15</code> in this network, and the second belongs to host number <code>91</code>.</p>
<p>This address has a different prefix, or a different network identifier, and thus belongs to a different network:</p>
<pre><code class="lang-plaintext">201.22.14.50
</code></pre>
<p>In the examples above, there's a network identifier consisting of 3 bytes, or 24 bits, and a host identifier consisting of 1 byte, or 8 bits.</p>
<p><a target="_blank" href="https://www.youtube.com/watch?v=zlDkqP3lMmU"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744040184260/2511a5f3-3a98-40e4-aabe-7853e3febacf.png" alt="Network Identifier vs Host Identifier (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></a></p>
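<p>To make the split concrete, here is a short sketch that separates an address into its network ID and host ID when the prefix is a whole number of bytes, as in the examples above. The helper name <code>split_address</code> is hypothetical.</p>

```python
def split_address(address, network_bytes):
    """Split a dotted-decimal address into (network ID, host ID) at a byte boundary."""
    parts = address.split(".")
    network_id = ".".join(parts[:network_bytes])
    host_id = ".".join(parts[network_bytes:])
    return network_id, host_id


# The three addresses from the text, using a 3-byte (24-bit) network ID.
for address in ("201.22.3.15", "201.22.3.91", "201.22.14.50"):
    network_id, host_id = split_address(address, 3)
    print(address, "network:", network_id, "host:", host_id)
```

<p>The first two addresses produce the network ID <code>201.22.3</code>, while the third produces <code>201.22.14</code>, which is why it belongs to a different network.</p>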
<h2 id="heading-how-to-determine-network-vs-host-portions">How to Determine Network vs. Host Portions</h2>
<p>A question arises: how do you know which bits are part of the network ID, and which are part of the host ID? Several approaches have evolved over time to address this challenge.</p>
<h3 id="heading-fixed-length-approach">Fixed-Length Approach</h3>
<p>Let's consider this solution: For every IP address, the first, most-significant byte would represent the network ID, and the remaining three, least-significant bytes would represent the host ID. This way it's really easy to read IP addresses. For example, for this address:</p>
<pre><code class="lang-plaintext">20.12.1.92
</code></pre>
<p>You know that it describes network <code>20</code>, and the host <code>12.1.92</code> inside that network. Any IP address that doesn't start with <code>20</code>, such as <code>22.1.2.3</code>, would reside in a different network, and any IP address that starts with <code>20</code>, like <code>20.1.2.3</code>, would be within the same network.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744040959545/38c8766b-5ad2-4fb1-98b1-612c70fbe8ad.png" alt="Fixed-Length approach for IP addressing (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-what-are-the-disadvantages-here">What are the disadvantages here? 🤔</h3>
<p>With only one byte (8 bits) to represent the network ID, you only have 2^8, or <code>256</code>, different networks. Of course, there are far more networks than that in the real world. Even in the early days of the internet, universities and large companies each needed their own network identifiers.</p>
<p>In general, using a fixed length for the network ID and a fixed length for the host ID is not flexible enough. If you decide that the two most-significant bytes will represent the network ID and the two least-significant bytes will represent the host ID, you can represent up to 2^16, or <code>65,536</code> networks, which is also not enough. Furthermore, some networks, such as those of large companies, might require more than <code>65,536</code> host IDs.</p>
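<p>The arithmetic behind this trade-off is easy to check. The sketch below, using a hypothetical helper called <code>capacity</code>, counts how many networks and how many host IDs per network a fixed split allows.</p>

```python
def capacity(network_bytes):
    """Return (number of networks, host IDs per network) for a fixed-length split."""
    network_bits = 8 * network_bytes
    host_bits = 32 - network_bits
    return 2 ** network_bits, 2 ** host_bits


print(capacity(1))  # (256, 16777216)
print(capacity(2))  # (65536, 65536)
```

<p>Every bit moved to the network ID doubles the number of networks and halves the host IDs available in each one, so no single fixed split suits both tiny and huge networks.</p>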
<h2 id="heading-classful-addressing">Classful Addressing</h2>
<p>The solution lies in providing some flexibility. Consider another approach called "classful addressing". In this approach, the number of bits dedicated for the network ID changes from one address to another, and you can tell the network ID by looking at the first, most-significant byte of the address.</p>
<ul>
<li><p>Any address starting with a number between <code>1</code> and <code>127</code> belongs to "Class A", meaning that its network ID consists of 1 byte, leaving 3 bytes for the host ID.</p>
</li>
<li><p>Any address starting with a number between <code>128</code> and <code>191</code> belongs to "Class B", which means that its network ID is 2 bytes long, and its host ID is also 2 bytes long.</p>
</li>
<li><p>Any address starting with a number between <code>192</code> and <code>223</code> belongs to "Class C", so it has 3 bytes of a network ID, and 1 byte of host ID.</p>
</li>
</ul>
<p>You can see the full representation of this approach in the table below:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Class</td><td>First Byte Range</td><td>Network ID Size</td><td>Host ID Size</td></tr>
</thead>
<tbody>
<tr>
<td>A</td><td><code>1</code> - <code>127</code></td><td>1 byte</td><td>3 bytes</td></tr>
<tr>
<td>B</td><td><code>128</code> - <code>191</code></td><td>2 bytes</td><td>2 bytes</td></tr>
<tr>
<td>C</td><td><code>192</code> - <code>223</code></td><td>3 bytes</td><td>1 byte</td></tr>
<tr>
<td>D</td><td><code>224</code> - <code>239</code></td><td>(multicast)</td><td></td></tr>
<tr>
<td>E</td><td><code>240</code> - <code>255</code></td><td>(reserved)</td><td></td></tr>
</tbody>
</table>
</div><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744088968355/e7f128c0-3173-4bb5-8872-3f820de6b354.png" alt="Classful addressing approach (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>For example, what class does this address belong to?</p>
<pre><code class="lang-plaintext">(1) 130.12.204.5
</code></pre>
<p>Since it starts with <code>130</code>, which is between <code>128</code> and <code>191</code>, it belongs to "Class B". This means that its network ID is <code>130.12</code>, and its host ID is <code>204.5</code>. Let's mark it as "address number 1".</p>
<p>Do this address and the following address (2) belong to the same network?</p>
<pre><code class="lang-plaintext">(2) 130.90.2.40
</code></pre>
<p>No, since they have different network identifiers, they are not within the same network.</p>
<p>What class does the following address belong to?</p>
<pre><code class="lang-plaintext">(3) 200.1.1.9
</code></pre>
<p>It belongs to class C, as the value of its first byte, <code>200</code>, is between <code>192</code> and <code>223</code>. This means that its network identifier is <code>200.1.1</code>, and any address starting with this prefix will reside within the same network. This specific address describes host <code>9</code> within this network.</p>
<p>To complete the picture, addresses starting with a value between <code>224</code> and <code>239</code> belong to "Class D" – that is, multicast addresses, which identify a group of devices rather than a single host. Addresses starting with a value between <code>240</code> and <code>255</code> were reserved for future use. Addresses starting with <code>0</code> are special addresses.</p>
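<p>The classful rules can be sketched in a few lines. The names <code>address_class</code>, <code>classful_split</code>, and <code>NETWORK_BYTES</code> are made up for this illustration; the ranges are the ones from the table above.</p>

```python
def address_class(address):
    """Return the class letter, or None if the first byte is outside the table (for example 0)."""
    first_byte = int(address.split(".")[0])
    # First-byte ranges from the table; range() excludes its upper bound.
    ranges = {
        "A": range(1, 128),
        "B": range(128, 192),
        "C": range(192, 224),
        "D": range(224, 240),  # multicast
        "E": range(240, 256),  # reserved
    }
    for letter, first_bytes in ranges.items():
        if first_byte in first_bytes:
            return letter
    return None


# Number of network ID bytes for the classes that have a network/host split.
NETWORK_BYTES = {"A": 1, "B": 2, "C": 3}


def classful_split(address):
    """Return (network ID, host ID) for a class A, B, or C address, otherwise None."""
    network_bytes = NETWORK_BYTES.get(address_class(address))
    if network_bytes is None:
        return None
    parts = address.split(".")
    return ".".join(parts[:network_bytes]), ".".join(parts[network_bytes:])


print(address_class("130.12.204.5"), classful_split("130.12.204.5"))  # B ('130.12', '204.5')
print(address_class("200.1.1.9"), classful_split("200.1.1.9"))        # C ('200.1.1', '9')
```

<p>Running it on addresses (1) and (2) gives the network IDs <code>130.12</code> and <code>130.90</code>, confirming that they are in different networks.</p>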
<h3 id="heading-ip-address-assignment">IP Address Assignment</h3>
<p>In the early internet, IPv4 addresses were assigned to organizations by the Internet Assigned Numbers Authority (IANA). As the internet grew, this responsibility was distributed to five Regional Internet Registries (RIRs) that handle address allocation for different geographic regions. Large organizations would receive blocks of addresses based on their needs, with address classes determining the size of these blocks.</p>
<h3 id="heading-what-are-the-disadvantages-here-1">What are the disadvantages here? 🤔</h3>
<p>While classful addressing allows for more flexibility compared to the fixed-length approach, even this approach isn't flexible enough.</p>
<p>Consider this scenario: A small startup company with just two founders needs a network identifier. Which class would they need?</p>
<p>Getting a class A or class B would be excessive, so they might get a class C – allowing <code>256</code> addresses. This is more than currently needed, but allows some expansion. What happens if the startup grows to more than <code>256</code> employees (and devices)?</p>
<p>At this point, they would need to get a class B address, giving no fewer than <code>65,536</code> addresses, when all they need is a bit over <code>256</code> addresses. This means wasting more than <code>60,000</code> addresses.</p>
<p>This became a real problem in the early 1990s as the internet grew rapidly. The need for more IP addresses became apparent, and there was an impending exhaustion of the IPv4 address space. Cases where <code>60,000</code> addresses were wasted could no longer be tolerated.</p>
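<p>Here is a rough sketch of the waste described above. It counts every address in a block, as the text does, and the figure of <code>300</code> devices is an assumed example of "a bit over 256". The names <code>CLASS_BLOCK_SIZES</code> and <code>smallest_class_for</code> are hypothetical.</p>

```python
# Total addresses in one block of each class, smallest first.
CLASS_BLOCK_SIZES = {"C": 2 ** 8, "B": 2 ** 16, "A": 2 ** 24}


def smallest_class_for(devices):
    """Return (class letter, unused addresses) for the smallest class that fits."""
    for letter, size in CLASS_BLOCK_SIZES.items():
        if devices <= size:
            return letter, size - devices
    return None


print(smallest_class_for(2))    # ('C', 254)
print(smallest_class_for(300))  # ('B', 65236)
```

<p>Growing from a class C to a class B jumps straight from <code>256</code> to <code>65,536</code> addresses, with nothing in between.</p>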
<h2 id="heading-cidr-classless-interdomain-routing">CIDR: Classless Interdomain Routing</h2>
<p>One of the measures to handle this shortage of addresses was to abandon classful addressing in 1993 and switch to another approach called CIDR – Classless Interdomain Routing. This approach is still used today.</p>
<p>CIDR allows for flexibility when choosing the network ID and the host ID. It lets network administrators size a network to any power-of-two block of addresses, rather than being limited to the three sizes of Classes A, B, and C.</p>
<p>Let's start with a simple example. In CIDR notation, we add a suffix indicating how many bits are used for the network portion:</p>
<pre><code class="lang-plaintext">(4) 200.8.3.1/16
</code></pre>
<p>This slash notation specifies how many bits describe the network ID. In example (4) above, the first <code>16</code> bits (or <code>2</code> bytes) are used for the network ID. So, in this case, <code>200.8</code> is the network identifier, and <code>3.1</code> is the host identifier. The fact that <code>200.8</code> is the network ID means that all addresses from <code>200.8.0.0</code> through <code>200.8.255.255</code> are in this network.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744090490906/0a18b364-7ca2-4ed0-8f27-2103bcbdd579.png" alt="16-bit subnet mask address (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
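<p>If you want to verify the range for example (4) yourself, Python's standard <code>ipaddress</code> module can derive it from the slash notation. This is just one convenient way to check the numbers; the variable names are arbitrary.</p>

```python
import ipaddress

# An interface is an address together with the prefix length of its network.
interface = ipaddress.ip_interface("200.8.3.1/16")
network = interface.network

print(network)                    # 200.8.0.0/16
print(network.network_address)    # 200.8.0.0
print(network.broadcast_address)  # 200.8.255.255
print(network.num_addresses)      # 65536
```

<p>The first and last addresses printed match the range <code>200.8.0.0</code> through <code>200.8.255.255</code> described above.</p>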
<p>Consider these additional addresses:</p>
<pre><code class="lang-plaintext">(5) 200.2.13.5
(6) 200.8.21.6
</code></pre>
<p>Given this address prefix of <code>16</code> bits, or <code>2</code> bytes, which of these addresses belong to the same network as example (4) (<code>200.8.3.1/16</code>)?</p>
<p>The first address (5) (<code>200.2.13.5</code>) does not belong to this network, as its first <code>16</code> bits (<code>200.2</code>) are different from the first <code>16</code> bits of the example address.</p>
<p>The second address (6) (<code>200.8.21.6</code>) does belong to the same network as that of the example address.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744090582529/d314c9ca-73a3-4e48-92b8-b0a6c24ac7d3.png" alt="16-bit subnet mask address (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
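<p>The comparison above boils down to checking whether two addresses agree on their leading bits. A minimal sketch, with the hypothetical helpers <code>to_int</code> and <code>same_network</code>, converts each address to a 32-bit number and drops the host bits with a right shift before comparing.</p>

```python
def to_int(address):
    """Convert a dotted-decimal address to its 32-bit integer value."""
    return int.from_bytes(bytes(int(part) for part in address.split(".")), "big")


def same_network(address_a, address_b, prefix_len):
    """Return True if both addresses share the same first prefix_len bits."""
    shift = 32 - prefix_len  # number of host bits to discard
    return (to_int(address_a) >> shift) == (to_int(address_b) >> shift)


print(same_network("200.8.3.1", "200.2.13.5", 16))  # False
print(same_network("200.8.3.1", "200.8.21.6", 16))  # True
```

<p>Because the shift works on individual bits, the same function also handles prefixes that do not end on a byte boundary.</p>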
<h3 id="heading-real-world-example">Real-world Example</h3>
<p>In practice, an ISP might receive a large block like <code>104.16.0.0/12</code> from the RIR. This gives them control of all addresses from <code>104.16.0.0</code> to <code>104.31.255.255</code>. The ISP can then allocate smaller subnets to customers, such as giving a small business a <code>/24</code> subnet with <code>256</code> addresses, or a larger company a <code>/20</code> subnet with <code>4,096</code> addresses.</p>
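<p>The numbers in this example can be reproduced with Python's standard <code>ipaddress</code> module. In the sketch below, which <code>/20</code> and <code>/24</code> the ISP hands out is an arbitrary assumption; a real ISP would follow its own allocation plan.</p>

```python
import ipaddress

block = ipaddress.ip_network("104.16.0.0/12")
print(block.network_address, block.broadcast_address)  # 104.16.0.0 104.31.255.255

# Carve the block into /20 subnets and take the first two.
slash_20_subnets = block.subnets(new_prefix=20)
company = next(slash_20_subnets)     # given to the larger company
second_20 = next(slash_20_subnets)   # split further for smaller customers

# The first /24 inside the second /20 goes to the small business.
business = next(second_20.subnets(new_prefix=24))

print(company, company.num_addresses)    # 104.16.0.0/20 4096
print(business, business.num_addresses)  # 104.16.16.0/24 256
print(company.overlaps(business))        # False
```

<p>Taking the <code>/24</code> from a different <code>/20</code> keeps the two customers' address ranges from overlapping.</p>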
<h2 id="heading-subnet-masks">Subnet Masks</h2>
<p>Another way to express the network prefix is by using a <a target="_blank" href="https://www.ipxo.com/blog/what-is-subnet-mask/">subnet mask</a>, like so:</p>
<pre><code class="lang-plaintext">255.255.0.0
</code></pre>
<p>In binary, <code>255</code> is written as eight <code>1</code>s, meaning all bits of that byte are on. So if you translate this mask into binary, you get:</p>
<pre><code class="lang-plaintext">11111111 11111111 00000000 00000000
</code></pre>
<p>In other words, <code>16</code> bits are on, which means a network prefix of <code>16</code> bits. Both conventions (CIDR notation and subnet masks) are used very frequently.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744090679551/5466e739-1e1b-4e34-a044-0d680ca9ad6e.png" alt="16-bit subnet mask address (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>With CIDR, an address can reside in different networks given different network prefixes, or subnet masks. If you consider the same example address with a different prefix, say that of <code>8</code> bits – both additional addresses would belong to the same network, as they all share the first <code>8</code> bits – <code>200</code>.</p>
<p>How would you present a network prefix of <code>8</code> bits as a subnet mask? You need the first <code>8</code> bits to be on, so that means <code>255</code> in decimal, and the remaining bits are off, resulting in this subnet mask:</p>
<pre><code class="lang-plaintext">255.0.0.0
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744141258583/c4f606ff-410b-4b1f-92c5-505b5309cfa8.png" alt="8-bit subnet mask address (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>What happens if you use a network prefix of <code>24</code> bits? First, how would you express that as a subnet mask? You need <code>24</code> bits to be on, so that is 3 times 8 bits to be on, resulting in:</p>
<pre><code class="lang-plaintext">255.255.255.0
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744297152994/0dae747f-2a10-4ad6-9e29-b21df15e6169.png" alt="24-bit subnet mask address (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now, neither of the additional addresses resides within the same network as the example address, as they don't share its network ID of <code>200.8.3</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744297174124/16ad2016-c358-474b-964c-4bde75359670.png" alt="CIDR (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Note that network prefixes do not have to represent full bytes. For example, you can use a network prefix of <code>12</code> bits, or <code>11</code> bits, or <code>22</code> bits. When the prefix length isn't a multiple of <code>8</code>, the subnet mask will have a value other than <code>0</code> or <code>255</code> in one of its positions.</p>
<p>This addresses the issue regarding the startup company. If a startup has <code>300</code> employees, they'd need to get a <code>23</code>-bit network ID, leaving <code>9</code> bits for hosts within their network. This means 2^9, or <code>512</code> addresses, which should be sufficient.</p>
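<p>The same arithmetic can be sketched in a few lines of Python. The function name <code>prefix_for_hosts</code> is an assumption made for this example, and it counts raw addresses exactly as the paragraph above does.</p>

```python
def prefix_for_hosts(host_count):
    """Longest IPv4 prefix that still leaves at least `host_count` addresses."""
    # Number of bits needed to count `host_count` distinct values
    host_bits = (host_count - 1).bit_length()
    return 32 - host_bits

prefix = prefix_for_hosts(300)
print(prefix)              # 23
print(2 ** (32 - prefix))  # 512 addresses in the resulting network
```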
<h2 id="heading-interim-summary-ipv4-addresses">Interim Summary – IPv4 Addresses</h2>
<p>In this section, you've learned about IPv4 addresses. IP addresses are hierarchical, logical addresses that consist of <code>4</code> bytes. IP addresses have two parts: a network identifier that is shared by all hosts in the network, and a host identifier that identifies the specific host within the network.</p>
<p>You've explored various options for determining the network identifier and the host identifier:</p>
<ol>
<li><p>Fixed-length approach – too rigid and limited</p>
</li>
<li><p>Classful addressing approach – better but still wasteful</p>
</li>
<li><p>CIDR (Classless Inter-Domain Routing) – flexible and efficient</p>
</li>
</ol>
<p>CIDR provides much more flexibility and helps overcome the significant problem of IPv4 address shortage. However, CIDR is only one part of addressing the shortage of IPv4 addresses, with other solutions including NAT (Network Address Translation) and eventually, IPv6.</p>
<p>The next section will explore special IPv4 addresses and then examine the header of IPv4 packets.</p>
<h2 id="heading-test-yourself">Test Yourself</h2>
<p>Now practice the concepts you've learned and make sure you feel comfortable with them.</p>
<p>Take a moment to try answering the following questions before checking the answers.</p>
<h3 id="heading-converting-between-prefix-notation-and-subnet-masks">Converting Between Prefix Notation and Subnet Masks</h3>
<p>How would you represent a network prefix of <code>16</code> bits, written as <code>/16</code>, as a subnet mask?</p>
<p>You need <code>16</code> bits that are on. When <code>8</code> bits are on you get <code>255</code> in decimal, so you'd use:</p>
<pre><code class="lang-plaintext">255.255.0.0
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465135834/ff449f60-e660-4fea-b427-994a87be2c89.png" alt="16-bit subnet mask address (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Given this network prefix, do these addresses belong to the same network?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465178617/ef7ddeca-86b2-4bb2-8e1d-471ef4f64a45.png" alt="Do these addresses fit in the network defined before? (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Yes, they do, as they share the same most-significant <code>16</code> bits, or two bytes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465209149/25744a22-16b3-484d-9821-12920dd59be4.png" alt="These addresses fit in the same network (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Does this address belong to the same network as that of the previous addresses?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465232371/92bcb42c-5067-43e6-8cec-1eae9347d16a.png" alt="Additional address (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Yes, it does. Again, it shares the same two most-significant bytes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465259087/a4b9c525-3b4d-4501-bcf8-db62ebf47247.png" alt="This address also fits in the network defined before (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>What about this one? Does it belong to the same network as the previous addresses?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465285214/f57fd6c2-7665-4565-943e-959b981fedc8.png" alt="Additional address. Does this address fit in the network defined before? (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>No, as the first two bytes are not <code>42.31</code> – this is a different network. So this address describes host <code>1.2</code>, within the network <code>42.32</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465302503/0fdd959f-2d10-4a56-826d-e71604ca5267.png" alt="No, this address does not belong to the same network as the other ones (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-working-backwards-with-subnet-masks">Working Backwards with Subnet Masks</h3>
<p>Let's try the other way around. You have this subnet mask:</p>
<pre><code class="lang-plaintext">255.255.255.0
</code></pre>
<p>How would you express it using a network prefix?</p>
<p>You have three occurrences of <code>255</code>, which means three times <code>8</code> bits that are on, so overall you have <code>24</code> bits that are on. So you can also write <code>/24</code>. This means <code>3</code> bytes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465331643/b1f3ab4c-8e7e-449d-8879-fee3bf90ce1c.png" alt="24-bit subnet mask (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Given this subnet mask, do addresses (1) and (3) above belong to the same network?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465436680/ca71584d-53dc-4116-a109-d32c11e997ef.png" alt="Do these addresses have the same network ID given a 24-bit subnet mask? (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>They do, as they both have the same most-significant three bytes – network <code>42.31.93</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465461745/c01f5958-f675-45c5-bc41-de857483e25d.png" alt="24-bit subnet mask (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>What about addresses (1) and (2)?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465532664/a0ef8f73-27d5-4488-98a9-1dbeaf457797.png" alt="Do these addresses have the same network ID given a 24-bit subnet mask? (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Given this network prefix, they don't belong to the same network. The first address belongs to network <code>42.31.93</code>, and the second address belongs to network <code>42.31.1</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465498737/6d4cb056-126a-422f-94bc-4392a996869c.png" alt="24-bit subnet mask (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-non-byte-aligned-prefixes">Non-Byte-Aligned Prefixes</h3>
<p>Network prefixes do not have to align to <code>8</code> bits, or full bytes. Let's say you have a network prefix of <code>14</code> bits. How would you convert that to a subnet mask?</p>
<p>Well, the first byte is clear: you have <code>8</code> bits on, so the first byte is <code>255</code>. What about the next one?</p>
<p>In binary, you'd want six additional 1s followed by two 0s, so you'd write:</p>
<pre><code class="lang-plaintext">11111100
</code></pre>
<p>Converting to decimal, this binary number represents <code>252</code>. So your subnet mask is:</p>
<pre><code class="lang-plaintext">255.252.0.0
</code></pre>
<p>Another way to make this conversion: You know that eight 1s in binary represent <code>255</code> in decimal. You also know that <code>11</code> in binary is <code>3</code>, so you can simply subtract <code>3</code> from <code>255</code> and get <code>252</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465576989/bb1a90c1-1563-4970-b0f5-e0f502e82563.png" alt="14-bit subnet mask (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
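<p>To double-check conversions like this one, you could use a small helper along these lines. It is a sketch with a hypothetical name, <code>prefix_to_mask</code>, that builds the mask with plain bit operations.</p>

```python
def prefix_to_mask(prefix_len):
    """Convert a prefix length (0-32) to a dotted-decimal subnet mask."""
    # Shift in zeros from the right, then keep only the low 32 bits
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    octets = [(mask >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)

print(prefix_to_mask(14))    # 255.252.0.0
print(format(252, "08b"))    # 11111100
```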
<p>Next, try the other way around. You have the following subnet mask:</p>
<pre><code class="lang-plaintext">255.255.224.0
</code></pre>
<p>How many bits represent the network prefix?</p>
<p>The first two bytes are clear: you have <code>16</code> bits. Converting the third byte to binary: <code>224</code> in decimal is <code>11100000</code> in binary. This means you have an additional three 1s, so you can write the subnet mask above as a prefix of <code>/19</code> – <code>16</code> bits for the two <code>255</code> bytes, and <code>3</code> additional bits for the <code>224</code> byte.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465642118/2587e3bc-0c88-48a9-b876-b96fd3a493d1.png" alt="19-bit subnet mask (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
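<p>Going in this direction amounts to counting the 1 bits in the mask. Here is a minimal sketch, with the helper name <code>mask_to_prefix</code> being an assumption of this example. It also rejects masks whose 1s are not contiguous.</p>

```python
def mask_to_prefix(mask):
    """Convert a dotted-decimal subnet mask to a prefix length."""
    bits = "".join(format(int(octet), "08b") for octet in mask.split("."))
    # A valid mask is a run of 1s followed by a run of 0s, so "01" must never appear
    if "01" in bits:
        raise ValueError(f"not a contiguous subnet mask: {mask}")
    return bits.count("1")

print(mask_to_prefix("255.255.224.0"))  # 19
```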
<h3 id="heading-determining-network-membership">Determining Network Membership</h3>
<p>Let's consider the following addresses:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465744667/86337750-0f67-4ed7-b8c2-7d6fcf330a71.png" alt="Two IP addresses (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Are they part of the same network? 🤔</p>
<p>It depends on the subnet mask.</p>
<p>If the network prefix is <code>/8</code>, then they are part of the same network, as they share the same network ID.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465761356/67c590e1-daf5-4276-96ff-a39ee914d2d3.png" alt="8-bit subnet mask (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>On the other hand, if the network prefix is <code>/16</code>, then they have different network IDs, and thus don't belong to the same network. But what happens with prefixes in between? Will they reside in the same network for a prefix of <code>/9</code>? <code>/14</code>?</p>
<p>The way to approach this question is to convert the second byte of these addresses to binary. For the first address, this byte is <code>24</code>, which in binary is:</p>
<pre><code class="lang-plaintext">00011000
</code></pre>
<p>For the second address, the second byte is <code>23</code>, which in binary is:</p>
<pre><code class="lang-plaintext">00010111
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744465797029/fcbc4bd8-e273-4032-afb3-f10e2028738b.png" alt="12-bit subnet mask (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You can see that the most significant <code>4</code> bits within the second byte are identical. If you add the first <code>8</code> bits of the address, you see that the most significant <code>12</code> bits of these addresses are the same.</p>
<p>So, if you have a network prefix of <code>/11</code>, do these addresses belong to the same network?</p>
<p>Yes, they do – their most significant <code>11</code> bits are identical.</p>
<p>What about <code>/13</code>?</p>
<p>No, with this network prefix, they don't share the same network identifier, as their <code>13</code>th bit is different.</p>
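<p>You can also let Python count the shared leading bits. The two addresses in this exercise appear only in the image, so the addresses below are stand-ins that merely reuse the second bytes <code>24</code> and <code>23</code>. The function name is likewise hypothetical.</p>

```python
import ipaddress

def shared_prefix_bits(first, second):
    """Number of identical most-significant bits in two IPv4 addresses."""
    # XOR leaves a 1 wherever the two addresses differ
    diff = int(ipaddress.ip_address(first)) ^ int(ipaddress.ip_address(second))
    return 32 - diff.bit_length()

shared = shared_prefix_bits("10.24.0.1", "10.23.0.1")  # stand-in addresses
print(shared)        # 12
print(11 <= shared)  # True: same network with /11
print(13 <= shared)  # False: different networks with /13
```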
<p>This practice should help you feel comfortable with subnet masks and network prefixes. In the next section, you'll learn about special IP addresses and then examine the header of IP packets.</p>
<h2 id="heading-special-ipv4-addresses">Special IPv4 Addresses</h2>
<p>Now that you're comfortable with IP addresses and subnet masks, let's explore some IP addresses that have special meanings.</p>
<h3 id="heading-the-this-host-address-0000">The "This Host" Address: 0.0.0.0</h3>
<p>The address <code>0.0.0.0</code> means "this host" and is used in two scenarios:</p>
<p>First, when a machine boots up and doesn't yet have an IP address. IP addresses are logical addresses that need to be assigned to a machine. Prior to this assignment, a device has no IP address at all. If the device needs to communicate at this stage, it may use this special address, <code>0.0.0.0</code>.</p>
<p>Second, when writing network applications that need to listen for incoming connections on all network interfaces. For example, if a machine has two interfaces – one with the IP address <code>1.1.1.1</code>, and another with the address <code>2.2.2.2</code> – listening on the address <code>0.0.0.0</code> means accepting connections regardless of which network interface receives them.</p>
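<p>Here is what the second scenario can look like in Python. This is just a sketch: it binds to port <code>0</code> so the operating system picks any free port, which is a choice made for this example rather than something a real server would typically do.</p>

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    # 0.0.0.0 means "all IPv4 interfaces on this machine"
    server.bind(("0.0.0.0", 0))
    server.listen()
    host, port = server.getsockname()
    print(f"Listening on {host}, port {port}")
```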
<h3 id="heading-this-network-addresses">"This Network" Addresses</h3>
<p>Another class of special addresses consists of those starting with zeros, where the zeros mean "this network."</p>
<p>For example, if you have a machine with the address:</p>
<pre><code class="lang-plaintext">12.34.55.55
</code></pre>
<p>And a network prefix of <code>16</code> bits, this machine can send a packet to another device on the network using its full address, for example <code>12.34.66.66</code>, or alternatively use the special zeros notation and send the packet to:</p>
<pre><code class="lang-plaintext">0.0.66.66
</code></pre>
<p>This means "send a packet to the host <code>66.66</code> on this network." Of course, the recipient must also know the relevant network prefix to correctly interpret this address.</p>
<h3 id="heading-broadcast-addresses">Broadcast Addresses</h3>
<p>The address <code>255.255.255.255</code>, where all bits are set to <code>1</code>, is the address of all hosts in the local network – the broadcast address. This is similar to the <a target="_blank" href="https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/#heading-unicast-and-multicast-bits">broadcast address in Ethernet</a> (<code>FF:FF:FF:FF:FF:FF</code>). In both cases, all bits are set to <code>1</code>.</p>
<p>A destination address that combines a valid network identifier with a host identifier of all 1s can be used to send a broadcast packet to a remote network (a directed broadcast). For example, consider a network <code>12.34.0.0/16</code> and another network, <code>12.35.0.0/16</code>. If a machine at <code>12.34.55.55</code> wants to send a packet to all devices in the other network, it could use the destination address <code>12.35.255.255</code>.</p>
<p>Even though the IP specifications allow this, in practice routers often disable this feature, as it can create security vulnerabilities.</p>
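<p>To see where an address like <code>12.35.255.255</code> comes from, you can set every host bit of the network to 1. The sketch below does this by hand and compares the result with the <code>broadcast_address</code> attribute of Python's <code>ipaddress</code> module.</p>

```python
import ipaddress

remote = ipaddress.ip_network("12.35.0.0/16")

# Set every host bit to 1: network address OR'ed with the inverted mask
by_hand = ipaddress.ip_address(int(remote.network_address) | int(remote.hostmask))

print(by_hand)                   # 12.35.255.255
print(remote.broadcast_address)  # 12.35.255.255
```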
<h3 id="heading-loopback-addresses-1270008">Loopback Addresses: 127.0.0.0/8</h3>
<p>All addresses in the network <code>127.0.0.0/8</code> (that is, all addresses that start with <code>127</code>) are loopback addresses. Packets sent to any of these addresses are not put onto the physical network but are processed locally within the operating system. This is extremely useful for development and debugging.</p>
<p>For example, when developing a simple chat program, you need two clients that exchange data. One approach would be to use two different physical computers, but this is tedious – you'd need to write a message on one computer, check the other computer to see if it was received, then write a message on the second computer, and go back to the first to validate receipt.</p>
<p>A much simpler approach is to use a loopback address. Both clients can run on the same machine and connect with one another. You can run two different client programs on the same physical computer and exchange messages between them without needing an additional machine.</p>
<p>For instance, you might use the address <code>127.0.0.1</code>, with one client listening on port <code>1337</code> and the other on port <code>1338</code>. When client A sends a packet to client B, this packet never reaches your network card but remains within the operating system. Client B receives the packet from the loopback interface as if it had been received from the physical network.</p>
<p>After debugging is complete, your client code doesn't need to change – the only difference is that the clients will communicate using real IP addresses instead of the loopback address.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744736895494/fd1e4a8d-a834-4bf4-b4b9-1e83cf851161.png" alt="Loopback operation (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
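<p>A minimal sketch of two such clients, both running in one Python process, might look like this. It uses UDP and lets the operating system pick free ports instead of <code>1337</code> and <code>1338</code>, purely so the example runs even if those ports are taken.</p>

```python
import socket

client_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Both clients live on the loopback address; port 0 means "any free port"
client_a.bind(("127.0.0.1", 0))
client_b.bind(("127.0.0.1", 0))
client_b.settimeout(2)

address_a = client_a.getsockname()
address_b = client_b.getsockname()

client_a.sendto(b"hello from A", address_b)
message, sender = client_b.recvfrom(1024)
print(message, "received from", sender)

client_a.close()
client_b.close()
```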
<h3 id="heading-summary-of-special-ipv4-addresses">Summary of Special IPv4 Addresses</h3>
<p>To summarize the special IPv4 addresses you've learned about:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Special Address</td><td>Meaning</td><td>Usage</td></tr>
</thead>
<tbody>
<tr>
<td><code>0.0.0.0</code></td><td>"This host"</td><td>Used during boot or to listen on all interfaces</td></tr>
<tr>
<td>Addresses starting with <code>0</code></td><td>"This network"</td><td>Sending to hosts on the local network</td></tr>
<tr>
<td><code>255.255.255.255</code></td><td>Broadcast</td><td>Sending to all hosts on the local network</td></tr>
<tr>
<td>Network ID with all 1s in host part</td><td>Directed broadcast</td><td>Sending to all hosts on a specific network</td></tr>
<tr>
<td><code>127.0.0.0/8</code></td><td>Loopback</td><td>Testing and debugging without using the physical network</td></tr>
</tbody>
</table>
</div><p>In the next section, you'll learn about the structure of the IPv4 header.</p>
<h2 id="heading-ipv4-header">IPv4 Header</h2>
<p>Now that you understand IP addresses, subnets, and special addresses, it's time to examine the IPv4 header structure in detail.</p>
<h3 id="heading-the-header-structure">The Header Structure</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745583720695/21521520-3029-4a0a-b4e7-fa484ca350ab.png" alt="IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The diagram above shows the header of IPv4 as defined in RFC 791. Let's examine each field:</p>
<h4 id="heading-version-4-bits">Version (4 bits)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745589954987/cb357d49-73ab-43e6-93b5-c2b7c7e3eb4a.png" alt="Version field within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The header starts with the Version field, which consists of four bits. For an IPv4 packet, the version is <code>4</code>, so this field will always carry the value of <code>4</code> (or <code>0100</code> in binary).</p>
<p>❓ Why does the header start with the Version field? 🤔</p>
<p>(Note: when I start a sentence with the ❓ mark, it’s a question addressed to you, and I encourage you to try to answer it before reading on.)</p>
<p>The reason is that the remaining fields may differ according to the version. If a network device reads an IP packet and the version field carries the value of <code>4</code>, it will expect the remainder of the packet to follow the IPv4 structure. If it carries another value, such as <code>6</code>, the remaining fields are different, as in IPv6.</p>
<h4 id="heading-internet-header-length-ihl-4-bits">Internet Header Length (IHL) (4 bits)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745590070221/ca452338-299c-422c-aef4-8fe8569dd218.png" alt="IHL field within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This field indicates the length of the header itself.</p>
<p>❓ Why do we need to specify the length? 🤔</p>
<p>Unlike <a target="_blank" href="https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/">Ethernet</a>, where the header size is fixed, the IPv4 header length can vary because of optional fields. For an IP packet without special options, the header consists of <code>20</code> bytes, which is the most common case.</p>
<p>The IHL field doesn't specify the length in bytes directly but in units of 4-byte words. So to specify a length of <code>20</code> bytes, the value would be <code>5</code> (5 × 4 = 20). This encoding allows the field to use only 4 bits while specifying header lengths up to <code>60</code> bytes (when IHL = <code>15</code>).</p>
<p>A common IPv4 packet therefore begins with the byte <code>0x45</code> in hexadecimal, meaning it's version <code>4</code> of the IP protocol, and the header is <code>20</code> bytes long.</p>
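<p>Splitting that first byte into its two fields takes one shift and one mask. Here is a small sketch, with the function name <code>split_first_byte</code> being an assumption of this example.</p>

```python
def split_first_byte(first_byte):
    """Return (version, header length in bytes) from the first byte of an IPv4 header."""
    version = first_byte >> 4   # upper 4 bits
    ihl = first_byte & 0x0F     # lower 4 bits, counted in 4-byte words
    return version, ihl * 4

print(split_first_byte(0x45))  # (4, 20)
print(split_first_byte(0x4F))  # (4, 60), the largest possible header
```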
<h4 id="heading-type-of-service-tos-8-bits">Type of Service (TOS) (8 bits)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745590323255/e8a30561-bfbf-4bcd-a07c-3dbce88fc6c4.png" alt="TOS field within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The idea behind this field is that not all packets are equally important. You may want to give priority to some packets over others.</p>
<p>For example, packets carrying real-time data (like voice or video conferencing) are more time-sensitive than packets carrying, say, email or file downloads. If a router is currently experiencing high load, it should ideally prioritize time-sensitive packets.</p>
<p>The Type of Service field allows senders to indicate the priority of their packets. However, on the public internet, this field is often ignored by routers because any sender can set any priority value. In most cases, this field carries the value of <code>0</code>.</p>
<h4 id="heading-total-length-16-bits">Total Length (16 bits)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745590421285/07a4b428-3a97-4ea8-9006-5fd8bb215d95.png" alt="Total Length field within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This field specifies the total length of the IP packet, including both the header and the payload (data).</p>
<p>❓ Why is it necessary to specify the length? 🤔</p>
<p>Unfortunately, the IP layer doesn’t necessarily know whether some of the bytes it receives are actually padding added by the second layer. I described this in detail in <a target="_blank" href="https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/#heading-the-problem-with-the-type-length-field">a previous post</a>, where I showed that in the Ethernet protocol, in some cases, the receiving Ethernet entity cannot tell which bytes belong to the payload and which bytes are simply padding. The IP layer needs to know precisely which bytes belong to the actual packet, hence the Total Length field.</p>
<p>❓What is the maximum size of an IPv4 packet? 🤔</p>
<p>Since this field is <code>16</code> bits long, an IPv4 packet may contain a maximum of 2^16-1 bytes, or <code>65,535</code> bytes, including the header. The minimum size is <code>20</code> bytes, consisting of just the header without options or payload.</p>
<h4 id="heading-fragmentation-fields-32-bits">Fragmentation Fields (32 bits)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745591136348/bb1035af-c967-4bb8-992c-c10e31b64cd1.png" alt="Fragmentation fields within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The next four bytes are dedicated to fragmentation control. I’ll cover these fields in a separate section, as they involve a complex topic deserving special attention.</p>
<h4 id="heading-time-to-live-8-bits">Time to Live (8 bits)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745591194176/3f3f98f6-b079-43d3-9ee3-b052b7f4f6d7.png" alt="TTL field within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Despite its name, this field doesn't measure time. Instead, it holds the maximum number of routing hops a packet can traverse before being discarded.</p>
<p>To understand its purpose, consider this scenario: If Machine A sends a packet to Machine B through a series of routers, but there's a routing loop where Router 2 sends to Router 3, which sends to Router 4, which sends back to Router 2, the packet could circulate indefinitely, consuming bandwidth and never reaching its destination.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745775904428/72ba07f9-461d-483f-be16-773218d8f863.png" alt="A routing issue causing an infinite loop (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The TTL field prevents this by setting a limit on how many hops a packet can take:</p>
<ol>
<li><p>The sender sets an initial TTL value (often <code>64</code> or <code>128</code>)</p>
</li>
<li><p>Each router that handles the packet decrements the TTL by <code>1</code></p>
</li>
<li><p>If a router receives a packet with TTL = <code>1</code>, it decrements it to <code>0</code> and discards the packet</p>
</li>
<li><p>The router then sends an ICMP "Time Exceeded" message back to the original sender</p>
</li>
</ol>
<p>This doesn't solve the underlying problem of routing loops, but it prevents packets from circulating forever.</p>
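<p>The steps above can be simulated in a few lines. This is a simplified sketch: it assumes the packet enters the loop at Router 2 with the given TTL, and the function name is made up for the illustration.</p>

```python
def hops_until_dropped(initial_ttl, loop=("Router 2", "Router 3", "Router 4")):
    """Simulate a packet stuck in a routing loop; return (dropping router, hops taken)."""
    if initial_ttl < 1:
        raise ValueError("TTL must be at least 1")
    ttl = initial_ttl
    hops = 0
    while True:
        router = loop[hops % len(loop)]  # routers take turns handling the packet
        ttl -= 1                         # each router decrements the TTL
        hops += 1
        if ttl == 0:
            # This router discards the packet and would send ICMP "Time Exceeded"
            return router, hops

print(hops_until_dropped(5))   # ('Router 3', 5)
print(hops_until_dropped(64))  # ('Router 2', 64)
```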
<p>In IPv6, this field is renamed "Hop Limit," which more accurately describes its function.</p>
<h4 id="heading-protocol-8-bits">Protocol (8 bits)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745591243041/ab9be6ea-5f11-4bb1-b93f-f0d9deef0c6f.png" alt="Protocol field within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This field describes the payload of the IPv4 packet. For example:</p>
<ul>
<li><p>A value of <code>6</code> means the payload is TCP</p>
</li>
<li><p>A value of <code>17</code> means the payload is UDP</p>
</li>
</ul>
<p>This helps the receiving system know which protocol handler should process the packet's contents. It's similar to <a target="_blank" href="https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/#heading-type-length-field-ethernet-ii-type-2-bytes">the Type field in Ethernet</a>, which specifies the protocol of the layer encapsulated within the Ethernet frame.</p>
<h4 id="heading-header-checksum-16-bits">Header Checksum (16 bits)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745591295127/9953fb34-2b2f-4c9f-bf39-7a18ceaf2b1a.png" alt="Header checksum field within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This is a 16-bit checksum used to verify the validity of the header only (that is, excluding the payload). The sender computes this value based on the fields of the header, and the receiver also computes it to validate that the header was received correctly.</p>
<p>❓The checksum must be recalculated by each router. Why is that? 🤔</p>
<p>Because the TTL field changes at each hop, and the TTL is part of the header that the checksum covers. For example, if a packet arrives at a router with TTL = <code>7</code>, that router will:</p>
<ol>
<li><p>Verify the current checksum based on TTL = <code>7</code></p>
</li>
<li><p>Decrement TTL to <code>6</code></p>
</li>
<li><p>Calculate a new checksum based on TTL = <code>6</code></p>
</li>
<li><p>Forward the packet with the new checksum</p>
</li>
</ol>
<p>If the checksum verification fails, the device drops the packet. This prevents packets with corrupted headers (which might have incorrect destination addresses, for instance) from being forwarded.</p>
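<p>To make this concrete, here is a sketch of the checksum calculation described in RFC 791: a one's-complement sum over the header's 16-bit words, computed with the checksum field set to zero. The header values below (addresses, length, and so on) are made up for the illustration, and <code>build_header</code> is a hypothetical helper.</p>

```python
import socket
import struct

def ipv4_checksum(header):
    """One's-complement checksum over the 16-bit words of an IPv4 header."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_header(ttl, checksum=0):
    """Build a 20-byte sample header; all field values here are illustrative."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 0x0073,          # version/IHL, TOS, total length
        0, 0x4000,                # fragmentation fields
        ttl, 17, checksum,        # TTL, protocol (UDP), header checksum
        socket.inet_aton("192.168.0.1"),
        socket.inet_aton("192.168.0.199"),
    )

before = ipv4_checksum(build_header(ttl=64))
after = ipv4_checksum(build_header(ttl=63))   # same header after one hop
print(hex(before), hex(after))                # 0xb861 0xb961

# A receiver sums the header including the checksum; a valid header yields 0
print(ipv4_checksum(build_header(ttl=64, checksum=before)))  # 0
```

<p>Notice that decrementing the TTL by one changes the checksum, which is why every router has to write a new value before forwarding the packet.</p>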
<h4 id="heading-source-and-destination-addresses-32-bits-each">Source and Destination Addresses (32 bits each)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745591643443/b2409ba4-d2e3-468a-af2a-a71fc4ce4c30.png" alt="Source and Destination IP Addresses fields within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>These fields contain the source and destination IPv4 addresses, respectively. Each is 4 bytes (32 bits) long, as you learned in the previous sections on IPv4 addressing.</p>
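<p>With the fixed fields covered, here is a sketch of how the first 20 bytes of a packet could be unpacked in Python. The sample bytes are a hand-made header used only for illustration, and the fragmentation fields are left as raw numbers.</p>

```python
import socket
import struct

def parse_ipv4_header(raw):
    """Unpack the fixed 20-byte part of an IPv4 header into a dictionary."""
    (version_ihl, tos, total_length, identification, flags_and_offset,
     ttl, protocol, checksum, source, destination) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # IHL is counted in 4-byte words
        "tos": tos,
        "total_length": total_length,
        "identification": identification,
        "flags_and_offset": flags_and_offset,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": socket.inet_ntoa(source),
        "destination": socket.inet_ntoa(destination),
    }

sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
for field, value in parse_ipv4_header(sample).items():
    print(f"{field}: {value}")
```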
<h4 id="heading-options-variable-length">Options (Variable Length)</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745591747762/66a3d602-4379-453a-b221-b4f694c3363c.png" alt="Options within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Most IPv4 packets don't include options, but when present, they can provide additional functionality:</p>
<ul>
<li><p><strong>Record Route</strong>: Each router that handles the packet adds its own address to this option, creating a trace of the packet's path</p>
</li>
<li><p><strong>Source Routing</strong>: Allows the sender to specify the route the packet should take:</p>
<ul>
<li><p>Strict Source Routing: The entire route must be followed exactly</p>
</li>
<li><p>Loose Source Routing: Certain routers must be traversed, but the exact path between them is flexible</p>
</li>
</ul>
</li>
</ul>
<h4 id="heading-padding">Padding</h4>
<p>In some cases, the header ends with padding bytes (usually <code>0</code>s).</p>
<p>❓Why does the IPv4 header have padding?🤔</p>
<p>As explained before, the IHL field specifies the header length in 4-byte units, so the total header length must be a multiple of 4 bytes. If options make the header length not divisible by 4, padding bytes (usually <code>0</code>) are added to reach the next multiple of 4.</p>
<p>For example, if you have 3 bytes of options, you would need 1 byte of padding to make the total header length a multiple of 4 bytes.</p>
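<p>The padding arithmetic is small enough to write down directly. In this sketch, <code>padding_needed</code> and <code>ihl</code> are hypothetical helper names, and the fixed part of the header is assumed to be 20 bytes, as described earlier.</p>

```python
FIXED_HEADER_LEN = 20  # bytes in an IPv4 header without options


def padding_needed(options_len: int) -> int:
    """Bytes of padding that bring the options up to a multiple of 4."""
    return -options_len % 4


def ihl(options_len: int) -> int:
    """IHL value: the total header length in 4-byte units."""
    total = FIXED_HEADER_LEN + options_len + padding_needed(options_len)
    return total // 4
```

<p>With 3 bytes of options, <code>padding_needed(3)</code> is <code>1</code> and the IHL becomes <code>6</code> (24 bytes). With no options at all, the IHL is <code>5</code>.</p>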
<h3 id="heading-ipv4-header-interim-summary">IPv4 Header – Interim Summary</h3>
<p>You've now learned about the structure of the IPv4 header, with the exception of the fragmentation fields which I’ll cover in the next section.</p>
<p>The IPv4 header efficiently packs all the necessary routing and control information into a compact structure, typically 20 bytes long (without options). This design allows for fast processing by routers while providing the flexibility needed for internet communication. It is amazing how prominent IPv4 is, even so many years after its publication.</p>
<p>In the next section, you'll learn about IPv4 fragmentation.</p>
<h2 id="heading-ipv4-fragmentation">IPv4 Fragmentation</h2>
<p>In the previous section, you learned about most of the IPv4 header structure, with the exception of 32 bits dedicated to fragmentation. This topic deserves special attention, as it reveals important aspects of how IP packets travel across different networks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745591136348/bb1035af-c967-4bb8-992c-c10e31b64cd1.png" alt="Fragmentation fields within IPv4 Header (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-why-fragmentation-is-needed">Why Fragmentation Is Needed</h3>
<p>To understand what fragmentation is and why it's needed, consider the following network scenario:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745770107962/b3bc6c7a-2adb-4868-893c-ec9e51303567.png" alt="Two networks with different MTUs (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>In this diagram, you have two different networks where Machine A resides in one network and Machine B resides in another. A router forwards packets between these two networks.</p>
<p>These two networks have different Maximum Transmission Units (MTUs). The MTU is the maximum size of a packet that a network can carry in a single frame. For example:</p>
<ul>
<li><p>Machine B is connected to an Ethernet network with an MTU of <code>1500</code> bytes</p>
</li>
<li><p>Machine A is connected to a different network with an MTU of <code>2000</code> bytes</p>
</li>
</ul>
<p>Different MTUs stem from the different protocols and hardware that different networks have. Ethernet has an MTU of <code>1500</code> bytes. This maximum size was chosen because RAM was expensive back in the late 1970s when Ethernet was planned, and a receiver would need more RAM if a frame could be bigger. Other networks were devised at different times where RAM prices might have been lower, or just have other considerations that affect the MTU.</p>
<p>Now, consider this scenario: Machine A wants to send a packet to Machine B. This packet is <code>1800</code> bytes long. From A's perspective, there's no problem since its network supports packets of this size. Machine A transmits the packet.</p>
<p>When the router receives this packet, it faces a problem: it cannot simply forward the packet to B's network because the packet is too big for the network's MTU. The router must <strong>fragment</strong> the packet – splitting it into smaller chunks of up to <code>1500</code> bytes, which will then be reassembled by Machine B.</p>
<h3 id="heading-how-fragmentation-works-in-ip">How Fragmentation Works in IP</h3>
<p>Let's examine the scenario further. The router needs to take an IP packet of <code>1800</code> bytes and split it into two fragments, each consisting of up to <code>1500</code> bytes. If Machine A sends another packet of <code>1800</code> bytes to Machine B, the router will have to split that one too – resulting in four different fragments that will be reassembled into two separate packets.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745770316245/b137efa8-ae1c-42cb-918a-f6d0ee7b2c3a.png" alt="Two IP packets, each consisting of two fragments (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>When Machine B receives these fragments, it must ensure that it reassembles fragment #1 together with fragment #2 of packet A, and fragment #1 with fragment #2 of packet B – and not, for instance, fragment #1 of packet A with fragment #2 of packet B. It must also reassemble the fragments in the correct order, building a packet that consists of #1 followed by #2, and not #2 followed by #1.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745770377464/12aad8f1-0251-4289-bc9a-75084dbc1f7a.png" alt="Possible issues in reassembling packets from two fragments (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-identification-field">Identification Field</h3>
<p>First, focus on making sure Machine B reassembles fragments of the same packet (for example, fragment #1 and fragment #2 of packet A in the example above, rather than fragment #1 of packet A and fragment #2 of packet B). This is achieved using the identification field of IPv4. Fragments belonging to the same packet will have the same identification value. For example, both fragments of packet A might have identification set to <code>100</code>, and both fragments of packet B might have identification of <code>200</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745770785114/6f04e59b-adfc-44a9-bf6e-1118ab748160.png" alt="The identification field ensures fragments of the same original packet are reassembled together (Source: https://youtube.com/BriefVid)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>It's important to note that sharing identification values isn't sufficient for fragments to belong to the same packet. Fragments of the same packet must also share:</p>
<ul>
<li><p>The same source IP address</p>
</li>
<li><p>The same destination IP address</p>
</li>
<li><p>The same protocol value (indicating whether the payload is TCP, UDP, and so on)</p>
</li>
</ul>
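<p>One way to picture this is a receiver that sorts incoming fragments into buckets, keyed by all four values. The sketch below is only an illustration: the <code>Fragment</code> type and <code>group_fragments</code> function are invented for this example and do not mirror any real network stack.</p>

```python
from collections import defaultdict
from typing import NamedTuple


class Fragment(NamedTuple):
    """A simplified, hypothetical view of a received fragment."""
    src: str
    dst: str
    protocol: int
    identification: int
    payload: bytes


def group_fragments(fragments):
    """Group fragments by the four values that identify the original packet."""
    buckets = defaultdict(list)
    for frag in fragments:
        key = (frag.src, frag.dst, frag.protocol, frag.identification)
        buckets[key].append(frag)
    return dict(buckets)
```

<p>Two fragments that happen to share identification <code>100</code> but come from different source addresses land in different buckets, so they are never reassembled together.</p>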
<h3 id="heading-fragment-offset">Fragment Offset</h3>
<p>Since IP is a connectionless protocol, there's no guarantee that fragments will arrive at Machine B in the correct order. Fragment #2 of packet A may arrive before fragment #1. To handle this issue, each fragment carries an Offset field, which denotes the offset from the beginning of the original packet.</p>
<p>The Offset field consists of 13 bits, which means it can carry values from <code>0</code> to <code>8191</code> (2^13-1). This poses a potential problem, as the maximum size of an IP packet can be <code>65,535</code> bytes (since the Total Length field of the IP header consists of 16 bits).</p>
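<p>To address this limitation, the Offset field counts in units of <code>8</code> bytes (2^3): the actual byte offset is the field's value multiplied by <code>8</code>. This means the payload of every fragment except the last must be a multiple of <code>8</code> bytes, so the smallest such fragment carries <code>8</code> bytes of payload.</p>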
<p>❓Why do IP packets carry an offset in bytes divided by 8, instead of just a sequential fragment number?🤔</p>
<p>While using sequence numbers might seem simpler, it would create problems when packets need to be fragmented multiple times.</p>
<p>For example, if Computer A sends a packet to the first router, which fragments it into pieces of <code>1480</code> bytes and <code>320</code> bytes, and then these fragments are sent to another router that needs to fragment them again into even smaller pieces, how would you number them?</p>
<p>With byte offsets, the solution is straightforward. Say the first fragment has an offset of <code>0</code> and the second one has an offset of <code>1480</code>. If the second router needs to split them into fragments of at most <code>800</code> bytes, we'd have:</p>
<ul>
<li><p>First fragment: <code>800</code> bytes with offset <code>0</code></p>
</li>
<li><p>Second fragment: <code>680</code> bytes with offset <code>800</code></p>
</li>
<li><p>Third fragment: <code>320</code> bytes with offset <code>1480</code></p>
</li>
</ul>
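<p>The re-fragmentation above can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: fragments are represented only as <code>(offset, length)</code> pairs in bytes, headers are ignored, and <code>refragment</code> is a made-up name.</p>

```python
def refragment(fragments, max_payload):
    """Split (offset, length) pairs so no piece exceeds max_payload bytes."""
    if max_payload % 8:
        raise ValueError("max_payload must be a multiple of 8 bytes")
    result = []
    for offset, length in fragments:
        # Keep cutting full-size pieces off the front of this fragment.
        while length > max_payload:
            result.append((offset, max_payload))
            offset += max_payload
            length -= max_payload
        result.append((offset, length))
    return result


pieces = refragment([(0, 1480), (1480, 320)], max_payload=800)
print(pieces)
```

<p>The second router never needs to know how many fragments the first router produced. Each new piece simply carries its own position within the original packet, which is exactly what a sequential fragment number could not provide.</p>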
<h3 id="heading-more-fragments-and-dont-fragment-flags">More Fragments and Don't Fragment Flags</h3>
<p>When Machine B receives a fragment, it needs to know whether this is an entire packet by itself or if it should expect additional fragments. For this purpose, each IP fragment carries a More Fragments (<code>MF</code>) bit that is set to <code>1</code> for every fragment that is not the last fragment of the packet. For the last fragment, it's set to <code>0</code>.</p>
<p>If the packet was not fragmented at all, the <code>MF</code> bit is set to <code>0</code>, and the offset field also holds the value <code>0</code> (that is, 13 bits of <code>0</code>s).</p>
<p>Another bit related to fragmentation is the Don't Fragment (<code>DF</code>) bit. When this flag is turned on, intermediate devices should not fragment the original packet, even if it exceeds the MTU. Instead, they should drop it and typically send an ICMP "Fragmentation Needed" message back to the source.</p>
<p>In our example, if Machine A sets the Don't Fragment bit to <code>1</code>, the router would drop the packet, and notify Machine A about it.</p>
<p>Note that right after the identification field and before the <code>DF</code> flag, there is a reserved bit set to <code>0</code>. This bit was reserved for future use, in case a need arises that the original authors of IPv4 did not anticipate.</p>
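<p>The 16 bits that follow the identification field (the reserved bit, <code>DF</code>, <code>MF</code>, and the 13-bit offset) can be packed and unpacked with a few bit operations. The function names below are hypothetical, and offsets are passed in bytes to keep the multiplication by <code>8</code> visible.</p>

```python
def build_flags_and_offset(df: int, mf: int, offset_bytes: int) -> int:
    """Pack DF, MF and a byte offset into the 16-bit header word."""
    if offset_bytes % 8:
        raise ValueError("fragment offsets must be multiples of 8 bytes")
    # Bit 15 is the reserved bit and stays 0.
    return (df << 14) | (mf << 13) | (offset_bytes // 8)


def parse_flags_and_offset(word: int):
    """Unpack the word into (reserved, df, mf, offset_bytes)."""
    reserved = (word >> 15) & 1
    df = (word >> 14) & 1
    mf = (word >> 13) & 1
    offset_bytes = (word & 0x1FFF) * 8  # low 13 bits, in units of 8 bytes
    return reserved, df, mf, offset_bytes
```

<p>For instance, a packet that must not be fragmented carries the word <code>0x4000</code>: only the <code>DF</code> bit is on, and the offset is <code>0</code>.</p>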
<h3 id="heading-fragmentation-example">Fragmentation Example</h3>
<p>Consider again our example above – with Machine A residing in a network where the MTU is <code>2000</code>, and Machine B residing in a network where the MTU is <code>1500</code>. Machine A sends a packet with <code>1800</code> bytes of payload (the numbers below count the <code>20</code>-byte IP header separately).</p>
<p>❓Can you fill the values in these tables?</p>
<p><strong>First Fragment:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Total Length</td><td></td></tr>
</thead>
<tbody>
<tr>
<td>Identification</td><td></td></tr>
<tr>
<td>Don’t Fragment</td><td></td></tr>
<tr>
<td>More Fragments</td><td></td></tr>
<tr>
<td>Offset</td><td></td></tr>
</tbody>
</table>
</div><p><strong>Second Fragment:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Total Length</td><td></td></tr>
</thead>
<tbody>
<tr>
<td>Identification</td><td></td></tr>
<tr>
<td>Don’t Fragment</td><td></td></tr>
<tr>
<td>More Fragments</td><td></td></tr>
<tr>
<td>Offset</td><td></td></tr>
</tbody>
</table>
</div><p>For our example above, the values of the relevant fragmentation fields in IP would be as follows:</p>
<p><strong>First Fragment:</strong></p>
<ul>
<li><p>Total Length: <code>1500</code> (including <code>20</code> bytes of IP header, so <code>1480</code> bytes of payload)</p>
</li>
<li><p>Identification: <code>1337</code> (arbitrary value)</p>
</li>
<li><p>Don't Fragment bit: <code>0</code> (off, to allow further fragmentation if needed)</p>
</li>
<li><p>More Fragments bit: <code>1</code> (on, as this is not the last fragment)</p>
</li>
<li><p>Offset: <code>0</code> (it's the first fragment)</p>
</li>
</ul>
<p><strong>Second Fragment:</strong></p>
<ul>
<li><p>Total Length: <code>340</code> (including <code>20</code> bytes of IP header, so <code>320</code> bytes of payload – together with the first fragment, we get to <code>1800</code> bytes of payload)</p>
</li>
<li><p>Identification: <code>1337</code> (same as first fragment, indicating they belong together)</p>
</li>
<li><p>Don't Fragment bit: <code>0</code> (off, to allow further fragmentation if needed)</p>
</li>
<li><p>More Fragments bit: <code>0</code> (off, as this is the last fragment)</p>
</li>
<li><p>Offset: <code>185</code> (1480/8 = 185, or <code>0xB9</code> in hexadecimal)</p>
</li>
</ul>
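<p>You can check these values with a short Python sketch. It rests on the same assumptions as the answer above: <code>1800</code> bytes of payload, a <code>20</code>-byte header with no options, and an arbitrary identification of <code>1337</code>. The <code>fragment</code> function is a made-up helper, not part of any library.</p>

```python
def fragment(payload_len, mtu, header_len=20, identification=1337):
    """Return the fragmentation-related header fields of each fragment."""
    # Payload per fragment must be a multiple of 8 (except in the last one).
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload < 8:
        raise ValueError("MTU too small to carry any payload")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_payload, payload_len - offset)
        is_last = offset + size == payload_len
        fragments.append({
            "total_length": header_len + size,
            "identification": identification,
            "df": 0,
            "mf": 0 if is_last else 1,
            "offset": offset // 8,  # the value stored in the 13-bit field
        })
        offset += size
    return fragments


for fields in fragment(payload_len=1800, mtu=1500):
    print(fields)
```

<p>The first fragment comes out with a total length of <code>1500</code> and offset <code>0</code>, and the second with a total length of <code>340</code> and offset <code>185</code>, matching the lists above.</p>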
<h3 id="heading-ipv4-fragmentation-summary">IPv4 Fragmentation – Summary</h3>
<p>You've now learned about the final part of the IPv4 Header: fragmentation. Fragmentation is necessary to allow packets to travel across networks with different MTUs. The IPv4 header includes several fields specifically designed to support fragmentation:</p>
<ul>
<li><p>Identification (16 bits): Identifies which fragments belong together</p>
</li>
<li><p>Flags (3 bits): Including the "More Fragments" and "Don't Fragment" flags</p>
</li>
<li><p>Fragment Offset (13 bits): Indicates where in the original packet this fragment belongs</p>
</li>
</ul>
<p>With this knowledge, you now understand every bit and byte of the IPv4 header and how IP packets can traverse networks with different characteristics.</p>
<h2 id="heading-summary-ipv4">Summary – IPv4</h2>
<p>In this comprehensive guide to IPv4, you've learned about the fundamental building blocks of Internet communications. Let's recap the key concepts we covered:</p>
<h3 id="heading-addressing-and-network-structure">Addressing and Network Structure</h3>
<ul>
<li><p>IPv4 addresses are 32-bit numbers typically written in dotted decimal notation</p>
</li>
<li><p>Networks can be identified using various methods:</p>
<ul>
<li><p>Fixed-length approach (historically)</p>
</li>
<li><p>Classful addressing (A, B, C, D, E classes)</p>
</li>
<li><p>CIDR (modern approach allowing flexible network sizes)</p>
</li>
</ul>
</li>
<li><p>Special addresses serve specific purposes:</p>
<ul>
<li><p><code>0.0.0.0</code> for "this host"</p>
</li>
<li><p><code>127.0.0.0/8</code> for loopback</p>
</li>
<li><p><code>255.255.255.255</code> for broadcast</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-ipv4-header-structure">IPv4 Header Structure</h3>
<ul>
<li><p>The header contains crucial fields for packet routing and processing:</p>
<ul>
<li><p>Version and IHL for header interpretation</p>
</li>
<li><p>Type of Service for traffic prioritization</p>
</li>
<li><p>Total Length for packet size</p>
</li>
<li><p>Various fields for fragmentation control</p>
</li>
<li><p>TTL to prevent infinite routing loops</p>
</li>
<li><p>Protocol to identify the encapsulated protocol</p>
</li>
<li><p>Checksum for error detection</p>
</li>
<li><p>Source and destination addresses</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-fragmentation">Fragmentation</h3>
<ul>
<li><p>Allows IPv4 packets to traverse networks with different MTUs</p>
</li>
<li><p>Uses three key fields:</p>
<ul>
<li><p>Identification to group fragments</p>
</li>
<li><p>Flags to control fragmentation</p>
</li>
<li><p>Fragment Offset to reassemble packets</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-final-words">Final Words</h3>
<p>While IPv4 has limitations, particularly its address space constraints, its elegant design and robust features have allowed it to remain the backbone of the Internet for over four decades. Understanding IPv4 provides essential context for working with modern networks and helps in transitioning to newer protocols like IPv6.</p>
<h2 id="heading-about-the-author"><strong>About the Author</strong></h2>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the Brief <a target="_blank" href="https://youtube.com/@BriefVid">YouTube Channel</a>. He's also a cyber training expert and founder of Check Point Security Academy. He's the author of <a target="_blank" href="https://www.freecodecamp.org/news/gitting-things-done-book/">Gitting Things Done</a> (in English) and <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
<h3 id="heading-additional-references"><strong>Additional References</strong></h3>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg">Computer Networks Playlist - on my Brief channel</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Gitting Things Done – A Visual and Practical Guide to Git [Full Book] ]]>
                </title>
                <description>
                    <![CDATA[ Introduction Git is awesome. Most software developers use Git on a daily basis. But how many truly understand Git? Do you feel like you know what's going on under the hood as you use Git to perform various tasks? For example, what happens when you us... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/gitting-things-done-book/</link>
                <guid isPermaLink="false">66c17c2bea5637f064224a06</guid>
                
                    <category>
                        <![CDATA[ book ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Git ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GitHub ]]>
                    </category>
                
                    <category>
                        <![CDATA[ version control ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Mon, 08 Jan 2024 17:12:21 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/12/Gitting-Things-Done-Cover-with-Photo.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <h2 id="heading-introduction">Introduction</h2>
<p>Git is awesome.</p>
<p>Most software developers use Git on a daily basis. But how many truly understand Git? Do <em>you</em> feel like you know what's going on under the hood as you use Git to perform various tasks?</p>
<p>For example, what happens when you use <code>git commit</code>? What is stored between commits? Is it just a diff between the current and previous commit? If so, how is the diff encoded? Or is an entire snapshot of the repository stored each time?</p>
<p>Most people who use Git don't know the answers to the questions posed above. But does it really matter? Do you really have to know all of those things?</p>
<p>I'd argue that it does matter. As professionals, we should strive to understand the tools we use, especially if we use them all the time, like Git.</p>
<p>Even more acutely, I've found that understanding how Git actually works is <strong>useful</strong> in many scenarios — whether resolving merge conflicts, looking to conduct an interesting rebase, or even just when something goes slightly wrong. </p>
<p>So many times have I received questions about Git from experienced, highly skilled software engineers. I have seen wonderful developers react in fear when something happened in their commit history, and they just didn't know what to do. It doesn't have to be this way.</p>
<p>By reading this book, you will gain a new perspective of Git. You will feel <strong>confident</strong> when working with Git, and you will <strong>understand</strong> Git's underlying mechanisms, at least those that are useful to understand. You will <em>Git</em> it. You will be <em>Gitting things done</em>.</p>
<h1 id="heading-table-of-contents">Table of Contents</h1>
<ul>
<li><a class="post-section-overview" href="#heading-introduction">Introduction</a></li>
<li><a class="post-section-overview" href="#heading-part-1-main-objects-and-introducing-changes">Part 1 - Main Objects and Introducing Changes</a><ul>
<li><a class="post-section-overview" href="#heading-chapter-1-git-objects">Chapter 1 - Git Objects</a></li>
<li><a class="post-section-overview" href="#heading-chapter-2-branches-in-git">Chapter 2 - Branches in Git</a></li>
<li><a class="post-section-overview" href="#heading-chapter-3-how-to-record-changes-in-git">Chapter 3 - How to Record Changes in Git</a></li>
<li><a class="post-section-overview" href="#heading-chapter-4-how-to-create-a-repo-from-scratch">Chapter 4 - How to Create a Repo From Scratch</a></li>
<li><a class="post-section-overview" href="#heading-chapter-5-how-to-work-with-branches-in-git-under-the-hood">Chapter 5 - How to Work with Branches in Git — Under the Hood</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-part-2-branching-and-integrating-changes">Part 2 - Branching and Integrating Changes</a><ul>
<li><a class="post-section-overview" href="#heading-chapter-6-diffs-and-patches">Chapter 6 - Diffs and Patches</a></li>
<li><a class="post-section-overview" href="#heading-chapter-7-understanding-git-merge">Chapter 7 - Understanding Git Merge</a></li>
<li><a class="post-section-overview" href="#heading-chapter-8-understanding-git-rebase">Chapter 8 - Understanding Git Rebase</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-part-3-undoing-changes">Part 3 - Undoing Changes</a><ul>
<li><a class="post-section-overview" href="#heading-chapter-9-git-reset">Chapter 9 - Git Reset</a></li>
<li><a class="post-section-overview" href="#heading-chapter-10-additional-tools-for-undoing-changes">Chapter 10 - Additional Tools for Undoing Changes</a></li>
<li><a class="post-section-overview" href="#heading-chapter-11-exercises">Chapter 11 - Exercises</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-part-4-amazing-and-useful-git-tools">Part 4 - Amazing and Useful Git Tools</a><ul>
<li><a class="post-section-overview" href="#heading-chapter-12-git-log">Chapter 12 - Git Log</a></li>
<li><a class="post-section-overview" href="#heading-chapter-13-git-bisect">Chapter 13 - Git Bisect</a></li>
<li><a class="post-section-overview" href="#heading-chapter-14-other-useful-commands">Chapter 14 - Other Useful Commands</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-summary">Summary</a></li>
<li><a class="post-section-overview" href="#heading-appendixes">Appendixes</a></li>
</ul>
<h2 id="heading-who-is-this-book-for">Who Is This Book For?</h2>
<p>Any software developer who wants to deepen their knowledge about Git.</p>
<p>If you are experienced with Git - I am sure you will be able to deepen your knowledge. Even if you are new to Git - I will start with an overview of the mechanisms of Git, and the terms used throughout this book.</p>
<p>This book is for you. I wrote it so you can learn more about Git, and also come to appreciate, or even love Git.</p>
<p>You will also notice that I use a casual style throughout the book. I believe that learning Git should be insightful and fun. Learning new things is always hard, and I felt that writing in a less casual style wouldn't really do you a good service. And as I already mentioned - this book is for you.</p>
<h2 id="heading-who-am-i">Who Am I?</h2>
<p>This book is about you, and your journey with Git. But I would like to tell you a bit about why I think I can contribute to your journey.</p>
<p>I am the CTO and one of the co-founders of <a target="_blank" href="https://swimm.io">Swimm.io</a>, a knowledge management tool for code. Part of what we do is linking parts from code in Git repositories to parts of the documentation, and then tracking changes in the repository to update the documentation if needed. </p>
<p>At Swimm, I got to dissect parts of Git, understand its underlying mechanisms and also gain intuition about why Git is implemented the way it is.</p>
<p>Before founding Swimm I practiced teaching in many different environments - among them, managing the Cyber track of Israel Tech Challenge, founding Check Point Security Academy, and writing a full textbook.</p>
<p>This book is my attempt to make the most of both worlds - my teaching experience as well as my in-depth hands-on experience with Git, and give you the best learning experience I can.</p>
<h2 id="heading-the-approach-of-this-book">The Approach of This Book</h2>
<p>This is definitely not the first book about Git. When sitting down to write it, I had three principles in mind.</p>
<ol>
<li><strong>Practical</strong> - in this book, you will learn how to accomplish things in Git. How to introduce changes, how to undo them, and how to fix things when they go wrong. You will understand how Git works not just for the sake of understanding, but with a practical mindset. I sometimes refer to this as the "practicality principle" - which guides me in deciding whether to include certain topics, and to what extent.</li>
<li><strong>In depth</strong> - you will dive deep into Git's way of operating, to understand its mechanisms. You will build your understanding gradually, and always link your knowledge to real scenarios you might face in your work. In order to achieve an in-depth understanding, I almost always prefer the command line over graphical interfaces, so you can really see what commands are running.</li>
<li><strong>Visual</strong> - as I strive to provide you with intuition, the chapters will be accompanied by visual aids.</li>
</ol>
<h2 id="heading-why-is-this-book-publicly-available">Why Is This Book Publicly Available?</h2>
<p>I think everyone should have access to high quality content about Git, and I'd like this book to get to as many people as possible.</p>
<p>If you would like to support this book, you are welcome to buy the <a target="_blank" href="https://www.amazon.com/dp/B0CQXTJ5V5">Paperback version</a>, an <a target="_blank" href="https://www.buymeacoffee.com/omerr/e/197232">E-Book version</a>, or <a target="_blank" href="https://www.buymeacoffee.com/omerr">buy me a coffee</a>. Thank you!</p>
<h2 id="heading-accompanying-videos">Accompanying Videos</h2>
<p>I have covered many topics from this book on my YouTube channel - Brief (<a target="_blank" href="https://www.youtube.com/@BriefVid">https://www.youtube.com/@BriefVid</a>). You are welcome to check them out as well.</p>
<h2 id="heading-get-your-hands-dirty">Get Your Hands Dirty</h2>
<p>Throughout this book, I will mostly use the second person singular - and directly write to <em>you</em>. I will ask <em>you</em> to get your hands dirty and run the commands yourself, so you actually get to <em>feel</em> what it's like to do things with Git, not just read about it.</p>
<h2 id="heading-gits-feelings">Git's Feelings</h2>
<p>Throughout the book, I sometimes refer to Git with words such as "believes", "thinks", or "wants". As you may argue, Git is not a human, and it doesn't have feelings or beliefs. Well, that's true, but in order for us to enjoy playing around with Git, and to help you enjoy reading (and me writing) this book, I feel like referring to Git as more than just code makes it all so much more enjoyable.</p>
<h2 id="heading-my-setup">My Setup</h2>
<p>I will include screenshots. There's no need for your setup to match mine, but if you're curious about my setup, then:</p>
<ul>
<li>I am using Ubuntu 20.04 (WSL).</li>
<li>For my terminal, I use <a target="_blank" href="https://ohmyz.sh/">Oh My Zsh</a></li>
<li>I also use plugins for Oh My Zsh. To set them up, you can <a target="_blank" href="https://www.freecodecamp.org/news/jazz-up-your-zsh-terminal-in-seven-steps-a-visual-guide-e81a8fd59a38/">follow this tutorial on freeCodeCamp</a>.</li>
<li><a target="_blank" href="https://github.com/mlange-42/git-graph">git-graph (my alias is <code>gg</code>)</a></li>
</ul>
<h2 id="heading-feedback-is-welcome">Feedback Is Welcome</h2>
<p>This book has been created to help you and people like you learn, understand Git, and apply that knowledge in real life. </p>
<p>Right from the beginning, I asked for feedback and was lucky to receive it from great people (see <a class="post-section-overview" href="#heading-acknowledgements">Acknowledgments</a>) to make sure the book achieves these goals. If you liked something about this book, felt that something was missing, or that something needed improvement - I would love to hear from you. Please reach out at <a target="_blank" href="mailto:gitting.things@gmail.com">gitting.things@gmail.com</a>.</p>
<h2 id="heading-note">Note</h2>
<p>This book is provided for free on freeCodeCamp as described above and according to <a target="_blank" href="https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International</a>.</p>
<p>If you would like to support this book, you are welcome to buy the <a target="_blank" href="https://www.amazon.com/dp/B0CQXTJ5V5">Paperback version</a>, an <a target="_blank" href="https://www.buymeacoffee.com/omerr/e/197232">E-Book version</a>, or <a target="_blank" href="https://www.buymeacoffee.com/omerr">buy me a coffee</a>. Thank you!</p>
<h1 id="heading-part-1-main-objects-and-introducing-changes">Part 1 - Main Objects and Introducing Changes</h1>
<h2 id="heading-chapter-1-git-objects">Chapter 1 - Git Objects</h2>
<p>It's time to start your journey into the depths of Git. In this chapter - starting with the basics - you will learn about the most important Git objects, and adopt a way of thinking about Git. Let's get to it!</p>
<h3 id="heading-git-as-a-system-for-maintaining-a-file-system">Git as a System for Maintaining a File System</h3>
<p>While there are different ways to use Git, I'll adopt here the way I've found to be the clearest and most useful: viewing Git as a system maintaining a file system, and specifically, snapshots of that file system over time.</p>
<p>A file system begins with a root directory (in UNIX-based systems, <code>/</code>), which usually contains other directories (for example, <code>/usr</code> or <code>/bin</code>). These directories contain other directories, and/or files (for example, <code>/usr/1.txt</code>). On a Windows machine, a root directory of a drive would be <code>C:\</code>, and a subdirectory could be <code>C:\users</code>. I will adopt the convention of UNIX-based systems throughout this book.</p>
<h3 id="heading-blobs">Blobs</h3>
<p>In Git, the contents of files are stored in objects called <strong>blob</strong>s, short for binary large objects.</p>
<p>The difference between blobs and files is that files also contain meta-data. For example, a file "remembers" when it was created, so if you move that file from one directory into another directory, its creation time remains the same.</p>
<p>Blobs, in contrast, are just binary streams of data, like a file's contents. A blob does not register its creation date, its name, or anything other than its contents.</p>
<p>Every blob in Git is identified by its <a target="_blank" href="https://en.wikipedia.org/wiki/SHA-1">SHA-1 hash</a>. SHA-1 hashes consist of 20 bytes, usually represented by 40 characters in hexadecimal form. Throughout this book I will sometimes show just the first characters of that hash. As hashes, and specifically SHA-1 hashes are so ubiquitous within Git, it is important you understand the basic characteristics of hashes.</p>
<h3 id="heading-hashes">Hashes</h3>
<p>A hash function is a deterministic, one-way mathematical function.</p>
<p><em>Deterministic</em> means that the same input always produces the same output. That is, whenever you run a hash function on the same stream of data, you get the same result.</p>
<p>For example, if you provide the SHA-1 hash function with the stream <code>hello</code>, you will get <code>0xaaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d</code>. If you run the SHA-1 hash function again, from a different machine, and provide it the same data (<code>hello</code>), you will get the same value.</p>
<p>Git uses SHA-1 as its hash function in order to identify objects. It relies on it being deterministic, such that an object will always have the same identifier.</p>
<p>A <em>one-way</em> function is a function that is hard to invert given an output. That is, it is impossible (or at least, very hard) to tell, given the result of the hash function (for example <code>0xaaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d</code>), what input yielded that result (in this example, <code>hello</code>).</p>
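<p>If you would like to see the deterministic property on your own machine, here is a minimal sketch. It assumes the <code>sha1sum</code> utility from GNU coreutils is available (on macOS, <code>shasum</code> usually plays the same role), and the variable names are only for illustration.</p>

```shell
# Hash the same input twice, and a slightly different input once.
# `printf` is used instead of `echo` so that no trailing newline is added.
first=$(printf 'hello' | sha1sum | cut -d ' ' -f 1)
second=$(printf 'hello' | sha1sum | cut -d ' ' -f 1)
other=$(printf 'hello!' | sha1sum | cut -d ' ' -f 1)

echo "$first"    # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
echo "$second"   # the same value again
echo "$other"    # a completely different value
```

<p>The one-way property cannot be demonstrated this way, of course: nothing in the output hints at the input that produced it.</p>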
<h3 id="heading-back-to-git">Back to Git</h3>
<p>Blobs, just like other Git objects, have SHA-1 hashes associated with them.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/blob_sha.png" alt="Blobs have corresponding SHA-1 values" width="600" height="400" loading="lazy">
<em>Blobs have corresponding SHA-1 values</em></p>
<p>As I said in the beginning, Git can be viewed as a system to maintain a file system. File systems consist of files and directories. A blob is the Git object representing the contents of a file.</p>
<h3 id="heading-trees">Trees</h3>
<p>In Git, the equivalent of a directory is a <strong>tree</strong>. A tree is basically a directory listing, referring to blobs, as well as other trees.</p>
<p>Trees are identified by their SHA-1 hashes as well. Referring to these objects, either blobs or other trees, happens via the SHA-1 hash of the objects.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/tree_objs.png" alt="A tree is a directory listing" width="600" height="400" loading="lazy">
<em>A tree is a directory listing</em></p>
<p>Consider the drawing above. Note that the tree <code>CAFE7</code> refers to the blob <code>F92A0</code> as the file <code>pic.png</code>. In another tree, that same blob may have another name - but as long as the contents are the same, it will still be the same blob object, and still have the same SHA-1 value.</p>
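<p>You can check this claim with a small experiment. The sketch below assumes Git is installed and uses a throwaway directory; the file names and contents are made up for the example. It relies on <code>git hash-object</code>, which prints the hash of the blob Git would create for a file.</p>

```shell
# Work in a temporary, empty repository so nothing else is affected.
work=$(mktemp -d)
cd "$work"
git init -q

# Two files with identical contents but different names, and one that differs.
printf 'same contents\n' > pic.png
printf 'same contents\n' > renamed.bin
printf 'other contents\n' > other.txt

blob_a=$(git hash-object pic.png)
blob_b=$(git hash-object renamed.bin)
blob_c=$(git hash-object other.txt)

echo "$blob_a"   # identical to the next line: the name is not part of the blob
echo "$blob_b"
echo "$blob_c"   # different contents, different blob
```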
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/tree_sub_trees.png" alt="A tree may contain sub-trees, as well as blobs" width="600" height="400" loading="lazy">
<em>A tree may contain sub-trees, as well as blobs</em></p>
<p>The diagram above is equivalent to a file system with a root directory that has one file at <code>/test.js</code>, and a directory named <code>/docs</code> consisting of two files: <code>/docs/pic.png</code>, and <code>/docs/1.txt</code>.</p>
<h3 id="heading-commits">Commits</h3>
<p>Now it's time to take a snapshot of that file system — and store all the files that existed at that time, along with their contents.</p>
<p>In Git, a snapshot is a <strong>commit</strong>. A commit object includes a pointer to the main tree (the root directory of the file system), as well as other meta-data such as the author and the committer (usually the same person), a commit message, and the commit time.</p>
<p>In most cases, a commit also has one or more parent commits — the previous snapshot (or snapshots). Of course, commit objects are also identified by their SHA-1 hashes. These are the hashes you are probably used to seeing when you use commands such as <code>git log</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit.png" alt="A commit is a snapshot in time. It refers to the root tree. As this is the first commit, it has no parents" width="600" height="400" loading="lazy">
<em>A commit is a snapshot in time. It refers to the root tree. As this is the first commit, it has no parents</em></p>
<p>Every commit holds the entire snapshot, not just differences between itself and its parent commit or commits.</p>
<p>How can that work? Doesn't that mean that Git has to store a lot of data for every commit?</p>
<p>Examine what happens if you change the contents of a file. Say that you edit the file <code>1.txt</code>, and add an exclamation mark — that is, you changed the content from <code>HELLO WORLD</code>, to <code>HELLO WORLD!</code>.</p>
<p>Well, this change means that Git creates a new blob object, with a new SHA-1 hash. This makes sense, as <code>sha1("HELLO WORLD")</code> is different from <code>sha1("HELLO WORLD!")</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_blob_new_sha.png" alt="Changing the blob results in a new SHA-1" width="600" height="400" loading="lazy">
<em>Changing the blob results in a new SHA-1</em></p>
<p>Since you have a new hash, the tree's listing should also change. After all, your tree no longer points to blob <code>73D8A</code>, but rather to blob <code>62E7A</code>. Since you changed the tree's contents, you also changed its hash.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_tree_new_hash.png" alt="The tree that points to the changed blob needs to change as well" width="600" height="400" loading="lazy">
<em>The tree that points to the changed blob needs to change as well</em></p>
<p>And now, since the hash of that tree is different, you also need to change the parent tree — as the latter no longer points to tree <code>CAFE7</code>, but rather to tree <code>24601</code>. Consequently, the parent tree will also have a new hash.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_root_tree.png" alt="The root tree also changes, and so does its hash" width="600" height="400" loading="lazy">
<em>The root tree also changes, and so does its hash</em></p>
<p>Almost ready to create a new commit object, and it seems like you are going to store a lot of data — the entire file system, once more! But is that really necessary?</p>
<p>Actually, some objects, specifically blob objects, haven't changed since the previous commit — the blob <code>F92A0</code> remained intact, and so did the blob <code>F00D1</code>.</p>
<p>So this is the trick — as long as an object doesn't change, Git doesn't store it again. In this case, Git doesn't need to store blob <code>F92A0</code> or blob <code>F00D1</code> once more. Git can refer to them using only their hash values. You can then create your commit object.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_commit.png" alt="Blobs that remained intact are referenced by their hash values" width="600" height="400" loading="lazy">
<em>Blobs that remained intact are referenced by their hash values</em></p>
<p>Since this commit is not the first commit, it also has a parent commit — commit <code>A1337</code>.</p>
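<p>To watch this happen in a real repository, you could try something like the following sketch. It assumes Git is installed, works in a temporary directory, and uses a made-up identity (<code>Demo User</code>) and made-up file contents. The hashes you get will differ from the shortened ones in the drawings.</p>

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # hypothetical identity for the demo
git config user.email "demo@example.com"

# First snapshot: /test.js, /docs/pic.png and /docs/1.txt
mkdir docs
printf 'HELLO WORLD\n' > docs/1.txt
printf 'picture bytes\n' > docs/pic.png
printf 'console.log(1)\n' > test.js
git add .
git commit -q -m "First snapshot"
first=$(git rev-parse HEAD)
old_txt=$(git rev-parse HEAD:docs/1.txt)
old_pic=$(git rev-parse HEAD:docs/pic.png)
old_docs=$(git rev-parse HEAD:docs)
old_root=$(git rev-parse 'HEAD^{tree}')

# Second snapshot: only 1.txt changes.
printf 'HELLO WORLD!\n' > docs/1.txt
git add docs/1.txt
git commit -q -m "Add exclamation mark"
new_txt=$(git rev-parse HEAD:docs/1.txt)
new_pic=$(git rev-parse HEAD:docs/pic.png)
new_docs=$(git rev-parse HEAD:docs)
new_root=$(git rev-parse 'HEAD^{tree}')
parent=$(git rev-parse 'HEAD^')

echo "1.txt:   $old_txt -> $new_txt"     # new blob
echo "pic.png: $old_pic -> $new_pic"     # same blob, not stored again
echo "docs:    $old_docs -> $new_docs"   # new tree
echo "root:    $old_root -> $new_root"   # new tree
```

<p>Here <code>HEAD:docs/1.txt</code> is Git's way of naming "the object at that path in the commit <code>HEAD</code> refers to", and <code>HEAD^{tree}</code> names the commit's root tree.</p>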
<h3 id="heading-considering-hashes">Considering Hashes</h3>
<p>After introducing blobs, trees, and commits - consider the hashes of these objects. Assume I wrote the string <code>Git is awesome!</code>, and created a blob object from it. You did the same on your system. Would we have the same hash?</p>
<p>The answer is — Yes. Since the blobs consist of the same data, they'll have the same SHA-1 values.</p>
<p>What if I made a tree that references the blob of <code>Git is awesome!</code>, and gave it a specific name and metadata, and you did exactly the same on your system. Would we have the same hash?</p>
<p>Again, yes. Since the tree objects are the same, they would have the same hash.</p>
<p>What if I created a commit pointing to that tree with the commit message <code>Hello</code>, and you did the same on your system? Would we have the same hash?</p>
<p>In this case, the answer is — No. Even though our commit objects refer to the same tree, they have different commit details — time, committer, and so on.</p>
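<p>Here is one way to try the last question yourself. This is only a sketch: it assumes Git is installed, and it uses two plumbing commands, <code>git write-tree</code> and <code>git commit-tree</code>, together with the <code>GIT_AUTHOR_*</code> and <code>GIT_COMMITTER_*</code> environment variables to control the commit details. The helper <code>commit_as</code>, the names <code>me</code> and <code>you</code>, and the timestamps are invented for the example.</p>

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# One blob and one tree, shared by every commit created below.
printf 'Git is awesome!' > awesome.txt
git add awesome.txt
tree=$(git write-tree)

# commit_as NAME DATE: create a commit object for $tree with the given details.
commit_as() {
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$1@example.com" GIT_AUTHOR_DATE="$2" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$1@example.com" GIT_COMMITTER_DATE="$2" \
  git commit-tree -m "Hello" "$tree"
}

mine=$(commit_as me "1700000000 +0000")
mine_again=$(commit_as me "1700000000 +0000")   # every detail identical
yours=$(commit_as you "1700000000 +0000")       # different person
later=$(commit_as me "1700000060 +0000")        # same person, one minute later

echo "$mine"
echo "$mine_again"   # same hash as above
echo "$yours"        # different hash
echo "$later"        # different hash
```

<p>Only when every detail matches, including the time and the people involved, do two commits end up with the same hash.</p>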
<h3 id="heading-how-are-objects-stored">How Are Objects Stored?</h3>
<p>You now understand the purpose of blobs, trees, and commits. In the next chapters, you will also create these objects yourself. Interesting as it is, knowing how these objects are actually encoded and stored is not vital for your understanding of Git, or for gitting things done.</p>
<h4 id="heading-short-recap-git-objects">Short Recap - Git Objects</h4>
<p>To recap, in this section we introduced three Git objects:</p>
<ul>
<li><strong>Blob</strong> — contents of a file.</li>
<li><strong>Tree</strong> — a directory listing (of blobs and trees).</li>
<li><strong>Commit</strong> — a snapshot of the working tree.</li>
</ul>
<p>In the next chapter, we will understand branches in Git.</p>
<h2 id="heading-chapter-2-branches-in-git">Chapter 2 - Branches in Git</h2>
<p>In the previous chapter, I suggested that we should view Git as a system for maintaining a file system.</p>
<p>One of the wonders of Git is that it enables multiple people to work on that file system, in parallel, (mostly) without interfering with each other's work. Most people would say that they are "working on branch <code>X</code>." But what does that <em>actually</em> mean?</p>
<p><strong>A branch is just a named reference to a commit.</strong></p>
<p>You can always reference a commit by its SHA-1 hash, but humans usually prefer other ways to name objects. A branch is one way to reference a commit, but it's really just that.</p>
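<p>In most repositories, the main line of development is done in a branch called <code>main</code>. This is just a name. It is widely used because it is a common default for new repositories (depending on your Git version and the <code>init.defaultBranch</code> setting, <code>git init</code> may name the first branch <code>master</code> instead). However, you could use any other name you'd like.</p>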
<p>Typically, the branch points to the latest commit in the line of development you are currently working on.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/branch_01.png" alt="A branch is just a named reference to a commit" width="600" height="400" loading="lazy">
<em>A branch is just a named reference to a commit</em></p>
<p>To create another branch, you can use the <code>git branch</code> command. When you do that, Git creates another pointer. If you created a branch called <code>test</code>, by using <code>git branch test</code>, you would be creating another pointer that points to the same commit as the branch you are on:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_branch.png" alt="Using git branch creates another pointer" width="600" height="400" loading="lazy">
<em>Using <code>git branch</code> creates another pointer</em></p>
<p>How does Git know which branch you're currently on? It keeps another designated pointer, called <code>HEAD</code>. Usually, <code>HEAD</code> points to a branch, which in turn points to a commit. In the case described, <code>HEAD</code> might point to <code>main</code>, which in turn points to commit <code>B2424</code>. In some cases, <code>HEAD</code> can also point to a commit directly.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/head_main.png" alt="HEAD points to the branch you are currently on" width="600" height="400" loading="lazy">
<em><code>HEAD</code> points to the branch you are currently on</em></p>
<p>To switch the active branch to be <code>test</code>, you can use the command <code>git checkout test</code>, or <code>git switch test</code>. Now you can already guess what this command actually does — it just changes <code>HEAD</code> to point to <code>test</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/head_test.png" alt="git checkout test changes where HEAD points" width="600" height="400" loading="lazy">
<em><code>git checkout test</code> changes where <code>HEAD</code> points</em></p>
<p>You could also create the <code>test</code> branch and switch to it in one step with <code>git checkout -b test</code>, which is the equivalent of running <code>git branch test</code> to create the branch, and then <code>git checkout test</code> to move <code>HEAD</code> to point to the new branch.</p>
<p>At the point represented in the drawing above, what would happen if you made some changes and created a new commit using <code>git commit</code>? Which branch will the new commit be added to?</p>
<p>The answer is the <code>test</code> branch, as this is the active branch (since <code>HEAD</code> points to it). Afterwards, the <code>test</code> pointer will move to the newly added commit. Note that <code>HEAD</code> still points to <code>test</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/test_commit-1.png" alt="Every time we use git commit, the branch pointer moves to the newly created commit" width="600" height="400" loading="lazy">
<em>Every time we use <code>git commit</code>, the branch pointer moves to the newly created commit</em></p>
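<p>The steps so far can be reproduced with a short sketch like the one below. It assumes Git is installed, runs in a temporary directory, and uses a made-up identity and empty commits (via <code>--allow-empty</code>) so that no files are needed. <code>git rev-parse</code> prints the commit a name points to.</p>

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # make sure the first branch is called main
git config user.name "Demo User"            # hypothetical identity for the demo
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Commit on main"
git checkout -q -b test                     # create test and point HEAD to it
test_before=$(git rev-parse test)           # same commit as main at this point

git commit -q --allow-empty -m "Commit on test"
main_commit=$(git rev-parse main)
test_commit=$(git rev-parse test)
test_parent=$(git rev-parse 'test^')
head_branch=$(git symbolic-ref --short HEAD)

echo "main: $main_commit"
echo "test: $test_commit (parent: $test_parent)"
echo "HEAD points to: $head_branch"
```

<p>Only <code>test</code> moved: <code>main</code> still points to the first commit, which is now the parent of the new one.</p>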
<p>If you go back to <code>main</code> by using <code>git checkout main</code>, Git will move <code>HEAD</code> to point to <code>main</code> again.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/back_to_main-1.png" alt="The resulting state after using git checkout main" width="600" height="400" loading="lazy">
<em>The resulting state after using <code>git checkout main</code></em></p>
<p>Now, if you create another commit, which branch will it be added to?</p>
<p>That's right, it will be added to the <code>main</code> branch (and its parent would be commit <code>B2424</code>).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_to_main-1.png" alt="The resulting state after creating another commit on the main branch" width="600" height="400" loading="lazy">
<em>The resulting state after creating another commit on the <code>main</code> branch</em></p>
<h3 id="heading-short-recap-branches">Short Recap - Branches</h3>
<ul>
<li>A branch is a named reference to a commit.</li>
<li>When you use <code>git commit</code>, Git creates a commit object, and moves the branch to point to the newly created commit.</li>
<li><code>HEAD</code> is a special pointer telling Git which branch is the active branch (in rare cases, it can point directly to a commit).</li>
</ul>
<p>In the next chapters, you will learn how to introduce changes to Git. You will create a repository from scratch — without using <code>git init</code>, <code>git add</code>, or <code>git commit</code>. This will allow you to deepen your understanding of what is happening under the hood when you work with Git. You will also create new branches, switch branches, and create additional commits — all without using <code>git branch</code> or <code>git checkout</code>. I don't know about you, but I am excited already!</p>
<h2 id="heading-chapter-3-how-to-record-changes-in-git">Chapter 3 - How to Record Changes in Git</h2>
<p>So far, we've learned about four different entities in Git:</p>
<ol>
<li><strong>Blob</strong> — contents of a file.</li>
<li><strong>Tree</strong> — a directory listing (of blobs and trees).</li>
<li><strong>Commit</strong> — a snapshot of the working tree, with some meta-data such as the time or the commit message.</li>
<li><strong>Branch</strong> — a named reference to a commit.</li>
</ol>
<p>The first three are <em>objects</em>, whereas the fourth is one way to refer to objects (specifically, commits).</p>
<p>Now, it's time to understand how to introduce changes in Git.</p>
<p>When you work on your source code, you work from a <strong>working dir</strong>. A working dir(ectory) (also called "working tree") is any directory on your file system which has a repository associated with it. It contains the folders and files of your project, and also a directory called <code>.git</code> that we will talk more about later. Remember that we said that Git is a system to maintain a file system. The working directory is the root of the file system for Git.</p>
<p>After you make some changes, you might want to record them in your repository. A <strong>repository</strong> (in short: "repo") is a collection of commits, each of which is an archive of what the project's working tree looked like at a past date, whether on your machine or someone else's. That is, as I said before, a commit is a snapshot of the working tree.</p>
<p>A repository also includes things other than your code files, such as <code>HEAD</code> and branches.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/working_dir_repo.png" alt="A working dir alongside the repository" width="600" height="400" loading="lazy">
<em>A working dir alongside the repository</em></p>
<p>Note regarding the drawing conventions I use: I include <code>.git</code> within the working directory, to remind you that it is a folder within the project's folder on the filesystem. The <code>.git</code> folder actually contains the objects of the repository, as we will see in <a class="post-section-overview" href="#heading-chapter-4-how-to-create-a-repo-from-scratch">chapter 4</a>.</p>
<p>There are other version control systems where changes are committed directly from the working dir to the repository. In Git, this is not the case. Instead, changes are first registered in something called the <strong>index</strong>, or the <strong>staging area</strong>.</p>
<p>Both of these terms refer to the same thing, and they are used often in Git's documentation. I will use these terms interchangeably throughout this book, as you should feel comfortable with both of them.</p>
<p>You can think of adding changes to the index as a way of "confirming" your changes, one by one, before creating a commit (which records all your approved changes at once).</p>
<p>When you check out a branch, Git populates the index and the working dir with the contents of the files as they exist in the commit that branch is pointing to. When you use <code>git commit</code>, Git creates a new commit object based on the state of the index.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/working_dir_index_repo.png" alt="The three &quot;states&quot; - working dir, index, and repository" width="600" height="400" loading="lazy">
<em>The three "states" - working dir, index, and repository</em></p>
<p>Using the index allows you to carefully prepare each commit. For example, you may have two files with changes in your working dir:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/working_dir_index_repo_02.png" alt="Working dir includes two files with changes" width="600" height="400" loading="lazy">
<em>Working dir includes two files with changes</em></p>
<p>Assume these two files are <code>1.txt</code> and <code>2.txt</code>. It is possible to add only one of them (for instance, <code>1.txt</code>) to the index, by using <code>git add 1.txt</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/working_dir_index_repo_03.png" alt="The state after staging 1.txt" width="600" height="400" loading="lazy">
<em>The state after staging <code>1.txt</code></em></p>
<p>As a result, the state of the index matches the state of <code>HEAD</code> (in this case, "Commit 2"), with the exception of the file <code>1.txt</code>, which matches the state of <code>1.txt</code> in the working directory. Since you did not stage <code>2.txt</code>, the index does not include the updated version of <code>2.txt</code>. So the state of <code>2.txt</code> in the index matches the state of <code>2.txt</code> in "Commit 2".</p>
<p>Behind the scenes, once you stage a version of a file, Git creates a blob object with the file's contents. This blob object is then added to the index. As long as you only modify the file in the working directory, without staging it, the changes you make are not recorded in blob objects.</p>
<p>When considering the previous figure, note that I do not draw the staged version of the file as part of the "repository", as in this representation, the "repository" refers to a tree of commits and their references, and this blob has not been a part of any commit.</p>
<p>Now, you can use <code>git commit</code> to record the change to <code>1.txt</code> <em>only</em>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/working_dir_index_repo_04.png" alt="The state after using git commit" width="600" height="400" loading="lazy">
<em>The state after using <code>git commit</code></em></p>
<p>Using <code>git commit</code> performs two main operations:</p>
<ol>
<li>It creates a new commit object. This commit object reflects the state of the index when you ran the <code>git commit</code> command.</li>
<li>It updates the active branch to point to the newly created commit. In this example, <code>main</code> now points to "Commit 3", the new commit object.</li>
</ol>
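<p>The flow described above, from staging only <code>1.txt</code> to the two operations performed by <code>git commit</code>, can be sketched as follows. The example assumes Git is installed and uses a temporary directory, a made-up identity, and made-up file contents. It also borrows the plumbing command <code>git write-tree</code>, which writes a tree object describing the current index, to compare the index with the new commit.</p>

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # make sure the first branch is called main
git config user.name "Demo User"            # hypothetical identity for the demo
git config user.email "demo@example.com"

printf 'one\n' > 1.txt
printf 'two\n' > 2.txt
git add 1.txt 2.txt
git commit -q -m "Commit 2"
before=$(git rev-parse main)

# Change both files, but stage only 1.txt.
printf 'one, edited\n' > 1.txt
printf 'two, edited\n' > 2.txt
git add 1.txt
staged=$(git diff --cached --name-only)     # index compared with HEAD
index_tree=$(git write-tree)                # the index, expressed as a tree object

git commit -q -m "Commit 3"
after=$(git rev-parse main)
commit_tree=$(git rev-parse 'main^{tree}')
parent=$(git rev-parse 'main^')
unstaged=$(git diff --name-only)            # working dir compared with the index

echo "staged before the commit: $staged"        # 1.txt
echo "still unstaged afterwards: $unstaged"     # 2.txt
echo "main moved from $before to $after"
```

<p>The new commit's root tree is exactly the tree built from the index, and <code>main</code> now points to a commit whose parent is the previous one.</p>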
<h3 id="heading-how-to-create-a-repo-the-conventional-way">How to Create a Repo — The Conventional Way</h3>
<p>Let's make sure that you understand how the terms we've introduced relate to the process of creating a new repository. This is a quick high-level view, before diving much deeper into this process.</p>
<p>Initialize a new repository using <code>git init my_repo</code>, and then change your directory to that of the repository using <code>cd my_repo</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_init.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git init</code></em></p>
<p>By using <code>tree -f .git</code> you can see that running <code>git init my_repo</code> resulted in quite a few sub-directories inside <code>.git</code>. (The flag <code>-f</code> makes <code>tree</code> print the full path prefix of each entry.)</p>
<p>Note: if you're using Windows, run <code>tree /f .git</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_init_tree_f.png" alt="The output of tree -f .git after using git init" width="600" height="400" loading="lazy">
<em>The output of <code>tree -f .git</code> after using <code>git init</code></em></p>
<p>Create a file inside the <code>my_repo</code> directory:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_f_txt.png" alt="Creating f.txt" width="600" height="400" loading="lazy">
<em>Creating <code>f.txt</code></em></p>
<p>This file is within your working directory. If you run <code>git status</code>, you'll see this file is untracked:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_f_txt_git_status.png" alt="The result of git status" width="600" height="400" loading="lazy">
<em>The result of <code>git status</code></em></p>
<p>Files in your working directory can be in one of two states: <strong>tracked</strong> or <strong>untracked</strong>.</p>
<p><strong>Tracked</strong> files are files that Git "knows" about. They either were in the last commit, or they are staged now (that is, they are in the staging area).</p>
<p><strong>Untracked</strong> files are everything else — any files in your working directory that were not in your last commit, and are not in your staging area.</p>
<p>The new file (<code>f.txt</code>) is currently untracked, as you haven't added it to the staging area, and it hasn't been included in a previous commit.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/drawing_new_untracked_file.png" alt="f.txt is in the working directory (and untracked)" width="600" height="400" loading="lazy">
<em><code>f.txt</code> is in the working directory (and untracked)</em></p>
<p>You can now add this file to the staging area (also referred to as staging this file) by using <code>git add f.txt</code>. You can verify that it has been staged by running <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_add_status.png" alt="Adding the new file to the staging area" width="600" height="400" loading="lazy">
<em>Adding the new file to the staging area</em></p>
<p>So now the state of the index matches that of the working dir:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/drawing_new_staged_file.png" alt="The state after adding the new file" width="600" height="400" loading="lazy">
<em>The state after adding the new file</em></p>
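<p>If you prefer a compact, text-only view of the same transition, the sketch below may help. It assumes Git is installed and works in a temporary directory rather than <code>my_repo</code>. It uses <code>git status --porcelain</code>, a script-friendly output format in which <code>??</code> marks an untracked file and <code>A</code> in the first column marks a file newly added to the index.</p>

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'hello\n' > f.txt
before_add=$(git status --porcelain)   # the file is untracked
git add f.txt
after_add=$(git status --porcelain)    # the file is staged

echo "$before_add"   # ?? f.txt
echo "$after_add"    # A  f.txt
```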
<p>You can now create a commit using <code>git commit</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/initial_commit.png" alt="Committing an initial commit" width="600" height="400" loading="lazy">
<em>Committing an initial commit</em></p>
<p>If you run <code>git status</code> again, you'll see that the status is clean. That is, the state of <code>HEAD</code> (which points to your initial commit) equals the state of the index, and also the state of the working dir. By using <code>git log</code> you will indeed see that <code>HEAD</code> points to <code>main</code>, which in turn points to your new commit:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/initial_commit_git_log.png" alt="The output of git log after introducing the first commit" width="600" height="400" loading="lazy">
<em>The output of <code>git log</code> after introducing the first commit</em></p>
<p>Has something changed within the <code>.git</code> directory? Run <code>tree -f .git</code> to check:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/tree_f_after_initial_commit.png" alt="A lot of things have changed within .git" width="600" height="400" loading="lazy">
<em>A lot of things have changed within <code>.git</code></em></p>
<p>Apparently, quite a lot has changed. It's time to dive deeper into the structure of <code>.git</code> and understand what is going on under the hood when you run <code>git init</code>, <code>git add</code> or <code>git commit</code>. That's exactly what the next chapter will cover.</p>
<h3 id="heading-recap-how-to-record-changes-in-git">Recap - How to Record Changes in Git</h3>
<p>You learned about the three different "states" of the file system that Git maintains:</p>
<ul>
<li><strong>Working dir(ectory)</strong> (also called "working tree") - any directory on your file system which has a repository associated with it.</li>
<li><strong>Index</strong>, or the <strong>Staging Area</strong> - a playground for the next commit.</li>
<li><strong>Repository</strong> (in short: "repo") - a collection of commits, each of which is a snapshot of the working tree.</li>
</ul>
<p>When you introduce changes in Git, you almost always follow this order:</p>
<ol>
<li>You change the working directory first</li>
<li>Then you stage these changes (or some of them) to the index</li>
<li>And finally, you commit these changes - thereby updating the repository with a new commit. The state of this new commit matches the state of the index.</li>
</ol>
<p>Ready to dive deeper?</p>
<h2 id="heading-chapter-4-how-to-create-a-repo-from-scratch">Chapter 4 - How to Create a Repo From Scratch</h2>
<p>So far we've covered some Git fundamentals, and now you should be ready to really <em>Git</em> going (I can't seem to get enough of that pun).</p>
<p>In order to deeply understand how Git works, you will create a repository, but this time — you will build it from scratch. As in other chapters, I encourage you to try out the commands alongside this chapter.</p>
<h3 id="heading-how-to-set-up-git">How to Set Up <code>.git</code></h3>
<p>Create a new directory, and run <code>git status</code> within it:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_dir_git_status.png" alt="git status in a new directory" width="600" height="400" loading="lazy">
<em><code>git status</code> in a new directory</em></p>
<p>Alright, so Git seems unhappy as you don't yet have a <code>.git</code> folder. The natural thing to do would be to create that directory and try again:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/mkdir_git_git_status.png" alt="git status after creating .git" width="600" height="400" loading="lazy">
<em><code>git status</code> after creating <code>.git</code></em></p>
<p>Apparently, creating a <code>.git</code> directory is just not enough. You need to add some content to that directory.</p>
<p>A Git repository has two main components:</p>
<ul>
<li>A collection of <strong>objects</strong> — blobs, trees, and commits.</li>
<li>A system of <strong>naming</strong> those objects — called references.</li>
</ul>
<p>A repository may also contain other things, such as hooks, but at the very least — it must include objects and references.</p>
<p>Create a directory for the objects at <code>.git/objects</code>, and a directory for the references (in short: "refs") at <code>.git/refs</code> (on Windows systems: <code>.git\objects</code> and <code>.git\refs</code>, respectively).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_folders_git_tree.png" alt="Considering the directory tree" width="600" height="400" loading="lazy">
<em>Considering the directory tree</em></p>
<p>One type of reference is branches. Internally, Git calls branches by the name <code>heads</code>. Create a directory for branches — <code>.git/refs/heads</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_heads_folder_git_tree.png" alt="The directory tree" width="600" height="400" loading="lazy">
<em>The directory tree</em></p>
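<p>On a UNIX-like system, the directories created so far could be set up with something like the sketch below. It uses a temporary directory (the <code>repo</code> variable is only for illustration) instead of the directory you created yourself.</p>

```shell
# A fresh, empty directory to play in.
repo=$(mktemp -d)
cd "$repo"

# -p creates the missing parent directories (.git and .git/refs) as well.
mkdir -p .git/objects
mkdir -p .git/refs/heads

# List every directory that now exists under .git
find .git -type d | sort
```

<p>At this point <code>.git</code> contains only directories. There is no <code>HEAD</code> file yet.</p>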
<p>This still doesn't change the result of <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_heads_folder_git_status.png" alt="git status after creating .git/refs/heads" width="600" height="400" loading="lazy">
<em><code>git status</code> after creating <code>.git/refs/heads</code></em></p>
<p>How does Git know where to start when looking for a commit in the repository? As I explained earlier, it looks for <code>HEAD</code>, which points to the current active branch (or commit, in some cases).</p>
<p>So, you need to create <code>HEAD</code>, which is just a file residing at <code>.git/HEAD</code>. You can apply the following:</p>
<p>On UNIX:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"ref: refs/heads/main"</span> &gt; .git/HEAD
</code></pre>
<p>On Windows:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> ref: refs/heads/main &gt; .git\HEAD
</code></pre>
<p>So you now know how <code>HEAD</code> is implemented — it is simply a file, and its contents describe what it points to.</p>
<p>Following the command above, <code>git status</code> seems to change its mind:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_head_git_status.png" alt="HEAD is just a file" width="600" height="400" loading="lazy">
<em><code>HEAD</code> is just a file</em></p>
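<p>You can also ask Git itself what it reads from that file. The following sketch rebuilds the same minimal layout in a temporary directory (assuming a UNIX-like shell and an installed Git) and then uses <code>git symbolic-ref HEAD</code>, which prints the reference that <code>HEAD</code> points to.</p>

```shell
repo=$(mktemp -d)
cd "$repo"

# The hand-made repository skeleton: objects, refs, and HEAD.
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

head_contents=$(cat .git/HEAD)          # what is written in the file
active_ref=$(git symbolic-ref HEAD)     # what Git understands from it

echo "$head_contents"   # ref: refs/heads/main
echo "$active_ref"      # refs/heads/main
```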
<p>Notice that Git "believes" you are on a branch called <code>main</code>, even though you haven't created this branch. <code>main</code> is just a name. You can also make Git believe you are on a branch called <code>banana</code> if you wish:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/banana.png" alt="Creating a branch named banana" width="600" height="400" loading="lazy">
<em>Creating a branch named <code>banana</code></em></p>
<p>Switch back to <code>main</code>, as you will keep working from (mostly) there throughout this chapter, just to adhere to the regular convention:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"ref: refs/heads/main"</span> &gt; .git/HEAD
</code></pre>
<p>Now that you have your <code>.git</code> directory ready, you can work your way toward making a commit (again, without using <code>git add</code> or <code>git commit</code>).</p>
<h3 id="heading-plumbing-vs-porcelain-commands-in-git">Plumbing vs Porcelain Commands in Git</h3>
<p>At this point, it would be helpful to make a distinction between two types of Git commands: plumbing and porcelain. The application of the terms oddly comes from toilets, traditionally made of porcelain, and the infrastructure of plumbing (pipes and drains).</p>
<p>The porcelain layer provides a user-friendly interface to the plumbing. Most people only deal with the porcelain. Yet, when things go (terribly) wrong, and someone wants to understand why, they would have to roll up their sleeves and deal with the plumbing.</p>
<p>Git uses this terminology as an analogy to separate the low-level commands that users don't usually need to use directly ("plumbing" commands) from the more user-friendly high-level commands ("porcelain" commands).</p>
<p>So far, you have dealt with porcelain commands such as <code>git init</code>, <code>git add</code>, and <code>git commit</code>. It's time to go deeper, and get yourself acquainted with some plumbing commands.</p>
<h3 id="heading-how-to-create-objects-in-git">How to Create Objects in Git</h3>
<p>Start by creating an object and writing it into Git's object database, which resides within <code>.git/objects</code>. To find out the SHA-1 hash value of a blob, you can use <code>git hash-object</code> (yes, a plumbing command), in the following way:</p>
<p>On UNIX:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"Git is awesome"</span> | git hash-object --stdin
</code></pre>
<p>On Windows:</p>
<pre><code class="lang-bash">&gt; <span class="hljs-built_in">echo</span> Git is awesome | git hash-object --stdin
</code></pre>
<p>By using <code>--stdin</code> you are instructing <code>git hash-object</code> to take its input from the standard input. This will provide you with the relevant hash value:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/hash_object.png" alt="Getting a blob's SHA-1" width="600" height="400" loading="lazy">
<em>Getting a blob's SHA-1</em></p>
<p>In order to actually write that blob into Git's object database, you can add the <code>-w</code> switch to <code>git hash-object</code>. Then, you can check the contents of the <code>.git</code> folder and see that they have changed:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/write_blob.png" alt="Writing a blob to the objects' database" width="600" height="400" loading="lazy">
<em>Writing a blob to the objects' database</em></p>
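<p>In text form, the step shown in the screenshot can be sketched roughly as follows. This is a self-contained version that first recreates the minimal <code>.git</code> layout in a scratch directory (the <code>mktemp</code> call and the <code>blob</code> variable are assumptions made for illustration), so your listing may differ slightly from the screenshot:</p>
<pre><code class="lang-bash"># Scratch repository with the minimal layout from this chapter
cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

# -w writes the blob into .git/objects in addition to printing its hash
blob="$(echo "Git is awesome" | git hash-object --stdin -w)"
echo "$blob"

# List the object files that now exist
find .git/objects -type f
</code></pre>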
<p>You can see that the hash of your blob is <code>7a9bd34a0244eaf2e0dda907a521f43d417d94f6</code>. You can also see that a directory has been created under <code>.git/objects</code>, a directory named <code>7a</code>, and within it, a file by the name of <code>9bd34a0244eaf2e0dda907a521f43d417d94f6</code>.</p>
<p>What Git did here is take the <em>first two characters</em> of the SHA-1 hash, and use them as the name of a directory. The remaining characters are used as the filename for the file that actually contains the blob.</p>
<p>Why is that so? Consider a fairly big repository, one that has 400,000 objects (blobs, trees, and commits) in its database. Looking up a hash inside that list of 400,000 hashes might take a while. Thus, Git simply divides that problem by <code>256</code>. </p>
<p>To look up the hash above, Git would first look for the directory named <code>7a</code> inside the directory <code>.git/objects</code>, which may contain up to 256 such directories (<code>00</code> through <code>ff</code>). Then, it will search within that directory, narrowing down the search as it goes.</p>
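<p>To make the lookup concrete, here is a small sketch that splits the hash from above into its directory and file parts using <code>cut</code>, and estimates how many objects would land in each directory of the 400,000-object repository if the hashes were spread evenly. The variable names are hypothetical:</p>
<pre><code class="lang-bash">hash="7a9bd34a0244eaf2e0dda907a521f43d417d94f6"

dir="$(echo "$hash" | cut -c1-2)"    # first two characters
file="$(echo "$hash" | cut -c3-)"    # remaining 38 characters

echo ".git/objects/$dir/$file"
# prints .git/objects/7a/9bd34a0244eaf2e0dda907a521f43d417d94f6

# Objects per directory, assuming an even spread over 256 directories
echo $((400000 / 256))   # prints 1562
</code></pre>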
<p>Back to the process of generating a commit. You have just created an object. What is the type of that object? You can use another plumbing command, <code>git cat-file -t</code> (<code>-t</code> stands for "type"), to check that out:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/cat_file_t_blob.png" alt="Using git cat-file -t &lt;object_sha&gt; reveals the type of the Git object" width="600" height="400" loading="lazy">
<em>Using <code>git cat-file -t &lt;object_sha&gt;</code> reveals the type of the Git object</em></p>
<p>Not surprisingly, this object is a blob. You can also use <code>git cat-file -p</code> (<code>-p</code> stands for "pretty-print") to see its contents:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/cat_file_p_blob.png" alt="git cat-file -p" width="600" height="400" loading="lazy">
<em><code>git cat-file -p</code></em></p>
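<p>Both inspections can be sketched in text form as well. As before, the scratch directory and the <code>blob</code> variable are assumptions that keep the example self-contained:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

blob="$(echo "Git is awesome" | git hash-object --stdin -w)"

git cat-file -t "$blob"   # prints blob
git cat-file -p "$blob"   # prints Git is awesome
</code></pre>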
<p>This process of creating a blob object under <code>.git/objects</code> usually happens when you add something to the staging area — that is, when you use <code>git add</code>. So blobs are not created every time you save a file to the file system (the working dir), but only when you stage it.</p>
<p>Remember that Git creates a blob of the <em>entire</em> file that is staged. Even if a single character is modified or added, the file has a new blob with a new hash (as in the example in <a class="post-section-overview" href="#heading-chapter-1-git-objects">chapter 1</a> where you added <code>!</code> at the end of a line).</p>
<p>Will there be any change to <code>git status</code>?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_after_blob.png" alt="git status after creating a blob object" width="600" height="400" loading="lazy">
<em><code>git status</code> after creating a blob object</em></p>
<p>Apparently, no. Adding a blob object to Git's internal database does not change the status, as Git does not know of any tracked (or untracked) files at this stage.</p>
<p>You need to track this file — add it to the staging area. To do that, you can use another plumbing command, <code>git update-index</code>, like so:</p>
<pre><code class="lang-bash">git update-index --add --cacheinfo 100644 &lt;blob-hash&gt; &lt;filename&gt;
</code></pre>
<p>Note: The first value passed to <code>--cacheinfo</code>, <code>100644</code>, is a 16-bit file mode as stored by Git, following the layout of POSIX types and modes. This is not within the scope of this book, as it is really not important for you to Git things done.</p>
<p>Running the command above will result in a change to <code>.git</code>'s contents:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/update_index.png" alt="The state of .git after updating the index" width="600" height="400" loading="lazy">
<em>The state of <code>.git</code> after updating the index</em></p>
<p>Can you spot the change? A new file by the name of <code>index</code> has been created. This is it: the famous index (or staging area) is basically a file that resides at <code>.git/index</code>.</p>
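<p>As a concrete sketch of this step, the following fills in the placeholders of the command above with the blob you created and the filename <code>awesome.txt</code>. The scratch directory is an assumption, and <code>git ls-files --stage</code> (a command not used elsewhere in this chapter) is just one way to print what the index contains:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

blob="$(echo "Git is awesome" | git hash-object --stdin -w)"

# Record in the index that awesome.txt holds the contents of this blob
git update-index --add --cacheinfo 100644 "$blob" awesome.txt

ls .git                # the listing now includes a file named index
git ls-files --stage   # mode, blob hash, stage number, and path
</code></pre>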
<p>So now that your blob has been added to the index, do you expect <code>git status</code> to look different?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_after_update_index.png" alt="git status after using git update-index" width="600" height="400" loading="lazy">
<em><code>git status</code> after using <code>git update-index</code></em></p>
<p>That's interesting! Two things happened here.</p>
<p>First, you can see that <code>awesome.txt</code> appears in <em>green</em>, in the "Changes to be committed" area. That is so because the index now includes <code>awesome.txt</code>, waiting to be committed.</p>
<p>Second, you can see that <code>awesome.txt</code> also appears in <em>red</em>, because Git believes the file <code>awesome.txt</code> has been deleted, and that deletion is not staged.</p>
<p>(Note: You may have noticed that I sometimes refer to Git with words such as "believes", "thinks", or "wants". As I explained in the <a class="post-section-overview" href="#heading-introduction">introduction of this book</a> - in order for us to enjoy playing around with Git, and reading (and writing) this book, I feel like referring to Git as more than just code makes it all so much more enjoyable.)</p>
<p>This happens because you added the blob with the contents <code>Git is awesome</code> to the object database and recorded in the index that the file <code>awesome.txt</code> holds the contents of that blob, but you never actually created that file on disk.</p>
<p>You can easily solve this by taking the contents of the blob and writing them to your file system, to a file called <code>awesome.txt</code>:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"Git is awesome"</span> &gt; awesome.txt
</code></pre>
<p>As a result, it will no longer appear in red in the output of <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_after_creating_file_on_disk.png" alt="git status after creating awesome.txt on disk" width="600" height="400" loading="lazy">
<em><code>git status</code> after creating <code>awesome.txt</code> on disk</em></p>
<p>So now it's time to create a commit object from your staging area. As explained in <a class="post-section-overview" href="#heading-chapter-1-git-objects">chapter 1</a>, a commit object has a reference to a tree, so you need to create a tree.</p>
<p>You can achieve this using the command <code>git write-tree</code>, which records the contents of the index in a tree object. Of course, you can use <code>git cat-file -t</code> to see that it's indeed a tree:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/write_tree.png" alt="Creating a tree object with the contents of the index" width="600" height="400" loading="lazy">
<em>Creating a tree object with the contents of the index</em></p>
<p>And you can use <code>git cat-file -p</code> to see its contents:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/cat_file_p_tree.png" alt="git cat-file -p to see the tree's contents" width="600" height="400" loading="lazy">
<em><code>git cat-file -p</code> to see the tree's contents</em></p>
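<p>Here is a rough text version of the two screenshots above, starting again from a scratch directory (an assumption, as are the variable names) so that it can be run on its own:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

blob="$(echo "Git is awesome" | git hash-object --stdin -w)"
git update-index --add --cacheinfo 100644 "$blob" awesome.txt

# Record the current contents of the index as a tree object
tree="$(git write-tree)"

git cat-file -t "$tree"   # prints tree
git cat-file -p "$tree"   # one line per entry: mode, type, hash, and name
</code></pre>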
<p>Great, so you created a tree, and now you need to create a commit object that references this tree. To do that, you can use the command:</p>
<pre><code class="lang-bash">git commit-tree &lt;tree-hash&gt; -m &lt;commit message&gt;
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit-tree.png" alt="Committing using the tree object" width="600" height="400" loading="lazy">
<em>Committing using the tree object</em></p>
<p>You should now feel comfortable with the commands used to check the created object's type, and print its contents:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_object_cat_file.png" alt="Creating a commit object" width="600" height="400" loading="lazy">
<em>Creating a commit object</em></p>
<p>Note that this commit object doesn't have a parent, because it is the first commit. When you add another commit you will probably want to declare its parent — don't worry, you will do so later.</p>
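<p>The same step can be sketched in text. Note that <code>git commit-tree</code> needs an author and committer identity; the sketch below supplies hypothetical ones through environment variables so that it does not depend on your Git configuration. The scratch directory and variable names are assumptions as well:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

# Hypothetical identity, used only for this sketch
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

blob="$(echo "Git is awesome" | git hash-object --stdin -w)"
git update-index --add --cacheinfo 100644 "$blob" awesome.txt
tree="$(git write-tree)"

# Create a commit object that references the tree
commit="$(git commit-tree "$tree" -m "Commit 1")"

git cat-file -t "$commit"   # prints commit
git cat-file -p "$commit"   # tree, author, committer, and message (no parent line)
</code></pre>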
<p>The last hash that you got, <code>b6d05ee40344ef5d53502539772086da14ad2b07</code>, is a commit's hash. You are probably used to these hashes, as you look at them all the time (when using <code>git log</code>, for instance). Note that this commit object points to a tree object, with its own hash, which you rarely specify explicitly.</p>
<p>Will something change in <code>git status</code>?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_after_creating_commit_object.png" alt="git status after creating a commit object" width="600" height="400" loading="lazy">
<em><code>git status</code> after creating a commit object</em></p>
<p>No, nothing has changed. Why is that?</p>
<p>Well, to know that your file has been committed, Git needs to know about the latest commit. How does Git do that? It goes to the <code>HEAD</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/looking_at_head_1.png" alt="Looking at the contents of HEAD" width="600" height="400" loading="lazy">
<em>Looking at the contents of <code>HEAD</code></em></p>
<p><code>HEAD</code> points to <code>main</code>, but what is <code>main</code>? You haven't really created it yet.</p>
<p>As we explained earlier in <a class="post-section-overview" href="#heading-chapter-2-branches-in-git">chapter 2</a>, a branch is simply a named reference to a commit. And in this case, we would like <code>main</code> to refer to the commit object with the hash <code>b6d05ee40344ef5d53502539772086da14ad2b07</code>.</p>
<p>You can achieve this by creating a file at <code>.git/refs/heads/main</code> that contains this hash, like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/creating_main.png" alt="Creating main" width="600" height="400" loading="lazy">
<em>Creating <code>main</code></em></p>
<p>In sum, a branch is just a file inside <code>.git/refs/heads</code>, containing a hash of the commit it refers to.</p>
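<p>Putting the steps so far together, a minimal end-to-end sketch might look like this. It runs in a scratch directory and uses a hypothetical identity for the commit, so the commit hash you get will differ from the one shown in this chapter:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

# Hypothetical identity, used only for this sketch
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# Blob, index entry, file on disk, tree, and commit
blob="$(echo "Git is awesome" | git hash-object --stdin -w)"
git update-index --add --cacheinfo 100644 "$blob" awesome.txt
echo "Git is awesome" > awesome.txt
tree="$(git write-tree)"
commit="$(git commit-tree "$tree" -m "Commit 1")"

# The branch itself: a file containing the commit's hash
echo "$commit" > .git/refs/heads/main

git log --oneline
git status
</code></pre>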
<p>Now, finally, <code>git status</code> and <code>git log</code> seem to appreciate our efforts:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_commit_1.png" alt="git status" width="600" height="400" loading="lazy">
<em><code>git status</code></em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_commit_1.png" alt="git log" width="600" height="400" loading="lazy">
<em><code>git log</code></em></p>
<p>You have successfully created a commit without using porcelain commands! How cool is that?</p>
<h3 id="heading-recap-how-to-create-a-repo-from-scratch">Recap - How to Create a Repo From Scratch</h3>
<p>In this chapter, you fearlessly deep-dived into Git. You stopped using porcelain commands and switched to plumbing commands.</p>
<p>By using <code>echo</code> and low-level commands such as <code>git hash-object</code>, you were able to create a blob, add it to the index, create a tree of the index, and create a commit object pointing to that tree.</p>
<p>You also learned that <code>HEAD</code> is a file, located in <code>.git/HEAD</code>. Branches are also files, located under <code>.git/refs/heads</code>. When you understand how Git operates, those abstract notions of <code>HEAD</code> or "branches" become very tangible.</p>
<p>The next chapter will deepen your understanding of how branches work under the hood.</p>
<h2 id="heading-chapter-5-how-to-work-with-branches-in-git-under-the-hood">Chapter 5 - How to Work with Branches in Git — Under the Hood</h2>
<p>In the previous chapter you created a repository and a commit without using <code>git init</code>, <code>git add</code> or <code>git commit</code>. In this chapter, you will create and switch between branches without using porcelain commands (<code>git branch</code>, <code>git switch</code>, or <code>git checkout</code>).</p>
<p>It's perfectly understandable if you are excited. I am too!</p>
<p>Continuing from the previous chapter, you only have one branch, named <code>main</code>. To create another one with the name <code>test</code> (as the equivalent of <code>git branch test</code>), you would need to create a file named <code>test</code> within <code>.git/refs/heads</code>, and the contents of that file would be the hash of the same commit that the <code>main</code> branch points to.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_test_branch.png" alt="Creating test branch" width="600" height="400" loading="lazy">
<em>Creating <code>test</code> branch</em></p>
<p>If you use <code>git log</code>, you can see that this is indeed the case — both <code>main</code> and <code>test</code> point to this commit:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_after_creating_test_branch.png" alt="git log after creating test branch" width="600" height="400" loading="lazy">
<em><code>git log</code> after creating <code>test</code> branch</em></p>
<p>(Note: if you run this command and don't see a valid output, you may have written something other than the commit's hash into <code>.git/refs/heads/test</code>.)</p>
<p>Next, switch to your newly created branch (the equivalent of <code>git checkout test</code>). How would you do that? Try to answer for yourself before moving on to the next paragraph.</p>
<p>To change the active branch, you should change <code>HEAD</code> to point to your new branch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/change_head_to_test.png" alt="Switching to branch test by changing HEAD" width="600" height="400" loading="lazy">
<em>Switching to branch <code>test</code> by changing <code>HEAD</code></em></p>
<p>As you can see, <code>git status</code> confirms that <code>HEAD</code> now points to <code>test</code>, which is, therefore, the active branch.</p>
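<p>Creating the branch and switching to it can be sketched in text as follows. The first half simply rebuilds the state from the previous chapter in a scratch directory with a hypothetical identity. Copying the contents of the <code>main</code> file with <code>cat</code> is one possible way to write the same hash into <code>test</code>, and may differ from what the screenshots show:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

# Hypothetical identity, used only for this sketch
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# State from the previous chapter: one commit, pointed to by main
blob="$(echo "Git is awesome" | git hash-object --stdin -w)"
git update-index --add --cacheinfo 100644 "$blob" awesome.txt
echo "Git is awesome" > awesome.txt
commit="$(git commit-tree "$(git write-tree)" -m "Commit 1")"
echo "$commit" > .git/refs/heads/main

# Equivalent of "git branch test": a new file holding the same commit hash
cat .git/refs/heads/main > .git/refs/heads/test

# Switch the active branch by pointing HEAD at the new branch
echo "ref: refs/heads/test" > .git/HEAD

git symbolic-ref --short HEAD   # prints test
git log --oneline --decorate
</code></pre>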
<p>You can now use the commands you have already used in the previous chapter to create another file and add it to the index:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/writing_another_file.png" alt="Writing and staging another file" width="600" height="400" loading="lazy">
<em>Writing and staging another file</em></p>
<p>Following the commands above, you:</p>
<ul>
<li>Create a blob with the content of <code>Another File</code> (using <code>git hash-object</code>).</li>
<li>Add it to the index by the name <code>another_file.txt</code> (using <code>git update-index</code>).</li>
<li>Create a corresponding file on disk with the contents of the blob (using <code>git cat-file -p</code>).</li>
<li>Create a tree object representing the index (using <code>git write-tree</code>).</li>
</ul>
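<p>The four steps above can be sketched like this. The example runs in a scratch directory and stages <code>awesome.txt</code> first, so that the resulting tree lists both files. The variable names are hypothetical:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/test" > .git/HEAD

# The file staged in the previous chapter
blob="$(echo "Git is awesome" | git hash-object --stdin -w)"
git update-index --add --cacheinfo 100644 "$blob" awesome.txt

# 1. Create a blob with the content "Another File"
blob2="$(echo "Another File" | git hash-object --stdin -w)"

# 2. Add it to the index under the name another_file.txt
git update-index --add --cacheinfo 100644 "$blob2" another_file.txt

# 3. Create the corresponding file on disk from the blob's contents
git cat-file -p "$blob2" > another_file.txt

# 4. Create a tree object representing the index
tree2="$(git write-tree)"
git cat-file -p "$tree2"
</code></pre>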
<p>It's now time to create a commit referencing this tree. This time, you should also specify the parent of this commit — which would be the previous commit. You specify the parent using the <code>-p</code> switch of <code>git commit-tree</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_2.png" alt="Creating another commit object" width="600" height="400" loading="lazy">
<em>Creating another commit object</em></p>
<p>You have just created a commit, with a tree as well as a parent, as you can see:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/cat_file_commit_2.png" alt="Observing the new commit object" width="600" height="400" loading="lazy">
<em>Observing the new commit object</em></p>
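<p>In text form, creating a commit with a parent might look like the sketch below. It rebuilds both trees in a scratch directory and uses a hypothetical identity, so the hashes will not match the ones in the screenshots:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/test" > .git/HEAD

# Hypothetical identity, used only for this sketch
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# First commit, without a parent
blob="$(echo "Git is awesome" | git hash-object --stdin -w)"
git update-index --add --cacheinfo 100644 "$blob" awesome.txt
commit1="$(git commit-tree "$(git write-tree)" -m "Commit 1")"

# Second tree, containing both files
blob2="$(echo "Another File" | git hash-object --stdin -w)"
git update-index --add --cacheinfo 100644 "$blob2" another_file.txt
tree2="$(git write-tree)"

# -p declares the parent of the new commit
commit2="$(git commit-tree "$tree2" -p "$commit1" -m "Commit 2")"

git cat-file -p "$commit2"   # includes a line starting with "parent"
</code></pre>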
<p>(Note: the SHA-1 value of your commit object will be different from the one shown in the screenshot above, as it includes the timestamp of the commit, and also the author's details, which would be different on your machine.)</p>
<p>Will <code>git log</code> show us the new commit?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_after_creating_commit_2.png" alt="git log after creating &quot;Commit 2&quot;" width="600" height="400" loading="lazy">
<em><code>git log</code> after creating "Commit 2"</em></p>
<p>As you can see, <code>git log</code> doesn't show anything new. Why is that?</p>
<p>Remember that <code>git log</code> starts from <code>HEAD</code> and follows the branch it points to in order to find the commits to show. Right now it shows <code>test</code> and the commit it points to, along with <code>main</code>, which points to the same commit. No branch points to the new commit yet.</p>
<p>That's right — you need to change <code>test</code> to point to the new commit object. You can do that by changing the contents of <code>.git/refs/heads/test</code>:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> 22267a945af8fde78b62ee7f705bbecfdd276b3d &gt; .git/refs/heads/<span class="hljs-built_in">test</span>
</code></pre>
<p>And now if you run <code>git log</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_after_updating_test_branch.png" alt="git log after updating test branch" width="600" height="400" loading="lazy">
<em><code>git log</code> after updating <code>test</code> branch</em></p>
<p>It worked!</p>
<p><code>git log</code> goes to <code>HEAD</code>, which tells Git to go to the branch <code>test</code>, which points to commit <code>222..3d</code>, which links back to its parent commit <code>b6d..07</code>.</p>
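<p>You can follow the same chain by hand. The sketch below rebuilds two commits in a scratch directory (with a hypothetical identity, so the hashes differ from the ones above) and then walks from <code>HEAD</code> to the branch file, to the commit, and to its parent:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/test" > .git/HEAD

# Hypothetical identity, used only for this sketch
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# Two commits, the second one pointing back to the first
blob="$(echo "Git is awesome" | git hash-object --stdin -w)"
git update-index --add --cacheinfo 100644 "$blob" awesome.txt
commit1="$(git commit-tree "$(git write-tree)" -m "Commit 1")"
blob2="$(echo "Another File" | git hash-object --stdin -w)"
git update-index --add --cacheinfo 100644 "$blob2" another_file.txt
commit2="$(git commit-tree "$(git write-tree)" -p "$commit1" -m "Commit 2")"
echo "$commit1" > .git/refs/heads/main
echo "$commit2" > .git/refs/heads/test

# Step 1: HEAD names the active branch
cat .git/HEAD                       # prints ref: refs/heads/test

# Step 2: the branch file holds the hash of the latest commit
tip="$(cat .git/refs/heads/test)"
echo "$tip"

# Step 3: the commit object names its parent
git cat-file -p "$tip" | grep "^parent"
</code></pre>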
<p>Feel free to admire the beauty, I Git you. 😊</p>
<p>By inspecting your repository's folder, you can see that you have six different objects under the folder <code>.git/objects</code>: the two blobs you created (one for <code>awesome.txt</code> and one for <code>another_file.txt</code>), two commit objects ("Commit 1" and "Commit 2"), and two tree objects, each pointed to by one of the commit objects.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/tree_after_commit_2.png" alt="The tree listing after creating &quot;Commit 2&quot;" width="600" height="400" loading="lazy">
<em>The tree listing after creating "Commit 2"</em></p>
<p>You also have <code>.git/HEAD</code> that points to the active branch or commit, and two branches - within <code>.git/refs/heads</code>.</p>
<h3 id="heading-recap-how-to-work-with-branches-in-git-under-the-hood">Recap - How to Work with Branches in Git — Under the Hood</h3>
<p>In this chapter you learned how branches actually work in Git.</p>
<p>The main things we covered:</p>
<ul>
<li>A branch is a file under <code>.git/refs/heads</code>, where the content of the file is a SHA-1 value of a commit.</li>
<li>To create a new branch, Git simply creates a new file under <code>.git/refs/heads</code> with the name of the branch - for example, <code>.git/refs/heads/my_branch</code> for the branch <code>my_branch</code>.</li>
<li>To switch the active branch, Git modifies the contents of <code>.git/HEAD</code> to refer to the new active branch. <code>.git/HEAD</code> may also point to a commit object directly.</li>
<li>When committing using <code>git commit</code>, Git creates a commit object, and also moves the current branch (that is, the contents of the file under <code>.git/refs/heads</code>) to point to the newly created commit object.</li>
</ul>
<h2 id="heading-part-1-summary">Part 1 - Summary</h2>
<p>This part introduced you to the internals of Git. We started by covering <a class="post-section-overview" href="#heading-chapter-1-git-objects">the basic objects</a> — blobs, trees, and commits.</p>
<p>You learned that a <strong>blob</strong> holds the contents of a file. A <strong>tree</strong> is a directory listing, containing blobs and/or sub-trees. A <strong>commit</strong> is a snapshot of the working directory, with some metadata such as the time or the commit message.</p>
<p>You learned about <strong><a class="post-section-overview" href="#heading-chapter-2-branches-in-git">branches</a></strong>, seeing that they are nothing but a named reference to a commit.</p>
<p>You learned the process of <a class="post-section-overview" href="#heading-chapter-3-how-to-record-changes-in-git">recording changes in Git</a>, and that it involves the <strong>working directory</strong>, a directory that has a repository associated with it, the <strong>staging area (index)</strong> which holds the tree for the next commit, and the <strong>repository</strong>, which is a collection of commits and references.</p>
<p>We clarified how these terms relate to Git commands we know by creating a new repository and committing a file using the well-known <code>git init</code>, <code>git add</code>, and <code>git commit</code>.</p>
<p>Then you <a class="post-section-overview" href="#heading-chapter-4-how-to-create-a-repo-from-scratch">created a new repository from scratch</a>, by using <code>echo</code> and low-level commands such as <code>git hash-object</code>. You created a blob, added it to the index, created a tree object representing the index, and even created a commit object pointing to that tree.</p>
<p>You were also able to create and <a class="post-section-overview" href="#heading-chapter-5-how-to-work-with-branches-in-git-under-the-hood">switch between branches by modifying files directly</a>. Kudos to those of you who tried this on your own!</p>
<p>Altogether, after following along through this part, you should feel that you've deepened your understanding of what is happening under the hood when working with Git.</p>
<p>The next part will explore different strategies for integrating changes when working in different branches in Git - specifically, merge and rebase.</p>
<h1 id="heading-part-2-branching-and-integrating-changes">Part 2 - Branching and Integrating Changes</h1>
<h2 id="heading-chapter-6-diffs-and-patches">Chapter 6 - Diffs and Patches</h2>
<p>In Part 1 you learned how Git works under the hood, the different Git objects, and how to create a repo from scratch.</p>
<p>When teams work with Git, they introduce sequences of changes, usually in branches, and then they need to combine different change histories together. To really understand how this is achieved, you should learn how Git treats diffs and patches. You will then apply your knowledge to understand the process of merge and rebase.</p>
<p>Many of the interesting processes in Git like merging, rebasing, or even committing are based on diffs and patches. Developers work with diffs all the time, whether using Git directly or relying on the IDE's diff view. In this chapter, you will learn what Git diffs and patches are, their structure, and how to apply patches.</p>
<p>As a reminder from the <a class="post-section-overview" href="#heading-chapter-1-git-objects">chapter on Git Objects</a>, a commit is a snapshot of the working tree at a certain point in time, in addition to some metadata.</p>
<p>Yet, it is really hard to make sense of individual commits by looking at the entire working tree. Rather, it is more helpful to look at how different a commit is from its parent commit, that is, the diff between these commits.</p>
<p>So, what do I mean when I say "diff"? Let's start with some history.</p>
<h3 id="heading-git-diffs-history">Git Diff's History</h3>
<p>Git's <code>diff</code> is based on the <code>diff</code> utility found on Unix systems. <code>diff</code> was developed in the early 1970s on the Unix operating system. The first released version shipped with the Fifth Edition of Unix in 1974.</p>
<p><code>git diff</code> is a command that takes two inputs, and computes the difference between them. Inputs can be commits, but also files, and even files that have never been introduced to the repository.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_definition.png" alt="Git diff takes two inputs, which can be commits or files" width="600" height="400" loading="lazy">
<em>Git diff takes two inputs, which can be commits or files</em></p>
<p>This is important - <code>git diff</code> computes the <em>difference</em> between two strings, which most of the time happen to consist of code, but not necessarily.</p>
<h3 id="heading-time-to-get-hands-on">Time to Get Hands-On</h3>
<p>As always, you are encouraged to run the commands yourself while reading this chapter. Unless noted otherwise, I will use the following repository:</p>
<p><a target="_blank" href="https://github.com/Omerr/gitting_things_repo.git">https://github.com/Omerr/gitting_things_repo.git</a></p>
<p>You can clone it locally and have the same starting point I am using for this chapter.</p>
<p>Consider this short text file on my machine, called <code>file.txt</code>, which consists of 6 lines:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/file_txt_1.png" alt="file.txt consists of six lines" width="600" height="400" loading="lazy">
<em><code>file.txt</code> consists of six lines</em></p>
<p>Now, modify this file a bit. Remove the second line, and insert a new line as the fourth line. Add an exclamation mark (<code>!</code>) to the end of the last line, so you get this result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_file_txt_1.png" alt="After modifying file.txt, we get six different lines" width="600" height="400" loading="lazy">
<em>After modifying <code>file.txt</code>, we get six different lines</em></p>
<p>Save this file with a new name, <code>new_file.txt</code>.</p>
<p>Now you can run <code>git diff</code> to compute the difference between the files like so:</p>
<pre><code class="lang-bash">git diff --no-index file.txt new_file.txt
</code></pre>
<p>(I will explain the <code>--no-index</code> switch of this command later. For now it's enough to understand that it allows you to compare two files that are not part of a Git repository.)</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_1.png" alt="The output of git diff --no-index file.txt new_file.txt" width="600" height="400" loading="lazy">
<em>The output of <code>git diff --no-index file.txt new_file.txt</code></em></p>
<p>The output of <code>git diff</code> shows quite a lot of things.</p>
<p>Focus on the part starting with <code>This is a file</code>. You can see that the added line (<code>// new test</code>) is preceded by a <code>+</code> sign. The deleted line is preceded by a <code>-</code> sign.</p>
<p>Interestingly, notice that Git views a modified line as a sequence of two changes - erasing a line and adding a new line instead. So the patch includes deleting the last line, and adding a new line that's equal to that line, with the addition of a <code>!</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_format_lines.png" alt="Addition lines are preceded by +, deletion lines by -, and modification lines are sequences of deletions and additions" width="600" height="400" loading="lazy">
<em>Addition lines are preceded by <code>+</code>, deletion lines by <code>-</code>, and modification lines are sequences of deletions and additions</em></p>
<p>Now would be a good time to discuss the terms "patch" and "diff". These two are often used interchangeably, although there is a distinction, at least historically. </p>
<p>A <strong>diff</strong> shows the differences between two files, or snapshots, and can be quite minimal in doing so. A <strong>patch</strong> is an extension of a diff, augmented with further information such as context lines and filenames, which allow it to be <em>applied</em> more widely. It is a text document that describes how to alter an existing file or codebase.</p>
<p>These days, the Unix <code>diff</code> program, and <code>git diff</code>, can produce patches of various kinds.</p>
<p>A patch is a compact representation of the differences between two files. It describes how to turn one file into another.</p>
<p>In other words, if you apply the "instructions" produced by <code>git diff</code> on <code>file.txt</code> - that is, remove the second line, insert <code>// new test</code> as the fourth line, remove the last line, and add instead a line with the same content and <code>!</code> - you will get the content of <code>new_file.txt</code>.</p>
<p>Another important thing to note is that a patch is <strong>asymmetric</strong>: the patch from <code>file.txt</code> to <code>new_file.txt</code> is not the same as the patch for the other direction. Generating a patch between <code>new_file.txt</code> and <code>file.txt</code>, in this order, would yield exactly the opposite instructions from before: add the second line instead of removing it, and so on.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/patch_asymmetric.png" alt="A patch consists of asymmetric instructions to get from one file to another" width="600" height="400" loading="lazy">
<em>A patch consists of asymmetric instructions to get from one file to another</em></p>
<p>Try it out:</p>
<pre><code class="lang-bash">git diff --no-index new_file.txt file.txt
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_2.png" alt="Running git diff in the reverse direction yields the reverse instructions - add a line instead of removing it, and so on" width="600" height="400" loading="lazy">
<em>Running git diff in the reverse direction yields the reverse instructions - add a line instead of removing it, and so on</em></p>
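<p>If you don't have the repository at hand, you can observe the same asymmetry with a tiny pair of files. The three-line files below are hypothetical stand-ins and are not the contents of the <code>file.txt</code> shown in the screenshots. The <code>grep</code> pattern keeps only the added and removed lines and skips the <code>---</code> and <code>+++</code> header lines:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"

# Hypothetical stand-in files, much shorter than the ones in this chapter
printf 'one\ntwo\nthree\n' > file.txt
printf 'one\nthree\nfour!\n' > new_file.txt

# git diff --no-index exits with status 1 when the files differ,
# which is expected here and not an error
forward="$(git diff --no-index file.txt new_file.txt)"
reverse="$(git diff --no-index new_file.txt file.txt)"

echo "$forward" | grep '^[-+][^-+]'   # prints -two and +four!
echo "$reverse" | grep '^[-+][^-+]'   # prints +two and -four!
</code></pre>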
<p>The patch format uses context, as well as line numbers, to locate differing file regions. This allows a patch to be applied to a somewhat earlier or later version of the first file than the one from which it was derived, as long as the applying program can still locate the context of the change. We will see exactly how these are used.</p>
<h3 id="heading-the-structure-of-a-diff">The Structure of a Diff</h3>
<p>It's time to dive deeper.</p>
<p>Generate a diff from <code>file.txt</code> to <code>new_file.txt</code> again, and consider the output more carefully:</p>
<pre><code class="lang-bash">git diff --no-index file.txt new_file.txt
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_1-1.png" alt="The output of git diff --no-index file.txt new_file.txt" width="600" height="400" loading="lazy">
<em>The output of <code>git diff --no-index file.txt new_file.txt</code></em></p>
<p>The first line introduces the compared files. Git always gives one file the name <code>a</code>, and the other the name <code>b</code>. So in this case <code>file.txt</code> is called <code>a</code>, whereas <code>new_file.txt</code> is called <code>b</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_structure_1.png" alt="The first line in 's output introduces the files being compared" width="600" height="400" loading="lazy">
<em>The first line in <code>diff</code>'s output introduces the files being compared</em></p>
<p>Then the second line, starting with <code>index</code>, includes the blob SHAs of these files. So even though in our case they are not even stored within a Git repo, Git shows their corresponding SHA-1 values.</p>
<p>The third value in this line, <code>100644</code>, is the "mode bits", indicating that this is a "regular" file: not executable and not a symbolic link.</p>
<p>The use of two dots (<code>..</code>) here between the blob SHAs is just as a separator (unlike other cases where it's used within Git).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_structure_2.png" alt="The second line in 's output includes the blob SHAs of the compared files, as well as the mode bits" width="600" height="400" loading="lazy">
<em>The second line in <code>diff</code>'s output includes the blob SHAs of the compared files, as well as the mode bits</em></p>
<p>Other header lines might indicate the old and new mode bits if they've changed, old and new filenames if the files were being renamed, and so on.</p>
<p>The blob SHAs (also called "blob IDs") are helpful if this patch is later applied by Git to the same project and there are conflicts while applying it. You will better understand what this means when you learn about the merges in <a class="post-section-overview" href="#heading-chapter-7-understanding-git-merge">the next chapter</a>.</p>
<p>After the blob IDs, we have two lines: one starting with <code>-</code> signs, and the other starting with <code>+</code> signs. This is the traditional "unified diff" header, again showing the files being compared and the direction of the changes: <code>-</code> signs show lines in the A version that are missing from the B version, and <code>+</code> signs show lines missing in the A version but present in B.</p>
<p>If the patch were of this file being added or deleted in its entirety, then one of these would be <code>/dev/null</code> to signal that.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_structure_3.png" alt=" signs show lines in the A version but missing from the B version; and  signs, lines missing in A version but present in B" width="600" height="400" loading="lazy">
<em><code>-</code> signs show lines in the A version but missing from the B version, and <code>+</code> signs, lines missing in A version but present in B</em></p>
<p>Consider the case where you delete a file:</p>
<pre><code class="lang-bash">rm awesome.txt
</code></pre>
<p>And then use <code>git diff</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/rm_diff.png" alt="'s output for a deleted file" width="600" height="400" loading="lazy">
<em><code>git diff</code>'s output for a deleted file</em></p>
<p>The <code>A</code> version, representing the state of the index, is currently <code>awesome.txt</code>, compared to the working dir where this file does not exist, so it is <code>/dev/null</code>. All lines are preceded by <code>-</code> signs as they exist only in the <code>A</code> version.</p>
<p>For now, undo the deleting (more on undoing changes in Part 3):</p>
<pre><code class="lang-bash">git restore awesome.txt
</code></pre>
<p>Going back to the diff we started with:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_1-2.png" alt="The output of git diff --no-index file.txt new_file.txt" width="600" height="400" loading="lazy">
<em>The output of <code>git diff --no-index file.txt new_file.txt</code></em></p>
<p>After this unified diff header, we get to the main part of the diff, consisting of "difference sections", also called "hunks" or "chunks" in Git. Note that these terms are used interchangeably, and you may stumble upon either of them in Git's documentation and tutorials, as well as Git's source code.</p>
<p>Every hunk begins with a single line, starting with two <code>@</code> signs. These signs are followed by at most four numbers, and then a header for the chunk - which is an educated guess by Git. Usually, it will include the beginning of a function or a class, when possible.</p>
<p>In this example it doesn't include anything as this is a text file, so consider another example for a moment:</p>
<pre><code class="lang-bash">git diff --no-index example.py example_changed.py
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_example_changed.png" alt="When possible, Git includes a header for each hunk, for example a function or class definition" width="600" height="400" loading="lazy">
<em>When possible, Git includes a header for each hunk, for example a function or class definition</em></p>
<p>In the image above, the hunk's header includes the beginning of the function that includes the changed lines - <code>def example_function(x)</code>.</p>
<p>Back to our previous example then:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_1-3.png" alt="Back to the previous diff" width="600" height="400" loading="lazy">
<em>Back to the previous diff</em></p>
<p>After the two <code>@</code> signs, you'll find four numbers:</p>
<p>The first numbers are preceded by a <code>-</code> sign as they refer to <code>file A</code>. The first number represents the line number corresponding to the first line in <code>file A</code> that this hunk refers to. In the example above, it is <code>1</code>, meaning that the line <code>This is a file</code> corresponds to line number <code>1</code> in version <code>file A</code>.</p>
<p>This number is followed by a comma (<code>,</code>), and then the number of lines this chunk consists of in <code>file A</code>. This number includes all context lines (the lines preceded with a space in the <code>diff</code>), or lines marked with a <code>-</code> sign, as they are part of <code>file A</code>, but not lines marked with a <code>+</code> sign, as they do not exist in <code>file A</code>.</p>
<p>In our example, this number is <code>6</code>, counting the context line <code>This is a file</code>, the <code>-</code> line <code>It has a nice poem:</code>, then the three context lines, and lastly <code>Are belong to you</code>.</p>
<p>As you can see, the lines beginning with a space character are context lines, which means they appear as shown in both <code>file A</code> and <code>file B</code>.</p>
<p>Then, we have a <code>+</code> sign to mark the two numbers that refer to <code>file B</code>. First, there's the line number corresponding to the first line in <code>file B</code>, followed by the number of lines this chunk consists of in <code>file B</code>.</p>
<p>This number includes all context lines, as well as lines marked with the <code>+</code> sign, as they are part of <code>file B</code>, but not lines marked with a <code>-</code> sign.</p>
<p>These four numbers are followed by two additional <code>@</code> signs.</p>
<p>After the header of the chunk, we get the actual lines - either context, <code>-</code>, or <code>+</code> lines.</p>
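<p>To check the four numbers against something you can count by hand, here is a small sketch. The files <code>old.txt</code> and <code>new.txt</code> are hypothetical and much shorter than the poem above:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"

printf '%s\n' one two three &gt; old.txt   # file A: three lines
printf '%s\n' one three &gt; new.txt       # file B: the line "two" is removed

# Print only the hunk header
header=$(git diff --no-index old.txt new.txt | grep '^@@')
echo "$header"   # @@ -1,3 +1,2 @@
</code></pre>
<p>The hunk starts at line <code>1</code> in both files. In <code>file A</code> it covers 3 lines (two context lines plus one <code>-</code> line), while in <code>file B</code> it covers only 2 lines (the two context lines).</p>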
<p>Typically and by default, a hunk starts and ends with three context lines. For example, if you modify lines 4-5 in a file with nine lines:</p>
<ul>
<li>Line 1 - context line (before the changed lines)</li>
<li>Line 2 - context line (before the changed lines)</li>
<li>Line 3 - context line (before the changed lines)</li>
<li>Line 4 - changed line</li>
<li>Line 5 - another changed line</li>
<li>Line 6 - context line (after the changed lines)</li>
<li>Line 7 - context line (after the changed lines)</li>
<li>Line 8 - context line (after the changed lines)</li>
<li>Line 9 - this line will not be part of the hunk</li>
</ul>
<p>So by default, changing lines 4-5 results in a hunk consisting of lines 1-8, that is, three lines before and three lines after the modified lines.</p>
<p>If that file doesn't have nine lines, but rather six lines - then the hunk will contain only one context line after the changed lines, and not three. Similarly, if you change the second line of a file, then there would be only one line of context before the changed lines.</p>
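<p>You can reproduce the nine-line example with the sketch below. The file names <code>a.txt</code> and <code>b.txt</code> are made up for the illustration, and the <code>-U1</code> switch (which asks <code>git diff</code> for one line of context instead of three) is added only to show how the hunk shrinks:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"

seq 1 9 &gt; a.txt                                  # nine lines: 1..9
printf '%s\n' 1 2 3 four five 6 7 8 9 &gt; b.txt    # lines 4-5 modified

# Default: three context lines on each side, so the hunk spans lines 1-8
default_header=$(git diff --no-index a.txt b.txt | grep -o '^@@ [^@]* @@')
# -U1: one context line on each side, so the hunk spans lines 3-6
narrow_header=$(git diff --no-index -U1 a.txt b.txt | grep -o '^@@ [^@]* @@')

echo "$default_header"   # @@ -1,8 +1,8 @@
echo "$narrow_header"    # @@ -3,4 +3,4 @@
</code></pre>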
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_structure_4.png" alt="The patch format by " width="600" height="400" loading="lazy">
<em>The patch format by <code>git diff</code></em></p>
<h3 id="heading-how-to-produce-diffs">How to Produce Diffs</h3>
<p>The last example we considered shows a diff between two files. A single patch file can contain the differences for <em>any</em> number of files, and <code>git diff</code> produces diffs for all altered files in the repository in a single patch.</p>
<p>Often, you will see the output of <code>git diff</code> showing two versions of the same file and the difference between them.</p>
<p>To demonstrate, consider the state in another branch called <code>diffs</code>:</p>
<pre><code class="lang-bash">git checkout diffs
</code></pre>
<p>Again, I encourage you to run the commands with me - make sure you clone the repository from:</p>
<p><a target="_blank" href="https://github.com/Omerr/gitting_things_repo.git">https://github.com/Omerr/gitting_things_repo.git</a></p>
<p>At the current state, the active directory is a Git repository, with a clean status:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_branch_diffs.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git status</code></em></p>
<p>Take an existing file, <code>my_file.py</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/nano_my_file.png" alt="An example file - my_file.py" width="600" height="400" loading="lazy">
<em>An example file - <code>my_file.py</code></em></p>
<p>And change the second line from <code>print('An example function!')</code> to <code>print('An example function! And it has been changed!')</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/nano_my_file_after_change.png" alt="The contents of my_file.py after modifying the second line" width="600" height="400" loading="lazy">
<em>The contents of <code>my_file.py</code> after modifying the second line</em></p>
<p>Save your changes, but don't stage or commit them. Next, run <code>git diff</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_my_file.png" alt="The output of git diff for my_file.py after changing it" width="600" height="400" loading="lazy">
<em>The output of <code>git diff</code> for <code>my_file.py</code> after changing it</em></p>
<p>The output of <code>git diff</code> shows the difference between <code>my_file.py</code>'s version in the staging area, which in this case is the same as the last commit (<code>HEAD</code>), and the version in the working directory.</p>
<p>I covered the terms "working directory", "staging area", and "commit" in the <a class="post-section-overview" href="#heading-chapter-1-git-objects">Git objects chapter</a>, so check it out in case you would like to refresh your memory. As a reminder, the terms "staging area" and "index" are interchangeable, and both are widely used.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/repo_state_commit_2_staging_area.png" alt="At this state, the status of the working dir is different from the status of the index. The status of the index is the same as that of " width="600" height="400" loading="lazy">
<em>At this state, the status of the working dir is different from the status of the index. The status of the index is the same as that of <code>HEAD</code></em></p>
<p>To see the difference between the <strong>working dir</strong> and the <strong>staging area</strong>, use <code>git diff</code>, without any additional flags.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/repo_state_commit_2_git_diff-1.png" alt="Without switches,  shows the difference between the staging area and the working directory" width="600" height="400" loading="lazy">
<em>Without switches, <code>git diff</code> shows the difference between the staging area and the working directory</em></p>
<p>As you can see, <code>git diff</code> lists here both <code>file A</code> and <code>file B</code> pointing to <code>my_file.py</code>. <code>file A</code> here refers to the version of <code>my_file.py</code> in the staging area, whereas <code>file B</code> refers to its version in the working dir.</p>
<p>Note that if you modify <code>my_file.py</code> in a text editor, and don't save the file, then <code>git diff</code> will not be aware of the changes you've made. This is because they haven't been saved to the working dir.</p>
<p>We can provide a few switches to <code>git diff</code> to get the diff between the working dir and a specific commit, or between the staging area and the latest commit, or between two commits, and so on.</p>
<p>First create a new file, <code>new_file.txt</code>, and save it:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/nano_new_file.png" alt="A simple new file saved as new_file.txt" width="600" height="400" loading="lazy">
<em>A simple new file saved as <code>new_file.txt</code></em></p>
<p>Currently the file is in the working dir, and it is actually untracked in Git.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_file_working_dir.png" alt="A new, untracked file" width="600" height="400" loading="lazy">
<em>A new, untracked file</em></p>
<p>Now stage and commit this file:</p>
<pre><code class="lang-bash">git add new_file.txt
git commit -m <span class="hljs-string">"Commit 3"</span>
</code></pre>
<p>Now, the state of <code>HEAD</code> is the same as the state of the staging area, as well as the working tree:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/repo_state_commit_3.png" alt="The state of HEAD is the same as the index and the working dir" width="600" height="400" loading="lazy">
<em>The state of <code>HEAD</code> is the same as the index and the working dir</em></p>
<p>Next, edit <code>new_file.txt</code> by adding a new line at the beginning and another new line at the end:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_file_edited.png" alt="Modifying new_file.txt by adding a line at the beginning and another at the end" width="600" height="400" loading="lazy">
<em>Modifying <code>new_file.txt</code> by adding a line at the beginning and another at the end</em></p>
<p>As a result, the state is as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/repo_state_start_end.png" alt="After saving, the state in the working dir is different than that of the index or " width="600" height="400" loading="lazy">
<em>After saving, the state in the working dir is different than that of the index or <code>HEAD</code></em></p>
<p>A nice trick would be to use <code>git add -p</code>, which allows you to split the changes even within a file, and consider which ones you'd like to stage.</p>
<p>In this case, add the first line to the index, but not the last line. To do that, you can split the hunk using <code>s</code>, then accept to stage the first hunk (using <code>y</code>), and not the second part (using <code>n</code>).</p>
<p>If you are not sure what each letter stands for, you can always use a <code>?</code> and Git will tell you.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/add_p.png" alt="Using , you can stage only the first change" width="600" height="400" loading="lazy">
<em>Using <code>git add -p</code>, you can stage only the first change</em></p>
<p>So now the state in <code>HEAD</code> is without either of those new lines. In the staging area you have the first line but not the last line, and in the working dir you have both new lines.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/repo_state_after_add_p.png" alt="The state after staging only the first line" width="600" height="400" loading="lazy">
<em>The state after staging only the first line</em></p>
<p>If you use <code>git diff</code>, what will happen?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_3.png" alt=" shows the difference between the index and the working dir" width="600" height="400" loading="lazy">
<em><code>git diff</code> shows the difference between the index and the working dir</em></p>
<p>Well, as stated before, you get the diff between the staging area and the working tree.</p>
<p>What happens if you want to get the diff between <code>HEAD</code> and the staging area? For that, you can use <code>git diff --cached</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_cached.png" alt=" shows the difference between  and the index" width="600" height="400" loading="lazy">
<em><code>git diff --cached</code> shows the difference between <code>HEAD</code> and the index</em></p>
<p>And what if you want the difference between <code>HEAD</code> and the working tree? For that you can run <code>git diff HEAD</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_HEAD.png" alt=" shows the difference between  and the working dir" width="600" height="400" loading="lazy">
<em><code>git diff HEAD</code> shows the difference between <code>HEAD</code> and the working dir</em></p>
<p>To summarize the different switches for git diff we have seen so far, here's a diagram:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_diagram_1.png" alt="Different switches for " width="600" height="400" loading="lazy">
<em>Different switches for <code>git diff</code></em></p>
<p>As a reminder, at the beginning of this chapter you used <code>git diff --no-index</code>. With the <code>--no-index</code> switch, you can compare two files that are not part of the repository - or of any staging area.</p>
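<p>If you prefer to verify these switches without the interactive <code>git add -p</code> step, the following sketch builds a similar three-way state in a throwaway repository. The file name <code>notes.txt</code>, its contents, and the identity settings are assumptions made for this illustration only:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"                 # local identity so the commit succeeds
git config user.email "example@example.com"

printf '%s\n' middle &gt; notes.txt
git add notes.txt
git commit -q -m "Base"

printf '%s\n' START middle &gt; notes.txt        # the first change...
git add notes.txt                              # ...is staged
printf '%s\n' START middle END &gt; notes.txt    # the second change stays unstaged

git diff            # index vs. working dir: shows only +END
git diff --cached   # HEAD vs. index: shows only +START
git diff HEAD       # HEAD vs. working dir: shows both +START and +END
</code></pre>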
<p>Now, commit the changes you have in the staging area:</p>
<pre><code class="lang-bash">git commit -m <span class="hljs-string">"Commit 4"</span>
</code></pre>
<p>To observe the diff between this commit and its parent commit, you can run the following command:</p>
<pre><code class="lang-bash">git diff HEAD~1 HEAD
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_HEAD_1_HEAD.png" alt="The output of " width="600" height="400" loading="lazy">
<em>The output of <code>git diff HEAD~1 HEAD</code></em></p>
<p>By the way, you can omit the <code>1</code> above and write <code>HEAD~</code>, and get the same result. Using <code>1</code> is the explicit way to state you are referring to the first parent of the commit.</p>
<p>Note that writing the parent commit here, <code>HEAD~1</code>, first results in a diff showing how to get <em>from</em> the parent commit <em>to</em> the current commit. Of course, I could also generate the reverse diff by writing:</p>
<pre><code class="lang-bash">git diff HEAD HEAD~1
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_HEAD_HEAD_1.png" alt="The output of  generates the reverse patch" width="600" height="400" loading="lazy">
<em>The output of <code>git diff HEAD HEAD~1</code> generates the reverse patch</em></p>
<p>To summarize all the different switches for git diff we covered in this section, see this diagram:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_diagram_2.png" alt="The different switches for " width="600" height="400" loading="lazy">
<em>The different switches for <code>git diff</code></em></p>
<p>A short way to view the diff between a commit and its parent is by using <code>git show</code>, for example:</p>
<pre><code class="lang-bash">git show HEAD
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_show_HEAD.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git show HEAD</code></em></p>
<p>The diff it prints (below the commit's metadata) is the same as the output of:</p>
<pre><code class="lang-bash">git diff HEAD~ HEAD
</code></pre>
<p>We can now update our diagram:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_diagram_3.png" alt=" is used to show the difference between commits" width="600" height="400" loading="lazy">
<em><code>git diff HEAD~ HEAD</code> is used to show the difference between commits</em></p>
<p>You can go back to this diagram as a reference when needed.</p>
<p>As a reminder, Git commits are snapshots of the entire working directory of the repository at a certain point in time. Yet it's sometimes more useful to regard a commit not as a whole snapshot, but rather by the <strong>changes</strong> this specific commit introduced. In other words, by the diff between a parent commit and the next commit.</p>
<p>As you learned in the <a class="post-section-overview" href="#heading-chapter-1-git-objects">Git Objects chapter</a>, Git stores the <strong>entire</strong> snapshots. The diff is dynamically generated from the snapshot data - by comparing the root trees of the commit and its parent.</p>
<p>Of course, Git can compare any two snapshots in time, not just adjacent commits, and also generate a diff of files not included in a repository.</p>
<h3 id="heading-how-to-apply-patches">How to Apply Patches</h3>
<p>By using <code>git diff</code> you can see a patch Git generates, and you can then apply this patch using <code>git apply</code>.</p>
<h4 id="heading-historical-note">Historical Note</h4>
<p>Actually, sharing patches used to be the main way to share code in the early days of open source. But now - virtually all projects have moved to sharing Git commits directly through pull requests (called "merge requests" on some platforms).</p>
<p>The biggest problem with using patches is that it is hard to apply a patch when your working directory does not match the sender's previous commit. Losing the commit history makes it difficult to resolve conflicts. You will better understand this as you dive deeper into the process of <code>git apply</code>, especially in the next chapter where we cover merges.</p>
<h4 id="heading-a-simple-patch">A Simple Patch</h4>
<p>What does it mean to apply a patch? It's time to try it out!</p>
<p>Take the output of <code>git diff</code>:</p>
<pre><code class="lang-bash">git diff HEAD~1 HEAD
</code></pre>
<p>And store it in a file:</p>
<pre><code class="lang-bash">git diff HEAD~1 HEAD &gt; my_patch.patch
</code></pre>
<p>Use <code>reset</code> to undo the last commit:</p>
<pre><code class="lang-bash">git reset --hard HEAD~1
</code></pre>
<p>Don't worry about the last command - I'll explain it in detail in Part 3, where we discuss undoing changes. In short, it allows us to "reset" the state of where <code>HEAD</code> is pointing to, as well as the state of the index and of the working dir. In the example above, they are all set to the state of <code>HEAD~1</code>, or "Commit 3" in the diagram.</p>
<p>So after running the reset command, the contents of the file are as follows (the state from "Commit 3"):</p>
<pre><code class="lang-bash">nano new_file.txt
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/nano_new_file-1.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>new_file.txt</code></em></p>
<p>And you will apply this patch that you've just saved:</p>
<pre><code class="lang-bash">nano my_patch.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/my_patch.png" alt="The patch you are about to apply, as generated by git diff" width="600" height="400" loading="lazy">
<em>The patch you are about to apply, as generated by git diff</em></p>
<p>This patch tells Git to find the lines:</p>
<pre><code class="lang-txt">This is a new file
With new content!
</code></pre>
<p>Those lines used to be lines 1 and 2 in <code>new_file.txt</code>. The patch then tells Git to add a line with the content <code>START!</code> right above them.</p>
<p>Run this command to apply the patch:</p>
<pre><code class="lang-bash">git apply my_patch.patch
</code></pre>
<p>And as a result, you get this version of your file, just like the commit you have created before:</p>
<pre><code class="lang-bash">nano new_file.txt
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_file_after_applying.png" alt="The contents of new_file.txt after applying the patch" width="600" height="400" loading="lazy">
<em>The contents of <code>new_file.txt</code> after applying the patch</em></p>
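<p>For reference, here is the whole round trip as one script you can run in a throwaway repository. It is a sketch that recreates "Commit 3" and "Commit 4" from scratch rather than using the book's repository, and the identity settings are placeholders. The <code>--check</code> switch of <code>git apply</code> is an extra safety step: it only tests whether the patch would apply, without changing any file.</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf '%s\n' "This is a new file" "With new content!" &gt; new_file.txt
git add new_file.txt
git commit -q -m "Commit 3"

printf '%s\n' "START!" "This is a new file" "With new content!" &gt; new_file.txt
git commit -q -a -m "Commit 4"

git diff HEAD~1 HEAD &gt; my_patch.patch   # save the patch (the patch file itself is untracked)
git reset -q --hard HEAD~1                # go back to "Commit 3"

git apply --check my_patch.patch          # dry run: fails if the patch would not apply
git apply my_patch.patch
head -n 1 new_file.txt                    # START!
</code></pre>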
<h4 id="heading-understanding-the-context-lines">Understanding the Context Lines</h4>
<p>To understand the importance of context lines, consider a more advanced scenario. What happens if line numbers have changed since you created the patch file?</p>
<p>To test, start by creating another file:</p>
<pre><code class="lang-bash">nano test.txt
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/testing_file.png" alt="Creating another file - " width="600" height="400" loading="lazy">
<em>Creating another file - <code>test.txt</code></em></p>
<p>Stage and commit this file:</p>
<pre><code class="lang-bash">git add test.txt

git commit -m <span class="hljs-string">"Test file"</span>
</code></pre>
<p>Now, change this file by adding a new line, and also erasing the line before the last one:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/testing_file_modified.png" alt="Changes to " width="600" height="400" loading="lazy">
<em>Changes to <code>test.txt</code></em></p>
<p>Observe the difference between the original version of the file and the version including your changes:</p>
<pre><code class="lang-bash">git diff -- test.txt
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/testing_file_diff.png" alt="The output for git diff -- " width="600" height="400" loading="lazy">
<em>The output for <code>git diff -- test.txt</code></em></p>
<p>(Using <code>-- test.txt</code> tells Git to run the command <code>diff</code>, taking into consideration only <code>test.txt</code>, so you don't get the diff for other files.)</p>
<p>Store this diff into a patch file:</p>
<pre><code class="lang-bash">git diff -- test.txt &gt; new_patch.patch
</code></pre>
<p>Now, reset your state to that before introducing the changes:</p>
<pre><code class="lang-bash">git reset --hard
</code></pre>
<p>If you were to apply <code>new_patch.patch</code> now, it would simply work.</p>
<p>Let's now consider a more interesting case. Modify <code>test.txt</code> again by adding a new line at the beginning:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/testing_file_added_first_line.png" alt="Adding a new line at the beginning of " width="600" height="400" loading="lazy">
<em>Adding a new line at the beginning of <code>test.txt</code></em></p>
<p>As a result, the line numbers are different from the original version where the patch has been created. Consider the patch you created before:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_patch.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>new_patch.patch</code></em></p>
<p>It assumes that the line <code>With more text</code> is the second line in <code>test.txt</code>, which is no longer the case. So...will <code>git apply</code> work?</p>
<pre><code class="lang-bash">git apply new_patch.patch
</code></pre>
<p>It worked!</p>
<p>By default, Git looks for 3 lines of context before and after each change introduced in the patch - as you can see, they are included in the patch file. If you take three lines before and after the added line, and three lines before and after the deleted line (actually only one line after, as no other lines exist), you get exactly the lines in the patch file. If these lines all exist, in the same order, then applying the patch works, even if the line numbers changed.</p>
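<p>The same behavior can be reproduced with a simpler file. The sketch below assumes a hypothetical <code>numbers.txt</code> holding the numbers 1 to 10, in a throwaway repository with placeholder identity settings:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

seq 1 10 &gt; numbers.txt
git add numbers.txt
git commit -q -m "Ten lines"

printf '%s\n' 1 2 3 4 five 6 7 8 9 10 &gt; numbers.txt   # change line 5
git diff -- numbers.txt &gt; shift.patch                 # hunk header: @@ -2,7 +2,7 @@
git reset -q --hard

printf '%s\n' 0 1 2 3 4 5 6 7 8 9 10 &gt; numbers.txt    # a new first line shifts everything down
git apply shift.patch                                    # the context still matches, one line lower
sed -n '6p' numbers.txt                                  # five
</code></pre>
<p>The patch says the hunk starts at line 2, but Git finds the context lines <code>2</code>, <code>3</code>, <code>4</code> and <code>6</code>, <code>7</code>, <code>8</code> one line further down and applies the change there.</p>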
<p>Reset the state again:</p>
<pre><code class="lang-bash">git reset --hard
</code></pre>
<p>What happens if you change one of the context lines? Try it out by changing the line <code>With more text</code> to <code>With more text!</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/testing_file_modifying_second_line.png" alt="Changing the line  to " width="600" height="400" loading="lazy">
<em>Changing the line <code>With more text</code> to <code>With more text!</code></em></p>
<p>And now:</p>
<pre><code class="lang-bash">git apply new_patch.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_apply_new_patch.png" alt=" doesn't apply the patch" width="600" height="400" loading="lazy">
<em><code>git apply</code> doesn't apply the patch</em></p>
<p>Well, no. The patch does not apply. If you are not sure why, or just want to better understand the process Git is performing, you can add the <code>--verbose</code> flag to <code>git apply</code>, like so:</p>
<pre><code class="lang-bash">git apply --verbose new_patch.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_apply_new_patch_verbose.png" alt=" shows the process Git is taking to apply the patch" width="600" height="400" loading="lazy">
<em><code>git apply --verbose</code> shows the process Git is taking to apply the patch</em></p>
<p>It seems that Git searched lines from the file, including the line "With more text", right before the line "It has some really nice lines". This sequence of lines no longer exists in the file. As Git cannot find this sequence, it cannot apply the patch.</p>
<p>As mentioned earlier, by default, Git looks for 3 lines of context before and after each change introduced in the patch. If the surrounding three lines do not exist, Git cannot apply the patch.</p>
<p>You can ask Git to rely on fewer lines of context, using the <code>-C</code> argument. For example, to ask Git to look for 1 line of the surrounding context, run the following command:</p>
<pre><code class="lang-bash">git apply -C1 new_patch.patch
</code></pre>
<p>The patch applies!</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_apply_c1.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git apply -C1 new_patch.patch</code></em></p>
<p>Why is that? Consider the patch again:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_patch-1.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>new_patch.patch</code></em></p>
<p>When applying the patch with the <code>-C1</code> option, Git is looking for the lines:</p>
<pre><code class="lang-txt">Like this one
And that one
</code></pre>
<p>in order to add the line <code>!!!This is the new line!!!</code> between these two lines. These lines exist (and, importantly, they appear one right after the other). As a result, Git can successfully add the line between them, even though the line numbers changed.</p>
<p>Similarly, Git would look for the lines:</p>
<pre><code class="lang-txt">How wonderful
So we are writing an example
Git is awesoome!
</code></pre>
<p>As Git can find these lines, Git can erase the middle one.</p>
<p>If we changed one of these lines, say, changed "How wonderful" to "How very wonderful", then Git would not be able to find the sequence of lines above, and thus the patch would not apply.</p>
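<p>Here is a compact sketch of the same experiment, again assuming a hypothetical <code>numbers.txt</code> in a throwaway repository. The outermost context line is changed, so the patch fails with the default three lines of context but applies with <code>-C1</code>:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

seq 1 10 &gt; numbers.txt
git add numbers.txt
git commit -q -m "Ten lines"

printf '%s\n' 1 2 3 4 five 6 7 8 9 10 &gt; numbers.txt   # change line 5
git diff -- numbers.txt &gt; ctx.patch                   # context: lines 2-4 and 6-8
git reset -q --hard

printf '%s\n' 1 two 3 4 5 6 7 8 9 10 &gt; numbers.txt    # alter the outermost context line

if git apply ctx.patch; then
  echo "applied with full context"
else
  echo "full context no longer matches"                  # this branch runs
fi

git apply -C1 ctx.patch      # require only one matching context line on each side
sed -n '5p' numbers.txt      # five
</code></pre>
<p>Relaxing the context makes patches apply more often, but it also raises the chance of applying a change in the wrong place, so it is worth reviewing the result.</p>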
<h3 id="heading-recap-git-diff-and-patch">Recap - Git Diff and Patch</h3>
<p>In this chapter, you learned what a diff is, and the difference between a diff and a patch. You learned how to generate various patches using different switches for <code>git diff</code>. You also learned what the output of <code>git diff</code> looks like, and how it is constructed. Ultimately, you learned how patches are applied, and specifically the importance of context.</p>
<p>Understanding diffs is a major milestone for understanding many other processes within Git - for example, merging or rebasing, which we will explore in the next chapters.</p>
<h2 id="heading-chapter-7-understanding-git-merge">Chapter 7 - Understanding Git Merge</h2>
<p>By reading this chapter, you are going to really understand <code>git merge</code>, one of the most common operations you'll perform in your Git repositories.</p>
<h3 id="heading-what-is-a-merge-in-git">What is a Merge in Git?</h3>
<p>Merging is the process of combining the recent changes from several branches into a single new commit. This commit points back to the commits it combined, which are usually the tips of these branches.</p>
<p>In a way, merging is the complement of branching in version control: a branch allows you to work simultaneously with others on a particular set of files, whereas a merge allows you to later combine separate work on branches that diverged from a common ancestor commit.</p>
<p>OK, let's take this bit by bit.</p>
<p>Remember that in Git, a branch is just a name pointing to a single commit. When we think about commits as being "on" a specific branch, they are actually reachable through the parent chain from the commit that the branch is pointing to.</p>
<p>That is, if you consider this commit graph:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_graph_1.png" alt="Commit graph with feature_1" width="600" height="400" loading="lazy">
<em>Commit graph with <code>feature_1</code></em></p>
<p>You see the branch <code>feature_1</code>, which points to a commit with the SHA-1 value of <code>ba0d2</code>. As in previous chapters, I only write the first 5 digits of the SHA-1 value for brevity.</p>
<p>Notice that commit <code>54a9d</code> is also "on" this branch, as it is the parent commit of <code>ba0d2</code>. So if you start from the pointer of <code>feature_1</code>, you get to <code>ba0d2</code>, which then points to <code>54a9d</code>. You can go on the chain of parents, and all these reachable commits are considered to be "on" <code>feature_1</code>.</p>
<p>When you merge with Git, you merge commits. Almost always, we merge two commits by referring to them with the branch names that point to them. Thus we say we "merge branches" - though under the hood, we actually merge commits.</p>
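<p>To make "a branch is just a name pointing to a commit" concrete, here is a small sketch on a throwaway repository. The commit messages and the empty commits are assumptions made only for this illustration; they do not reproduce the SHA-1 values shown in the graph above:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Two commits on main, then one more on a new branch
git commit -q --allow-empty -m "Commit 1"
git commit -q --allow-empty -m "Commit 2"
git checkout -q -b feature_1
git commit -q --allow-empty -m "Commit 3"

# A branch name resolves to exactly one commit
git rev-parse feature_1

# Every commit reachable through the parent chain is "on" the branch
git log --format=%s feature_1

# The commit main points to is an ancestor of feature_1
git merge-base --is-ancestor main feature_1 &amp;&amp; echo "main's commit is on feature_1"
</code></pre>
<p>Notice that <code>git log</code> lists all three commits for <code>feature_1</code>, even though only one of them was created while that branch was checked out.</p>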
<h3 id="heading-time-to-get-hands-on-1">Time to Get Hands-on</h3>
<p>For this chapter, I will use the following repository:</p>
<p><a target="_blank" href="https://github.com/Omerr/gitting_things_merge.git">https://github.com/Omerr/gitting_things_merge.git</a></p>
<p>As in previous chapters, I encourage you to clone it locally and have the same starting point I am using for this chapter.</p>
<p>OK, so let's say I have this simple repository here, with a branch called <code>main</code>, and a few commits with the commit messages of "Commit 1", "Commit 2", and "Commit 3":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commits_1_3.png" alt="A simple repository with three commits" width="600" height="400" loading="lazy">
<em>A simple repository with three commits</em></p>
<p>Next, create a feature branch by typing <code>git branch new_feature</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_branch_new_feature.png" alt="Creating a new branch with git branch" width="600" height="400" loading="lazy">
<em>Creating a new branch with <code>git branch</code></em></p>
<p>And switch <code>HEAD</code> to point to this new branch, by using <code>git checkout new_feature</code> (or <code>git switch new_feature</code>). You can look at the outcome by using <code>git log</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_checkout_new_feature.png" alt="The output of git log after using git checkout new_feature" width="600" height="400" loading="lazy">
<em>The output of <code>git log</code> after using <code>git checkout new_feature</code></em></p>
<p>As a reminder, you could also write <code>git checkout -b new_feature</code>, which would both create a new branch and change <code>HEAD</code> to point to this new branch.</p>
<p>If you need a reminder about branches and how they're implemented under the hood, please check out <a class="post-section-overview" href="#heading-chapter-2-branches-in-git">chapter 2</a>. Yes, check out. Pun intended 😇</p>
<p>Now, on the <code>new_feature</code> branch, implement a new feature. In this example, I will edit an existing file that looks like this before the edit:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_before_changes.png" alt="code.py before editing it" width="600" height="400" loading="lazy">
<em><code>code.py</code> before editing it</em></p>
<p>And I will now edit it to include a new function:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_new_feature.png" alt="Implementing new_feature" width="600" height="400" loading="lazy">
<em>Implementing <code>new_feature</code></em></p>
<p>And luckily, this is not a programming book, so this function is legit 😇</p>
<p>Next, stage and commit this change:</p>
<pre><code class="lang-bash">git add code.py

git commit -m <span class="hljs-string">"Commit 4"</span>
</code></pre>
<p>Looking at the history, you have the branch <code>new_feature</code>, now pointing to "Commit 4", which points to its parent, "Commit 3". The branch <code>main</code> is also pointing to "Commit 3".</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commits_1_4.png" alt="The history after committing &quot;Commit 4&quot;" width="600" height="400" loading="lazy">
<em>The history after committing "Commit 4"</em></p>
<p>Time to merge the new feature! That is, merge these two branches, <code>main</code> and <code>new_feature</code>. Or, in Git's lingo, merge <code>new_feature</code> <em>into</em> <code>main</code>. This means merging "Commit 4" and "Commit 3". This is pretty trivial, as after all, "Commit 3" is an ancestor of "Commit 4".</p>
<p>Check out the main branch (with <code>git checkout main</code>), and perform the merge by using <code>git merge new_feature</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_merge_new_feature.png" alt="Merging new_feature into main" width="600" height="400" loading="lazy">
<em>Merging <code>new_feature</code> into <code>main</code></em></p>
<p>Since <code>new_feature</code> never really diverged from <code>main</code>, Git could just perform a fast-forward merge. So what happened here? Consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_ff_merge.png" alt="The result of a fast-forward merge" width="600" height="400" loading="lazy">
<em>The result of a fast-forward merge</em></p>
<p>Even though you used <code>git merge</code>, there was no actual merging here. Actually, Git did something very simple - it <code>reset</code> the main branch to point to the same commit as the branch <code>new_feature</code>.</p>
<p>In case you don't want that to happen, but rather you want Git to really perform a merge, you could either change Git's configuration, or run the merge command with the <code>--no-ff</code> flag.</p>
<p>First, undo the last commit:</p>
<pre><code class="lang-bash">git reset --hard HEAD~1
</code></pre>
<p>Reminder: if this way of using reset is not clear to you, don't worry - we will cover it in detail in Part 3. It is not crucial for this introduction of merge, though. For now, it's important to understand that it basically undoes the merge operation.</p>
<p>Just to clarify, now if you checked out <code>new_feature</code> again:</p>
<pre><code class="lang-bash">git checkout new_feature
</code></pre>
<p>The history would look just like before the merge:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_reset_after_merge.png" alt="The history after using git reset --hard HEAD~1" width="600" height="400" loading="lazy">
<em>The history after using <code>git reset --hard HEAD~1</code></em></p>
<p>Next, perform the merge with the <code>--no-ff</code> flag (short for "no fast-forward"):</p>
<pre><code class="lang-bash">git checkout main
git merge new_feature --no-ff
</code></pre>
<p>Now, if we look at the history using <code>git lol</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_lol_1.png" alt="History after merging with the --no-ff flag" width="600" height="400" loading="lazy">
<em>History after merging with the <code>--no-ff</code> flag</em></p>
<p>(Reminder: <code>git lol</code> is an alias I added to Git to view the history as a graph. You can find it, along with the other components of my setup, in the <a class="post-section-overview" href="#heading-my-setup">My Setup</a> part of the <a class="post-section-overview" href="#heading-introduction">Introduction</a> chapter.)</p>
<p>Considering this history, you can see Git created a new commit, a merge commit.</p>
<p>If you consider this commit a bit closer:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> -n1
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_after_lol_1.png" alt="The merge commit has two parents" width="600" height="400" loading="lazy">
<em>The merge commit has two parents</em></p>
<p>You will see that this commit actually has two parents - "Commit 4", which was the commit that <code>new_feature</code> pointed to when you ran <code>git merge</code>, and "Commit 3", which was the commit that <code>main</code> pointed to.</p>
<p><strong>A merge commit has two parents: the two commits it merged.</strong></p>
<p>The merge commit shows us the concept of merge quite well. Git takes two commits, usually referenced by two different branches, and merges them together.</p>
<p>After the merge, as you started the process from <code>main</code>, you are still on <code>main</code>, and the history from <code>new_feature</code> has been <em>merged</em> into this branch. Since you started with <code>main</code>, then "Commit 3", which <code>main</code> pointed to, is the first parent of the merge commit, whereas "Commit 4", which you merged into <code>main</code>, is the second parent of the merge commit.</p>
<p>Notice that you started on <code>main</code> when it pointed to "Commit 3", and Git went quite a long way for you. It changed the working tree, the index, and also <code>HEAD</code> and created a new commit object. At least when you use <code>git merge</code> without the <code>--no-commit</code> flag and when it's not a fast-forward merge, Git does all of that.</p>
<p>This was a super simple case, where the branches you merged didn't diverge at all. We will soon consider more interesting cases.</p>
<p>By the way, you can use <code>git merge</code> to merge more than two commits - actually, any number of commits. This is rarely done, and to adhere to the practicality principle of this book, I won't delve into it.</p>
<p>Another way to think of <code>git merge</code> is by joining two or more development histories together. That is, when you merge, you incorporate changes from the named commits, since the time their histories diverged <em>from</em> the current branch, <em>into</em> the current branch. I used the term "branch" here, but I am stressing this again - <strong>we are actually merging commits</strong>.</p>
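<p>If you would like to compare the two outcomes side by side, the following sketch replays both merges on a scratch repository and prints the parents of the resulting commit each time. The file contents and the merge message are placeholders I chose for the illustration:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "print('hello')" &gt; code.py
git add code.py
git commit -q -m "Commit 3"

git checkout -q -b new_feature
echo "print('new feature')" &gt;&gt; code.py
git commit -q -am "Commit 4"

# Fast-forward: main simply moves to the commit new_feature points to
git checkout -q main
git merge -q new_feature
git log -1 --format='fast-forward: %s, parents: %p'

# Move main back one commit, then ask for a real merge commit
git reset -q --hard HEAD~1
git merge -q --no-ff -m "Merge new_feature" new_feature
git log -1 --format='--no-ff: %s, parents: %p'
</code></pre>
<p>After the fast-forward, the tip of <code>main</code> is "Commit 4" itself, with a single parent. After the <code>--no-ff</code> merge, the tip is a new commit with two parents, the first being "Commit 3". If you prefer the configuration route, the relevant setting is <code>merge.ff</code> (for example, <code>git config merge.ff false</code>).</p>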
<h3 id="heading-time-for-a-more-advanced-case">Time For a More Advanced Case</h3>
<p>Time to consider a more advanced case, which is probably the most common case where we use <code>git merge</code> explicitly - where you need to merge branches that did diverge from one another.</p>
<p>Assume we have two people working on this repo now, John and Paul.</p>
<p>John created a branch:</p>
<pre><code class="lang-bash">git checkout -b john_branch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_john_branch.png" alt="A new branch, john_branch" width="600" height="400" loading="lazy">
<em>A new branch, <code>john_branch</code></em></p>
<p>And John has written a new song in a new file, <code>lucy_in_the_sky_with_diamonds.md</code>. Well, I believe John Lennon didn't really write in Markdown format, or use Git for that matter, but let's pretend he did for this explanation.</p>
<pre><code class="lang-bash">git add lucy_in_the_sky_with_diamonds.md
git commit -m <span class="hljs-string">"Commit 5"</span>
</code></pre>
<p>While John was working on this song, Paul was also writing, on another branch. Paul had started from main:</p>
<pre><code class="lang-bash">git checkout main
</code></pre>
<p>And created his own branch:</p>
<pre><code class="lang-bash">git checkout -b paul_branch
</code></pre>
<p>And Paul wrote his song into a file called <code>penny_lane.md</code>. Paul staged and committed this file:</p>
<pre><code class="lang-bash">git add penny_lane.md
git commit -m <span class="hljs-string">"Commit 6"</span>
</code></pre>
<p>So now our history looks like this - where we have two different branches, branching out from <code>main</code>, with different histories:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_6.png" alt="The history after John and Paul committed" width="600" height="400" loading="lazy">
<em>The history after John and Paul committed</em></p>
<p>John is happy with his branch (that is, his song), so he decides to merge it into the <code>main</code> branch:</p>
<pre><code class="lang-bash">git checkout main
git merge john_branch
</code></pre>
<p>Actually, this is a fast-forward merge, as we have learned before. You can validate that by looking at the history (using <code>git lol</code>, for example):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/merge_after_commit_6.png" alt="Merging john_branch into main results in a fast-forward merge" width="600" height="400" loading="lazy">
<em>Merging <code>john_branch</code> into <code>main</code> results in a fast-forward merge</em></p>
<p>At this point, Paul also wants to merge his branch into <code>main</code>, but now a fast-forward merge is no longer relevant - there are two different histories here: the history of <code>main</code> and that of <code>paul_branch</code>. It's not that <code>paul_branch</code> only adds commits on top of <code>main</code> or vice versa.</p>
<p>Now things get interesting. 😎😎</p>
<p>First, let Git do the hard work for you. After that, we will understand what's actually happening under the hood.</p>
<pre><code class="lang-bash">git merge paul_branch
</code></pre>
<p>Consider the history now:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/merge_after_commit_6_paul_branch.png" alt="When you merge paul_branch, you get a new merge commit" width="600" height="400" loading="lazy">
<em>When you merge <code>paul_branch</code>, you get a new merge commit</em></p>
<p>What you have is a new commit, with two parents - "Commit 5" and "Commit 6".</p>
<p>In the working dir, you can see that both John's song as well as Paul's song are there (if you use <code>ls</code>, you will see both files in the working dir).</p>
<p>Nice, Git really did merge the changes for you. But how does that happen?</p>
<p>Undo this last commit:</p>
<pre><code class="lang-bash">git reset --hard HEAD~
</code></pre>
<h3 id="heading-how-to-perform-a-three-way-merge-in-git">How to Perform a Three-way Merge in Git</h3>
<p>It's time to understand what's really happening under the hood. 😎</p>
<p>What Git has done here is called a <strong>3-way merge</strong>. In outlining the process of a 3-way merge, I will use the term "branch" for simplicity, but you should remember you could also merge two (or more) commits that are not referenced by a branch.</p>
<p>The 3-way merge process includes these stages:</p>
<p>First, Git locates the common ancestor of the two branches. That is, the common commit from which the merging branches most recently diverged. Technically, this is actually the first commit that is reachable from both branches. This commit is then called the merge base.</p>
<p>Second, Git calculates two diffs - one diff from the merge base to the first branch, and another diff from the merge base to the second branch. Git generates patches based on those diffs.</p>
<p>Third, Git applies both patches to the merge base using a 3-way merge algorithm. The result is the state of the new merge commit.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_way_merge.png" alt="The three steps of the 3-way merge algorithm: (1) locate the common ancestor; (2) calculate diffs from the merge base to the first branch, and from the merge base to the second branch; (3) apply both patches together" width="600" height="400" loading="lazy">
<em>The three steps of the 3-way merge algorithm: (1) locate the common ancestor (2) calculate diffs from the merge base to the first branch, and from the merge base to the second branch (3) apply both patches together</em></p>
<p>So, back to our example.</p>
<p>In the first step, Git looks from both branches - <code>main</code> and <code>paul_branch</code> - and traverses the history to find the first commit that is reachable from both. In this case, this would be… which commit?</p>
<p>Correct, the merge commit (the one with "Commit 3" and "Commit 4" as its parents).</p>
<p>If you are not sure, you can always ask Git directly:</p>
<pre><code class="lang-bash">git merge-base main paul_branch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_way_merge_base.png" alt="The merge base is the merge commit with &quot;Commit 3&quot; and &quot;Commit 4&quot; as its parents. Note: the previous merge commit is blurred as it is not reachable via the current history following the reset command" width="600" height="400" loading="lazy">
<em>The merge base is the merge commit with "Commit 3" and "Commit 4" as its parents. Note: the previous merge commit is blurred as it is not reachable via the current history following the <code>reset</code> command</em></p>
<p>By the way, this is the most common and simple case, where we have a single obvious choice for the merge base. In more complicated cases, there may be multiple possibilities for a merge base, but this is not within our focus.</p>
<p>In the second step, Git calculates the diffs. So it first calculates the diff between the merge commit and "Commit 5":</p>
<pre><code class="lang-bash">git diff 4f90a62 4683aef
</code></pre>
<p>(The SHA-1 values will be different on your machine.)</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_4_5.png" alt="The diff between the merge commit and &quot;Commit 5&quot;" width="600" height="400" loading="lazy">
<em>The diff between the merge commit and "Commit 5"</em></p>
<p>If you don't feel comfortable with the output of <code>git diff</code>, you can read the previous chapter where I described it in detail.</p>
<p>You can store that diff to a file:</p>
<pre><code class="lang-bash">git diff 4f90a62 4683aef &gt; john_branch_diff.patch
</code></pre>
<p>Next, Git calculates the diff between the merge commit and "Commit 6":</p>
<pre><code class="lang-bash">git diff 4f90a62 c5e4951
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_4_6.png" alt="The diff between the merge commit and &quot;Commit 6&quot;" width="600" height="400" loading="lazy">
<em>The diff between the merge commit and "Commit 6"</em></p>
<p>Write this one to a file as well:</p>
<pre><code class="lang-bash">git diff 4f90a62 c5e4951 &gt; paul_branch_diff.patch
</code></pre>
<p>Now Git applies those patches on the merge base.</p>
<p>First, try that out directly - just apply the patches (I will walk you through it in a moment). This is not what Git really does under the hood, but it will help you gain a better understanding of why Git needs to do something different.</p>
<p>Check out the merge base first, that is, the merge commit:</p>
<pre><code class="lang-bash">git checkout 4f90a62
</code></pre>
<p>And apply John's patch first (as a reminder, this is the patch shown in the image with the caption "The diff between the merge commit and "Commit 5""):</p>
<pre><code class="lang-bash">git apply --index john_branch_diff.patch
</code></pre>
<p>Notice that for now there is no merge commit. <code>git apply</code> updates the working dir as well as the index, as we used the <code>--index</code> switch.</p>
<p>You can observe the status using <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_apply_john.png" alt="Applying John's patch on the merge commit" width="600" height="400" loading="lazy">
<em>Applying John's patch on the merge commit</em></p>
<p>So now John's new song is incorporated into the index. Apply the other patch:</p>
<pre><code class="lang-bash">git apply --index paul_branch_diff.patch
</code></pre>
<p>As a result, the index contains changes from both branches.</p>
<p>Now it's time to commit your merge. Since there is no merge in progress here, the porcelain command <code>git commit</code> would generate a commit with a single parent, so you need the underlying plumbing command - <code>git commit-tree</code>.</p>
<p>If you need a reminder about porcelain vs plumbing commands, check out <a class="post-section-overview" href="#heading-chapter-4-how-to-create-a-repo-from-scratch">chapter 4</a> where I explained these terms, and created an entire repo from scratch.</p>
<p>Remember that every Git commit object points to a single tree. So you need to record the contents of the index in a tree:</p>
<pre><code class="lang-bash">git write-tree
</code></pre>
<p>Now you get the SHA-1 value of the created tree, and you can create a commit object using <code>git commit-tree</code>:</p>
<pre><code class="lang-bash">git commit-tree &lt;TREE_SHA&gt; -p &lt;COMMIT_5&gt; -p &lt;COMMIT_6&gt; -m <span class="hljs-string">"Merge commit!"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_merge_commit.png" alt="Creating a merge commit" width="600" height="400" loading="lazy">
<em>Creating a merge commit</em></p>
<p>Great, so you have created a commit object!</p>
<p>Recall that <code>git merge</code> also changes <code>HEAD</code> to point to the new merge commit object. So you can simply do the same:</p>
<pre><code class="lang-bash">git reset --hard db315a
</code></pre>
<p>If you look at the history now:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_reset_to_merge_commit_git_lol.png" alt="The history after creating a merge commit and resetting HEAD" width="600" height="400" loading="lazy">
<em>The history after creating a merge commit and resetting <code>HEAD</code></em></p>
<p>(Note: in this state, <code>HEAD</code> is "detached" - that is, it directly points to a commit object rather than a named reference. <code>gg</code> does not show <code>HEAD</code> when it is "detached", so don't be confused if you can't see <code>HEAD</code> in the output of <code>gg</code>.)</p>
<p>This is almost what we wanted. Remember that when you ran <code>git merge</code>, the result was <code>HEAD</code> pointing to <code>main</code>, which pointed to the newly created commit (as shown in the image with the caption "When you merge <code>paul_branch</code>, you get a new merge commit"). What should you do then?</p>
<p>Well, what you want is to modify <code>main</code>, so you can just point it to the new commit:</p>
<pre><code class="lang-bash">git checkout main
git reset --hard db315a
</code></pre>
<p>And now you have the same result as when you ran <code>git merge</code>: <code>main</code> points to the new commit, which has "Commit 5" and "Commit 6" as its parents. You can use <code>git lol</code> to verify that.</p>
<p>So this is exactly the same result as the merge done by Git, with the exception of the timestamp and thus the SHA-1 value, of course.</p>
<p>Overall, you got to merge both the contents of the two commits - that is, the state of the files, and also the history of those commits - by creating a merge commit that points to both histories.</p>
<p>In this simple case, you could actually just apply the patches using <code>git apply</code>, and everything works quite well.</p>
<h3 id="heading-quick-recap-of-a-three-way-merge">Quick Recap of a Three-way Merge</h3>
<p>So to quickly recap, in a three-way merge, Git:</p>
<ul>
<li>First, locates the merge base - the common ancestor of the two branches. That is, the first commit that is reachable from both branches.</li>
<li>Second, calculates two diffs - one diff from the merge base to the first branch, and another diff from the merge base to the second branch.</li>
<li>Third, applies both patches to the merge base, using a 3-way merge algorithm. I haven't explained the 3-way merge algorithm yet, but I will elaborate on that later. The result is the state of the new merge commit.</li>
</ul>
<p>You can also understand why it's called a "3-way merge": Git merges three different states - that of the first branch, that of the second branch, and their common ancestor. In our previous example, <code>main</code>, <code>paul_branch</code>, and the merge commit (with "Commit 3" and "Commit 4" as parents), respectively.</p>
<p>This is unlike, say, the fast-forward examples we saw before. The fast-forward examples are actually a case of a two-way merge, as Git only compares two states - for example, where <code>main</code> pointed to, and where <code>john_branch</code> pointed to.</p>
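<p>The three steps can also be replayed end to end with a short script. This is only a sketch of the manual process described above, not what Git runs internally. The file names, contents, and commit messages are placeholders I made up, and the patches are stored in a separate temporary directory so they stay out of the working tree:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"
patches=$(mktemp -d)
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" &gt; readme.md
git add readme.md
git commit -q -m "Base"

# Two branches diverge from the same commit, each adding its own file
git checkout -q -b john_branch main
echo "John's song" &gt; john_song.md
git add john_song.md
git commit -q -m "John's commit"

git checkout -q -b paul_branch main
echo "Paul's song" &gt; paul_song.md
git add paul_song.md
git commit -q -m "Paul's commit"

# Step 1: locate the merge base
base=$(git merge-base john_branch paul_branch)

# Step 2: compute one diff per branch, starting from the merge base
git diff "$base" john_branch &gt; "$patches/john.patch"
git diff "$base" paul_branch &gt; "$patches/paul.patch"

# Step 3: apply both patches on top of the merge base
git checkout -q "$base"
git apply --index "$patches/john.patch"
git apply --index "$patches/paul.patch"

# Record the index as a tree, then create a commit with two parents
tree=$(git write-tree)
merge=$(git commit-tree "$tree" -p john_branch -p paul_branch -m "Merge by hand")

git log -1 --format='%s, parents: %p' "$merge"
git ls-tree --name-only "$merge"
</code></pre>
<p>The last two commands should show a commit with two parents whose tree contains both songs next to the original file. No branch points to this commit yet, which is why the script refers to it through the <code>merge</code> variable.</p>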
<h3 id="heading-moving-on">Moving on</h3>
<p>Still, this was a simple case of a 3-way merge. John and Paul created different songs, so each of them touched a different file. It was pretty straightforward to execute the merge.</p>
<p>What about more interesting cases?</p>
<p>Let's assume that now John and Paul are co-authoring a new song.</p>
<p>So, John checked out the <code>main</code> branch and started writing the song:</p>
<pre><code class="lang-bash">git checkout main
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/a_day_in_the_life_md.png" alt="John's new song" width="600" height="400" loading="lazy">
<em>John's new song</em></p>
<p>He staged and committed it ("Commit 7"):</p>
<pre><code class="lang-bash">git add a_day_in_the_life.md
git commit -m <span class="hljs-string">"Commit 7"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_7.png" alt="John's new song is committed" width="600" height="400" loading="lazy">
<em>John's new song is committed</em></p>
<p>Now, Paul branches:</p>
<pre><code class="lang-bash">git checkout -b paul_branch_2
</code></pre>
<p>And edits the song, adding another verse:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/a_day_in_the_life_paul_verse.png" alt="Paul added a new verse" width="600" height="400" loading="lazy">
<em>Paul added a new verse</em></p>
<p>Of course, the original song does not include the title "Paul's Verse", but I added it here for clarity.</p>
<p>Paul stages and commits the changes:</p>
<pre><code class="lang-bash">git add a_day_in_the_life.md
git commit -m <span class="hljs-string">"Commit 8"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_8.png" alt="The history after introducing &quot;Commit 8&quot;" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 8"</em></p>
<p>John also branches out from main and adds an additional two lines at the end:</p>
<pre><code class="lang-bash">git checkout main
git checkout -b john_branch_2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/a_day_in_the_life_john_addition.png" alt="John added the two last lines" width="600" height="400" loading="lazy">
<em>John added the two last lines</em></p>
<p>John stages and commits his changes too ("Commit 9"):</p>
<pre><code class="lang-bash">git add a_day_in_the_life.md
git commit -m <span class="hljs-string">"Commit 9"</span>
</code></pre>
<p>This is the resulting history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_9.png" alt="The history after John's last commit" width="600" height="400" loading="lazy">
<em>The history after John's last commit</em></p>
<p>So, both Paul and John modified the same file on different branches. Will Git be successful in merging them?</p>
<p>Say that this time we don't go through <code>main</code>. Instead, John tries to merge Paul's new branch into his own branch:</p>
<pre><code class="lang-bash">git merge paul_branch_2
</code></pre>
<p>Wait! Don't run this command! Why would you let Git do all the hard work? You are trying to understand the process here.</p>
<p>So, first, Git needs to find the merge base. Can you see which commit that would be?</p>
<p>Correct, it would be the last commit on the <code>main</code> branch, where the two diverged - that is, "Commit 7".</p>
<p>You can verify that by using:</p>
<pre><code class="lang-bash">git merge-base john_branch_2 paul_branch_2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/merge_base_2.png" alt="&quot;Commit 7&quot; is the merge base" width="600" height="400" loading="lazy">
<em>"Commit 7" is the merge base</em></p>
<p>Check out the merge base so you can later apply the patches you will create:</p>
<pre><code class="lang-bash">git checkout main
</code></pre>
<p>Great, now Git should compute the diffs and generate the patches. You can observe the diffs directly:</p>
<pre><code class="lang-bash">git diff main paul_branch_2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_main_paul_branch_2.png" alt="The output of git diff main paul_branch_2" width="600" height="400" loading="lazy">
<em>The output of <code>git diff main paul_branch_2</code></em></p>
<p>Will applying this patch succeed? Well, no problem, Git has all the context lines in place.</p>
<p>Switch to the merge-base (which is "Commit 7", also referenced by <code>main</code>), and ask Git to apply this patch:</p>
<pre><code class="lang-bash">git checkout main
git diff main paul_branch_2 &gt; paul_branch_2.patch
git apply --index paul_branch_2.patch
</code></pre>
<p>And this worked, no problem at all.</p>
<p>Now, compute the diff between John's new branch and the merge base. Notice that you haven't committed the applied changes, so <code>john_branch_2</code> still points at the same commit as before, "Commit 9":</p>
<pre><code class="lang-bash">git diff main john_branch_2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_main_john_branch_2.png" alt="The output of git diff main john_branch_2" width="600" height="400" loading="lazy">
<em>The output of <code>git diff main john_branch_2</code></em></p>
<p>Will applying this diff work?</p>
<p>Well, indeed, yes. Notice that even though the line numbers have changed on the current version of the file, thanks to the context lines Git is able to locate where it needs to add these lines…</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diff_main_john_branch_2_context.png" alt="Git can rely on the context lines" width="600" height="400" loading="lazy">
<em>Git can rely on the context lines</em></p>
<p>Save this patch and then apply it:</p>
<pre><code class="lang-bash">git diff main john_branch_2 &gt; john_branch_2.patch
git apply --index john_branch_2.patch
</code></pre>
<p>Observe the resulting file:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/a_day_in_the_life_after_merge.png" alt="The result after applying Paul's patch" width="600" height="400" loading="lazy">
<em>The result after applying Paul's patch</em></p>
<p>Cool, exactly what we wanted.</p>
<p>You can now create the tree and relevant commit:</p>
<pre><code class="lang-bash">git write-tree
</code></pre>
<p>Don't forget to specify both parents:</p>
<pre><code class="lang-bash">git commit-tree &lt;TREE-ID&gt; -p paul_branch_2 -p john_branch_2 -m <span class="hljs-string">"Merging new changes"</span>
</code></pre>
<p>See how I used the branch names here? After all, they are just pointers to the commits we want.</p>
<p>Cool, look at the log from the new commit:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_lol_merging_new_changes.png" alt="git lol &lt;SHA_OF_THE_MERGE_COMMIT&gt; after creating the merge commit" width="600" height="400" loading="lazy">
<em><code>git lol &lt;SHA_OF_THE_MERGE_COMMIT&gt;</code> after creating the merge commit</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_merging_new_changes_commit.png" alt="The history after creating the merge commit" width="600" height="400" loading="lazy">
<em>The history after creating the merge commit</em></p>
<p>Exactly what we wanted.</p>
<p>You can also let Git perform the job for you. You can check out <code>john_branch_2</code>, which you haven't moved, so it still points to the same commit as it did before the merge. All you need to do is run:</p>
<pre><code class="lang-bash">git checkout john_branch_2
git merge paul_branch_2
</code></pre>
<p>Observe the resulting history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/merge_branches_2.png" alt="git lol after letting Git perform the merge" width="600" height="400" loading="lazy">
<em><code>git lol</code> after letting Git perform the merge</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_merging_with_git.png" alt="A visualization of the history after letting Git perform the merge" width="600" height="400" loading="lazy">
<em>A visualization of the history after letting Git perform the merge</em></p>
<p>Just as before, you have a merge commit pointing to "Commit 8" and "Commit 9" as its parents. "Commit 9" is the first parent since you merged into it.</p>
<p>But this was still quite simple… John and Paul worked on the same file, but on very different parts. You could also directly apply Paul's changes to John's branch. If you go back to John's branch before the merge:</p>
<pre><code class="lang-bash">git reset --hard HEAD~
</code></pre>
<p>And now apply Paul's changes:</p>
<pre><code class="lang-bash">git apply --index paul_branch_2.patch
</code></pre>
<p>You will get the same result.</p>
<p>But what happens when the two branches include changes on the same files, in the same locations?</p>
<h3 id="heading-more-advanced-git-merge-cases">More Advanced Git Merge Cases</h3>
<p>What would happen if John and Paul were to coordinate a new song, and work on it together?</p>
<p>In this case, John creates the first version of this song in the main branch:</p>
<pre><code class="lang-bash">git checkout main
nano everyone.md
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/everyone_1.png" alt="The contents of everyone.md prior to the first commit" width="600" height="400" loading="lazy">
<em>The contents of <code>everyone.md</code> prior to the first commit</em></p>
<p>By the way, this text is indeed taken from the version that John Lennon recorded for a demo in 1968. But this isn't a book about the Beatles. If you're curious about the process the Beatles underwent while writing this song, you can follow the links at the end of this chapter.</p>
<pre><code class="lang-bash">git add everyone.md
git commit -m <span class="hljs-string">"Commit 10"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_commit_10.png" alt="Introducing &quot;Commit 10&quot;" width="600" height="400" loading="lazy">
<em>Introducing "Commit 10"</em></p>
<p>Now John and Paul split. Paul creates a new verse in the beginning:</p>
<pre><code class="lang-bash">git checkout -b paul_branch_3
nano everyone.md
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/everyone_2.png" alt="Paul added a new verse in the beginning" width="600" height="400" loading="lazy">
<em>Paul added a new verse in the beginning</em></p>
<p>Also, while talking, Paul and John decided to change the word "feet" to "foot", so Paul adds this change as well.</p>
<p>And Paul adds and commits his changes to the repo:</p>
<pre><code class="lang-bash">git add everyone.md
git commit -m <span class="hljs-string">"Commit 11"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_11.png" alt="The history after introducing &quot;Commit 11&quot;" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 11"</em></p>
<p>You can observe Paul's changes, by comparing this branch's state to the state of branch <code>main</code>:</p>
<pre><code class="lang-bash">git diff main
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main.png" alt="The output of git diff main from Paul's branch" width="600" height="400" loading="lazy">
<em>The output of <code>git diff main</code> from Paul's branch</em></p>
<p>Store this diff in a patch file:</p>
<pre><code class="lang-bash">git diff main &gt; paul_3.patch
</code></pre>
<p>Now back to <code>main</code>…</p>
<pre><code class="lang-bash">git checkout main
</code></pre>
<p>John decides to make another change, in his own new branch:</p>
<pre><code class="lang-bash">git checkout -b john_branch_3
</code></pre>
<p>And he replaces the line "Everyone had the boot in" with the line "Everyone had a wet dream". In addition, John changes the word "feet" to "foot", following his talk with Paul.</p>
<p>Observe the diff:</p>
<pre><code class="lang-bash">git diff main
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main_2.png" alt="The output of git diff main from John's branch" width="600" height="400" loading="lazy">
<em>The output of <code>git diff main</code> from John's branch</em></p>
<p>Store this output as well:</p>
<pre><code class="lang-bash">git diff main &gt; john_3.patch
</code></pre>
<p>Now, stage and commit:</p>
<pre><code class="lang-bash">git add everyone.md
git commit -m <span class="hljs-string">"Commit 12"</span>
</code></pre>
<p>This should be your current history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_12.png" alt="The history after introducing &quot;Commit 12&quot;" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 12"</em></p>
<p>Note that I deleted <code>john_branch_2</code> and <code>paul_branch_2</code> for simplicity. Of course, you can erase them from Git by using <code>git branch -D &lt;branch_name&gt;</code>. As a result, these branch names will not appear in the output of <code>git log</code> or other similar commands.</p>
<p>This also applies to commits that are no longer reachable from any named reference, such as "Commit 8" or "Commit 9". Since they are not reachable from any named reference via the parents' chain, they will not be included in the output of commands such as <code>git log</code>.</p>
<p>Back to our story - Paul told John he had added a new verse, so John would like to merge Paul's changes.</p>
<p>Can John simply apply Paul's patch?</p>
<p>Consider the patch again:</p>
<pre><code class="lang-bash">git diff main paul_branch_3
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main-1.png" alt="The output of git diff main paul_branch_3" width="600" height="400" loading="lazy">
<em>The output of <code>git diff main paul_branch_3</code></em></p>
<p>As you can see, this diff relies on the line "Everyone had the boot in", but this line no longer exists on John's branch. As a result, you could expect applying the patch to fail. Go on, give it a try:</p>
<pre><code class="lang-bash">git apply paul_3.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_apply_paul_3.png" alt="Applying the patch failed" width="600" height="400" loading="lazy">
<em>Applying the patch failed</em></p>
<p>Indeed, you can see that it failed.</p>
<p>But should it really fail?</p>
<p>As explained earlier, <code>git merge</code> uses a 3-way merge algorithm, and this can come in handy here. What would be the first step of this algorithm?</p>
<p>Well, first, Git would find the merge base - that is, the common ancestor of Paul's branch and John's branch. Consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_12-1.png" alt="The history after introducing &quot;Commit 12&quot;" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 12"</em></p>
<p>So the common ancestor of "Commit 11" and "Commit 12" is "Commit 10". You can verify this by running the command:</p>
<pre><code class="lang-bash">git merge-base john_branch_3 paul_branch_3
</code></pre>
<p>Now we can take the patches we generated from the diffs on both branches, and apply them to <code>main</code>. Would that work?</p>
<p>First, try to apply John's patch, and then Paul's patch.</p>
<p>Consider the diff:</p>
<pre><code class="lang-bash">git diff main john_branch_3
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main_2-1.png" alt="The output of git diff main john_branch_3" width="600" height="400" loading="lazy">
<em>The output of <code>git diff main john_branch_3</code></em></p>
<p>We can store it in a file:</p>
<pre><code class="lang-bash">git diff main john_branch_3 &gt; john_3.patch
</code></pre>
<p>And apply this patch on main:</p>
<pre><code class="lang-bash">git checkout main
git apply john_3.patch
</code></pre>
<p>Let's consider the result:</p>
<pre><code class="lang-bash">nano everyone.md
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/everyone_3.png" alt="The contents of everyone.md after applying John's patch" width="600" height="400" loading="lazy">
<em>The contents of <code>everyone.md</code> after applying John's patch</em></p>
<p>The line changed as expected. Nice 😎</p>
<p>Now, can Git apply Paul's patch? To remind you, this is the patch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main-2.png" alt="The contents of Paul's patch" width="600" height="400" loading="lazy">
<em>The contents of Paul's patch</em></p>
<p>Well, Git cannot apply this patch, because this patch assumes that the line "Everyone had the boot in" exists. Trying to apply it fails:</p>
<pre><code class="lang-bash">git apply -v paul_3.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_apply_v_paul_3.png" alt="Applying Paul's patch failed" width="600" height="400" loading="lazy">
<em>Applying Paul's patch failed</em></p>
<p>What you tried to do now, applying Paul's patch on the <code>main</code> branch after applying John's patch, is the same as being on <code>john_branch_3</code>, and attempting to apply the patch. That is, running:</p>
<pre><code class="lang-bash">git apply paul_3.patch
</code></pre>
<p>What would happen if we tried the other way around?</p>
<p>First, clean up the state:</p>
<pre><code class="lang-bash">git reset --hard
</code></pre>
<p>And start from Paul's branch:</p>
<pre><code class="lang-bash">git checkout paul_branch_3
</code></pre>
<p>Can we apply John's patch? As a reminder, this is the status of <code>everyone.md</code> on this branch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/everyone_2-1.png" alt="The contents of everyone.md on paul_branch_3" width="600" height="400" loading="lazy">
<em>The contents of <code>everyone.md</code> on <code>paul_branch_3</code></em></p>
<p>And this is John's patch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main_2-2.png" alt="The contents of John's patch" width="600" height="400" loading="lazy">
<em>The contents of John's patch</em></p>
<p>Would applying John's patch work?</p>
<p>Try to answer yourself before reading on.</p>
<p>You can try:</p>
<pre><code class="lang-bash">git apply john_3.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_apply_3_john_3.png" alt="Git fails to apply John's patch" width="600" height="400" loading="lazy">
<em>Git fails to apply John's patch</em></p>
<p>Well, no! Again, if you are not sure what happened, you can always ask <code>git apply</code> to be a bit more verbose:</p>
<pre><code class="lang-bash">git apply -v john_3.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_apply_v_john_3.png" alt="You can get more information by using the -v flag" width="600" height="400" loading="lazy">
<em>You can get more information by using the <code>-v</code> flag</em></p>
<p>Git is looking for "Everyone put their feet down", but Paul has already changed this line so it now contains the word "foot" instead of "feet". As a result, applying this patch fails.</p>
<p>Notice that changing the number of context lines here (that is, using <code>git apply</code> with the <code>-C</code> flag, as discussed in the <a class="post-section-overview" href="#heading-chapter-6-diffs-and-patches">previous chapter</a>) is irrelevant - Git is unable to locate the actual line that the patch is trying to erase.</p>
<p>But actually, Git can make this work, if you just add a flag to apply, telling it to perform a 3-way merge under the hood:</p>
<pre><code class="lang-bash">git apply -3 john_3.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_apply_3_john_3-1.png" alt="Applying with the -3 flag succeeds" width="600" height="400" loading="lazy">
<em>Applying with <code>-3</code> flag succeeds</em></p>
<p>And consider the result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/everyone_4.png" alt="The contents of everyone.md after the merge" width="600" height="400" loading="lazy">
<em>The contents of <code>everyone.md</code> after the merge</em></p>
<p>Exactly what we wanted! You have Paul's verse, and both of John's changes!</p>
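<p>To reproduce this outside of this chapter's repository, here is a hedged sketch on a scratch repository. The lyrics are a shortened stand-in for <code>everyone.md</code>, the branch names <code>john</code> and <code>paul</code> and Paul's two placeholder lines are invented for this example, and <code>git init -b</code> assumes Git 2.28 or newer:</p>
<pre><code class="lang-bash"># Work in a scratch repository (hypothetical names and contents).
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"

# The merge base.
printf '%s\n' "Everyone had a hard year" "Everyone had a good time" \
    "Everyone had the boot in" "Everyone saw the sunshine" \
    "Everyone put their feet down" "Everyone let their hair down" | tee everyone.md
git add everyone.md
git commit -q -m "Base"

# John's side: a new third line, and "feet" becomes "foot".
git checkout -q -b john
sed -i.bak -e 's/the boot in/a wet dream/' -e 's/feet/foot/' everyone.md
rm everyone.md.bak
git commit -q -am "John"
git diff main john | tee john.patch

# Paul's side: a new verse at the top, and the same "feet" to "foot" change.
git checkout -q main
git checkout -q -b paul
printf '%s\n' "Paul's new verse, line one" "Paul's new verse, line two" \
    "Everyone had a hard year" "Everyone had a good time" \
    "Everyone had the boot in" "Everyone saw the sunshine" \
    "Everyone put their foot down" "Everyone let their hair down" | tee everyone.md
git commit -q -am "Paul"

# A plain apply fails: the patch expects "feet", which Paul already changed.
if git apply john.patch; then
    echo "plain apply succeeded"
else
    echo "plain apply failed"
fi

# With -3, Git merges using the blobs named in the patch's index line.
git apply -3 john.patch
cat everyone.md
</code></pre>
<p>Note that the changed lines in this sketch are deliberately separated by unchanged lines. Changes that touch adjacent lines can still be reported as conflicts.</p>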
<p>So, how was Git able to accomplish that?</p>
<p>Well, as I mentioned, Git really did a <strong>3-way merge</strong>, and this example is a good opportunity to dive into what that actually means.</p>
<h3 id="heading-how-gits-3-way-merge-algorithm-works">How Git's 3-way Merge Algorithm Works</h3>
<p>Get back to the state before applying this patch:</p>
<pre><code class="lang-bash">git reset --hard
</code></pre>
<p>You now have three versions: the merge base, which is "Commit 10", Paul's branch, and John's branch. In general terms, we can say these are the <code>merge base</code>, <code>commit A</code> and <code>commit B</code>. Notice that the <code>merge base</code> is by definition an ancestor of both <code>commit A</code> and <code>commit B</code>.</p>
<p>To perform the merge, Git looks at the differences between the three versions of the file in question across these three revisions. In your case, it's the file <code>everyone.md</code>, and the revisions are "Commit 10", Paul's branch (that is, "Commit 11"), and John's branch (that is, "Commit 12").</p>
<p>Git makes the merging decision based on the status of each line in each of these versions.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_versions.png" alt="The three versions considered for the 3-way merge" width="600" height="400" loading="lazy">
<em>The three versions considered for the 3-way merge</em></p>
<p>If the three versions do not all match, that is a conflict. Git can resolve many of these conflicts automatically, as we will now see.</p>
<p>Let's consider specific lines.</p>
<p>The first lines here exist only on Paul's branch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_versions_1.png" alt="Lines that appear on Paul's branch only" width="600" height="400" loading="lazy">
<em>Lines that appear on Paul's branch only</em></p>
<p>This means that the state of John's branch is equal to the state of the merge base. So the 3-way merge goes with Paul's version.</p>
<p>In general, if the state of the merge base is the same as <code>A</code>, the algorithm goes with <code>B</code>. The reason is that since the merge base is the ancestor of both <code>A</code> and <code>B</code>, Git assumes that this line hasn't changed in <code>A</code>, and it <em>has</em> changed in <code>B</code>, which is the most recent version for that line, and should thus be taken into account.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_way_merge_1.png" alt="If the state of the merge base is the same as A, and this state is different from B, the algorithm goes with B" width="600" height="400" loading="lazy">
<em>If the state of the merge base is the same as <code>A</code>, and this state is different from <code>B</code>, the algorithm goes with <code>B</code></em></p>
<p>Next, you can see lines where all three versions agree - they exist on the merge base, <code>A</code> and <code>B</code>, with equal data.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_versions_2.png" alt="Lines where all three versions agree" width="600" height="400" loading="lazy">
<em>Lines where all three versions agree</em></p>
<p>In this case the algorithm has a trivial choice - just take that version.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_way_merge_2.png" alt="In case all three versions agree, the algorithm goes with that single version" width="600" height="400" loading="lazy">
<em>In case all three versions agree, the algorithm goes with that single version</em></p>
<p>In a previous example, we saw that if the merge base and <code>A</code> agree, and <code>B</code>'s version is different, the algorithm picks <code>B</code>. This works in the other direction too. For example, here you have a line on John's branch that differs from the version on the merge base and on Paul's branch.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_versions_3.png" alt="A line where Paul's version matches the merge base's version, and John has a different version" width="600" height="400" loading="lazy">
<em>A line where Paul's version matches the merge base's version, and John has a different version</em></p>
<p>Hence, John's version is chosen.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_way_merge_3.png" alt="If the state of the merge base is the same as B, and this state is different from A, the algorithm goes with A" width="600" height="400" loading="lazy">
<em>If the state of the merge base is the same as <code>B</code>, and this state is different from <code>A</code>, the algorithm goes with <code>A</code></em></p>
<p>Now consider another case, where both <code>A</code> and <code>B</code> agree on a line, but the value they agree upon is different from the merge base: both John and Paul agreed to change the line "Everyone put their feet down" to "Everyone put their foot down":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_versions_4.png" alt="A line where Paul's version matches John's version; yet the merge base has a different version" width="600" height="400" loading="lazy">
<em>A line where Paul's version matches John's version, yet the merge base has a different version</em></p>
<p>In this case, the algorithm picks the version on both <code>A</code> and <code>B</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_way_merge_4.png" alt="In case A and B agree on a version which is different from the merge base's version, the algorithm picks the version on both A and B" width="600" height="400" loading="lazy">
<em>In case <code>A</code> and <code>B</code> agree on a version which is different from the merge base's version, the algorithm picks the version on both <code>A</code> and <code>B</code></em></p>
<p>Notice this is not a democratic vote. In the previous case, the algorithm picked the minority version, as it represented the newest version of this line. In this case, it happens to pick the majority - but only because <code>A</code> and <code>B</code> are the revisions that agree on the new version.</p>
<p>The same would happen if we used <code>git merge</code>:</p>
<pre><code class="lang-bash">git merge john_branch_3
</code></pre>
<p>Without specifying any flags, <code>git merge</code> will default to using a <code>3-way merge</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_merge_default.png" alt="By default, git merge uses a 3-way merge algorithm" width="600" height="400" loading="lazy">
<em>By default, <code>git merge</code> uses a 3-way merge algorithm</em></p>
<p>The status of <code>everyone.md</code> after running <code>git merge john_branch_3</code> would be the same as the result you achieved by applying the patches with <code>git apply -3</code>.</p>
<p>If you consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_merge.png" alt="Git's history after performing the merge" width="600" height="400" loading="lazy">
<em>Git's history after performing the merge</em></p>
<p>You will see that the merge commit indeed has two parents: the first is "Commit 11", that is, where <code>paul_branch_3</code> pointed to before the merge. The second is "Commit 12", where <code>john_branch_3</code> pointed to, and still points to now.</p>
<p>What will happen if you now merge from <code>main</code>? That is, switch to the <code>main</code> branch, which is pointing to "Commit 10":</p>
<pre><code class="lang-bash">git checkout main
</code></pre>
<p>And then merge Paul's branch?</p>
<pre><code class="lang-bash">git merge paul_branch_3
</code></pre>
<p>Indeed, we get a fast-forward merge - as before running this command, <code>main</code> was an ancestor of <code>paul_branch_3</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/fast_forward_merge.png" alt="A fast-forward merge" width="600" height="400" loading="lazy">
<em>A fast-forward merge</em></p>
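<p>If you want to check in advance whether a merge can be a fast-forward, one option is <code>git merge-base --is-ancestor</code>. The following is a small sketch on a scratch repository. The branch name <code>feature</code> and the empty commits are placeholders for illustration, and <code>git init -b</code> assumes Git 2.28 or newer:</p>
<pre><code class="lang-bash"># Work in a scratch repository (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"

# One commit on main, then one more commit on a new branch.
git commit -q --allow-empty -m "Commit on main"
git checkout -q -b feature
git commit -q --allow-empty -m "Commit on feature"
git checkout -q main

# Exit status 0 means main is an ancestor of feature.
if git merge-base --is-ancestor main feature; then
    echo "main can be fast-forwarded to feature"
fi

# --ff-only refuses to create a merge commit and only moves the branch pointer.
git merge -q --ff-only feature
git log --oneline
</code></pre>
<p>After the merge, <code>main</code> and <code>feature</code> point to the same commit, and no merge commit was created.</p>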
<p>So, this is a 3-way merge. In general, if all versions agree on a line, then this line is used. If <code>A</code> and the merge base match, and <code>B</code> has another version, <code>B</code> is taken. In the opposite case, where the merge base and <code>B</code> match, the <code>A</code> version is selected. If <code>A</code> and <code>B</code> match, this version is taken, whether the merge base agrees or not.</p>
<p>This description leaves one open question though: What happens in cases where all three versions disagree?</p>
<p>Well, that's a conflict that Git does not resolve automatically. In these cases, Git calls for a human's help.</p>
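<p>As a rough sketch of these rules (not Git's actual implementation, which works on changed regions of a diff rather than on isolated lines), the decision for a single line could be written as a small shell function. The name <code>pick_version</code> and its three arguments are hypothetical, chosen only for illustration:</p>
<pre><code class="lang-bash"># Hypothetical helper: decide which version of one line a 3-way merge keeps.
# Arguments: the line as it appears on the merge base, on A, and on B.
pick_version() {
    base=$1; a=$2; b=$3
    if [ "$a" = "$b" ]; then
        # A and B agree, whether or not the merge base does: take their version.
        printf '%s\n' "$a"
    elif [ "$base" = "$a" ]; then
        # Only B changed the line: take B.
        printf '%s\n' "$b"
    elif [ "$base" = "$b" ]; then
        # Only A changed the line: take A.
        printf '%s\n' "$a"
    else
        # All three versions differ: this needs a human.
        printf 'CONFLICT\n'
        return 1
    fi
}

# Only B changed the line, so B's version is printed.
pick_version "Everyone had the boot in" "Everyone had the boot in" "Everyone had a wet dream"
# A and B made the same change, so that shared version is printed.
pick_version "Everyone put their feet down" "Everyone put their foot down" "Everyone put their foot down"
</code></pre>
<p>The order of the checks mirrors the summary above: agreement between <code>A</code> and <code>B</code> is tested first, so the merge base only matters when the two sides differ.</p>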
<h3 id="heading-how-to-resolve-merge-conflicts">How to Resolve Merge Conflicts</h3>
<p>Having followed along so far, you should understand the basics of the command <code>git merge</code>, and how Git can automatically resolve some conflicts. You also understand which cases are resolved automatically.</p>
<p>Next, let's consider a more advanced case.</p>
<p>Say Paul and John keep working on this song.</p>
<p>Paul creates a new branch:</p>
<pre><code class="lang-bash">git checkout -b paul_branch_4
</code></pre>
<p>And he decides to add some "Yeah"s to the song, so he changes this verse as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/paul_branch_4_additions.png" alt="Paul's additions" width="600" height="400" loading="lazy">
<em>Paul's additions</em></p>
<p>So Paul stages and commits these changes:</p>
<pre><code class="lang-bash">git add everyone.md
git commit -m <span class="hljs-string">"Commit 13"</span>
</code></pre>
<p>Paul also creates another song, <code>let_it_be.md</code> and adds it to the repo:</p>
<pre><code class="lang-bash">git add let_it_be.md
git commit -m <span class="hljs-string">"Commit 14"</span>
</code></pre>
<p>This is the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_14.png" alt="The history after Paul introduced &quot;Commit 14&quot;" width="600" height="400" loading="lazy">
<em>The history after Paul introduced "Commit 14"</em></p>
<p>Going back to <code>main</code>:</p>
<pre><code class="lang-bash">git checkout main
</code></pre>
<p>John also branches out:</p>
<pre><code class="lang-bash">git checkout -b john_branch_4
</code></pre>
<p>And John also works on the song "Everyone had a hard year", later to be called "I've got a feeling" (again, this is not a book about the Beatles, so I won't elaborate on it here. See the additional links if you are curious).</p>
<p>John decides to change all occurrences of "Everyone" to "Everybody":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/everyone_5.png" alt="John changes all occurrences of &quot;Everyone&quot; to &quot;Everybody&quot;" width="600" height="400" loading="lazy">
<em>John changes all occurrences of "Everyone" to "Everybody"</em></p>
<p>He stages and commits this song to the repo:</p>
<pre><code class="lang-bash">git add everyone.md
git commit -m <span class="hljs-string">"Commit 15"</span>
</code></pre>
<p>Nice. Now John also creates another song, <code>across_the_universe.md</code>. He adds it to the repo as well:</p>
<pre><code class="lang-bash">git add across_the_universe.md
git commit -m <span class="hljs-string">"Commit 16"</span>
</code></pre>
<p>Observe the history again:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_16.png" alt="The history after John introduced &quot;Commit 16&quot;" width="600" height="400" loading="lazy">
<em>The history after John introduced "Commit 16"</em></p>
<p>You can see that the history diverges from <code>main</code>, to two different branches - <code>paul_branch_4</code>, and <code>john_branch_4</code>.</p>
<p>At this point, John would like to merge the changes introduced by Paul.</p>
<p>What is going to happen here?</p>
<p>Remember the changes introduced by Paul:</p>
<pre><code class="lang-bash">git diff main paul_branch_4
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main_paul_branch_4.png" alt="The output of git diff main paul_branch_4" width="600" height="400" loading="lazy">
<em>The output of <code>git diff main paul_branch_4</code></em></p>
<p>What do you think? Will merge work?</p>
<p>Try it out:</p>
<pre><code class="lang-bash">git merge paul_branch_4
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/merge_conflict.png" alt="A merge conflict" width="600" height="400" loading="lazy">
<em>A merge conflict</em></p>
<p>We have a conflict!</p>
<p>Git cannot merge these branches on its own. You can get an overview of the merge state, using <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_after_merge_failed.png" alt="The output of git status right after the merge operation" width="600" height="400" loading="lazy">
<em>The output of <code>git status</code> right after the merge operation</em></p>
<p>The changes that Git had no problem resolving are staged for commit. And there is a separate section for "unmerged paths" - these are files with conflicts that Git could not resolve on its own.</p>
<p>It's time to understand why and when these conflicts happen, how to resolve them, and also how Git handles them under the hood.</p>
<p>Alright then! I hope you are at least as excited as I am. 😇</p>
<p>Let's recall what we know about 3-way merges:</p>
<p>First, Git will look for the merge base - the common ancestor of <code>john_branch_4</code> and <code>paul_branch_4</code>. Which commit would that be?</p>
<p>It would be the tip of the <code>main</code> branch, the commit in which we merged <code>john_branch_3</code> into <code>paul_branch_3</code>.</p>
<p>Again, if you are not sure, you can verify that by running:</p>
<pre><code class="lang-bash">git merge-base john_branch_4 paul_branch_4
</code></pre>
<p>And at the current state, <code>git status</code> knows which files are staged and which aren't.</p>
<p>Consider the process for each <em>file</em>, which is the same as the 3-way merge algorithm we considered per line, but at the file level:</p>
<p><code>across_the_universe.md</code> exists on John's branch, but doesn't exist on the merge base or on Paul's branch. So Git chooses to include this file. Since you are already on John's branch and this file is included in the tip of this branch, it is not mentioned by <code>git status</code>.</p>
<p><code>let_it_be.md</code> exists on Paul's branch, but doesn't exist on the merge base or John's branch. So <code>git merge</code> "chooses" to include it.</p>
<p>What about <code>everyone.md</code>? Well, here we have three different states of this file: its state on the merge base, its state on John's branch, and its state on Paul's branch. While performing a merge, Git stores all of these versions on the index.</p>
<p>Let's observe that by looking directly at the index with the command <code>git ls-files</code>:</p>
<pre><code class="lang-bash">git ls-files -s --abbrev
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/ls_files_abbrev.png" alt="The output of git ls-files -s --abbrev after the merge operation" width="600" height="400" loading="lazy">
<em>The output of <code>git ls-files -s --abbrev</code> after the merge operation</em></p>
<p>You can see that <code>everyone.md</code> has three different entries. Git assigns each version a number that represents the "stage" of the file, and this is a distinct property of an index entry, alongside the file's name and the mode bits.</p>
<p>When there is no merge conflict regarding a file, its "stage" is <code>0</code>. This is indeed the state for <code>across_the_universe.md</code>, and for <code>let_it_be.md</code>.</p>
<p>When a file is in a conflicted state, we have:</p>
<ul>
<li>Stage <code>1</code> - which is the merge base.</li>
<li>Stage <code>2</code> - which is "your" version. That is, the version of the file on the branch you are merging <em>into</em>. In our example, this would be <code>john_branch_4</code>.</li>
<li>Stage <code>3</code> - which is "their" version, also called the <code>MERGE_HEAD</code>. That is, the version on the branch you are merging (into the current branch). In our example, that is <code>paul_branch_4</code>.</li>
</ul>
<p>To observe the file's contents in a specific stage, you can use a command I introduced earlier, <code>git cat-file</code>, and provide the blob's SHA:</p>
<pre><code class="lang-bash">git cat-file -p &lt;BLOB_SHA_FOR_STAGE_2&gt;
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/cat_file.png" alt="Using git cat-file to present the content of the file on John's branch, right from its state in the index" width="600" height="400" loading="lazy">
<em>Using <code>git cat-file</code> to present the content of the file on John's branch, right from its state in the index</em></p>
<p>And indeed, this is the content we expected - from John's branch, where the lines start with "Everybody" rather than "Everyone".</p>
<p>A nice trick that allows you to see the content quickly without providing the blob's SHA-1 value, is by using <code>git show</code>, like so:</p>
<pre><code class="lang-bash">git show :&lt;STAGE&gt;:everyone.md
</code></pre>
<p>For example, to get the content of the same version as with <code>git cat-file -p &lt;BLOB_SHA_FOR_STAGE_2&gt;</code>, you can write <code>git show :2:everyone.md</code>.</p>
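<p>If you would like to experiment with the stages on a fresh repository, here is a minimal sketch that provokes a conflict and then prints each stage. The one-line file contents and the branch names <code>paul</code> and <code>john</code> are invented for this example, and <code>git init -b</code> assumes Git 2.28 or newer. It also uses <code>git ls-files -u</code>, which lists only the unmerged entries:</p>
<pre><code class="lang-bash"># Work in a scratch repository (hypothetical names and contents).
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"

# The merge base.
printf 'Everyone had a hard year\n' | tee everyone.md
git add everyone.md
git commit -q -m "Base"

# Paul and John each change the same line in a different way.
git checkout -q -b paul
printf 'Everyone had a hard year, yeah\n' | tee everyone.md
git commit -q -am "Paul"

git checkout -q main
git checkout -q -b john
printf 'Everybody had a hard year\n' | tee everyone.md
git commit -q -am "John"

# The merge stops with a conflict, so its non-zero exit status is expected.
git merge paul || echo "merge stopped with a conflict"

# Only conflicted paths are listed, one entry per stage.
git ls-files -u

# Stage 1 is the merge base, 2 is "ours" (john), 3 is "theirs" (paul).
for stage in 1 2 3; do
    echo "stage $stage: $(git show ":$stage:everyone.md")"
done
</code></pre>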
<p>Git records the three states of the three commits into the index in this way at the start of the merge. It then follows the three-way merge algorithm to quickly resolve the simple cases:</p>
<p>In case all three stages match, then the selection is trivial.</p>
<p>If one side made a change while the other did nothing - that is, stage <code>1</code> matches stage <code>2</code> - then we choose stage <code>3</code>, or vice versa. That's exactly what happened with <code>let_it_be.md</code> and <code>across_the_universe.md</code>.</p>
<p>In case of a deletion on the incoming branch, for example, and given there were no changes on the current branch, then we would see that stage <code>1</code> matches stage <code>2</code>, but there is no stage <code>3</code>. In this case, <code>git merge</code> removes the file for the merged version.</p>
<p>What's really cool here is that for matching, Git doesn't need the actual files. Rather, it can rely on the SHA-1 values of the corresponding blobs. This way, Git can easily detect the state a file is in.</p>
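<p>The decision rule described above can be sketched with nothing but blob SHAs. This is a simplified illustration and not Git's actual implementation; the branch name <code>theirs_demo</code>, the variable names, and the wording of the decisions are assumptions. Run it as a standalone script:</p>
<pre><code class="lang-bash">set -e

# Hypothetical throwaway repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "Let it be" &gt; let_it_be.md
git add let_it_be.md
git commit -q -m "base"

# Only the incoming branch changes the file
git checkout -q -b theirs_demo
echo "Let it be, let it be" &gt; let_it_be.md
git commit -q -am "theirs changes the file"
git checkout -q main

# Blob SHAs of the file on the merge base, on "ours" and on "theirs"
base=$(git rev-parse "$(git merge-base main theirs_demo):let_it_be.md")
ours=$(git rev-parse main:let_it_be.md)
theirs=$(git rev-parse theirs_demo:let_it_be.md)

# Simplified decision rule - compares SHAs only, never the file contents
if [ "$ours" = "$theirs" ]; then
  decision="both sides agree"
elif [ "$base" = "$ours" ]; then
  decision="take theirs"
elif [ "$base" = "$theirs" ]; then
  decision="take ours"
else
  decision="all three differ - merge the contents"
fi
echo "$decision"
</code></pre>
<p>Here only the incoming branch touched the file, so the base and "ours" share a blob SHA and the script reports <code>take theirs</code>.</p>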
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/3_way_merge_4-1.png" alt="Git performs the same 3-way merge algorithm on a files level" width="600" height="400" loading="lazy">
<em>Git performs the same 3-way merge algorithm on a files level</em></p>
<p>For <code>everyone.md</code> you have this special case - where stage <code>1</code>, stage <code>2</code> and stage <code>3</code> are all different from one another. That is, they have different blob SHAs. It's time to go deeper and understand the merge conflict. 😊</p>
<p>One way to do that would be to simply use <code>git diff</code>. In a <a class="post-section-overview" href="#heading-chapter-6-diffs-and-patches">previous chapter</a>, we examined <code>git diff</code> in detail, and saw that it shows the differences between various combinations of the working tree, index or commits.</p>
<p>But <code>git diff</code> also has a special mode for helping with merge conflicts:</p>
<pre><code class="lang-bash">git diff
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_conflict.png" alt="The output of git diff during a merge conflict" width="600" height="400" loading="lazy">
<em>The output of <code>git diff</code> during a merge conflict</em></p>
<p>This output may be confusing at first, but once you get used to it, it's pretty clear. Let's start by understanding it, and then see how you can resolve conflicts with other, more visual tools.</p>
<p>The conflicted section is separated by the "equal" marks (<code>=======</code>), and marked with the corresponding branches. In this context, "ours" is the current branch. In this example, that would be <code>john_branch_4</code>, the branch that <code>HEAD</code> was pointing to when we initiated the <code>git merge</code> command. "Theirs" is the <code>MERGE_HEAD</code>, the branch that we are merging in - in this case, <code>paul_branch_4</code>.</p>
<p>So <code>git diff</code> without any special flags shows changes between the working tree and the index - which in this case are the conflicts yet to be resolved. The output doesn't include staged changes, which is very convenient for resolving the conflict.</p>
<p>Time to resolve this manually. Fun!</p>
<p>So, why is this a conflict?</p>
<p>For Git, Paul and John made different changes to the same line, for a few lines. John changed it to one thing, and Paul changed it to another thing. Git cannot decide which one is correct.</p>
<p>This is not the case for the last lines, like the line that used to be "Everyone had a hard year" on the merge base. Paul hasn't changed this line, or the lines surrounding it, so its version on <code>paul_branch_4</code>, or "theirs" in our case, agrees with the merge base. Yet John's version, "ours", is different. Thus <code>git merge</code> can easily decide to take John's version.</p>
<p>But what about the conflicted lines?</p>
<p>In this case, I know what I want, and that is actually a combination of these lines. I want the lines to start with "Everybody", following John's change, but also to include Paul's "yeah"s. So go ahead and create the desired version by editing <code>everyone.md</code>:</p>
<pre><code class="lang-bash">nano everyone.md
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/everyone_6.png" alt="Editing the file manually to achieve the desired state" width="600" height="400" loading="lazy">
<em>Editing the file manually to achieve the desired state</em></p>
<p>To compare the result file to what you had in the branch prior to the merge, you can run:</p>
<pre><code class="lang-bash">git diff --ours
</code></pre>
<p>Similarly, if you wish to see how the result of the merge differs from the branch you merged into our branch, you can run:</p>
<pre><code class="lang-bash">git diff --theirs
</code></pre>
<p>You can even see how the result differs from the merge base - that is, the changes coming from both sides - using:</p>
<pre><code class="lang-bash">git diff --base
</code></pre>
<p>Now you can stage the fixed version:</p>
<pre><code class="lang-bash">git add everyone.md
</code></pre>
<p>After staging, if you look at <code>git status</code>, you will see no conflicts:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_after_manual_fix.png" alt="After staging the fixed version everyone.md, there are no conflicts" width="600" height="400" loading="lazy">
<em>After staging the fixed version <code>everyone.md</code>, there are no conflicts</em></p>
<p>You can now simply use <code>git commit</code>, and Git will present you with a commit message containing details about the merge. You can modify it if you like, or leave it as is. Regardless of the commit message, Git will create a "merge commit" - that is, a commit with more than one parent.</p>
<p>To validate that, consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_merge_2.png" alt="The history after completing the merge operation" width="600" height="400" loading="lazy">
<em>The history after completing the merge operation</em></p>
<p><code>john_branch_4</code> now points to the new merge commit. The incoming branch, "theirs", in this case, <code>paul_branch_4</code>, stays where it was.</p>
<h3 id="heading-how-to-use-vs-code-to-resolve-conflicts">How to Use VS Code to Resolve Conflicts</h3>
<p>You will now see how to resolve the same conflict using a graphical tool. For this example, I use VS Code, which is a free and popular code editor. There are many other tools, but the process is similar, so I will just show VS Code as an example.</p>
<p>First, get back to the state before the merge:</p>
<pre><code class="lang-bash">git reset --hard HEAD~
</code></pre>
<p>And try to merge again:</p>
<pre><code class="lang-bash">git merge paul_branch_4
</code></pre>
<p>You should be back at the same status:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_after_merge_failed-1.png" alt="Back at the conflicting status" width="600" height="400" loading="lazy">
<em>Back at the conflicting status</em></p>
<p>Let's see how this appears on VS Code:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/vs_code_1.png" alt="Conflict resolution with VS Code" width="600" height="400" loading="lazy">
<em>Conflict resolution with VS Code</em></p>
<p>VS Code marks the different versions with "Current Change" - which is the "ours" version, the current <code>HEAD</code>, and "Incoming Change" for the branch we are merging into the active branch. You can accept one of the changes (or both) by clicking on one of the options.</p>
<p>If you click on <code>Resolve in Merge editor</code>, you get a more visual view of the state. VS Code shows the status of each line:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/vs_code_2-1.png" alt="VS Code's Merge Editor" width="600" height="400" loading="lazy">
<em>VS Code's Merge Editor</em></p>
<p>If you look closely, you will see that VS Code shows changes within words - for example, showing that "Every<strong>one</strong>" was changed to "Every<strong>body</strong>", marking the changed parts.</p>
<p>You can accept either version, or you can accept a combination. In this case, if you click on "Accept Combination", you get this result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/vs_code_3.png" alt="VS Code's Merge Editor after clicking on &quot;Accept Combination&quot;" width="600" height="400" loading="lazy">
<em>VS Code's Merge Editor after clicking on "Accept Combination"</em></p>
<p>VS Code did a really good job! The same three-way merge algorithm was implemented here and used on the <em>word</em> level rather than the <em>line</em> level. So VS Code was able to actually resolve this conflict in a rather impressive way. Of course, you can modify VS Code's suggestion, but it provided a <em>very</em> good start.</p>
<h3 id="heading-one-more-powerful-tool">One More Powerful Tool</h3>
<p>Well, this was the first time in this book that I've used a tool with a graphical user interface. Indeed, graphical interfaces can be convenient to understand what's going on when you are resolving merge conflicts.</p>
<p>However, like in many other cases, when we need to really understand what's going on, the command line becomes handy. So, let's get back to the command line and learn a tool that can come in handy in more complicated cases.</p>
<p>Again, go back to the state before the merge:</p>
<pre><code class="lang-bash">git reset --hard HEAD~
</code></pre>
<p>And merge:</p>
<pre><code class="lang-bash">git merge paul_branch_4
</code></pre>
<p>Now say you are not exactly sure what happened. Why is there a conflict? One very useful command would be:</p>
<pre><code class="lang-bash">git log -p --merge
</code></pre>
<p>As a reminder, <code>git log</code> shows the history of commits that are reachable from <code>HEAD</code>. Adding <code>-p</code> tells <code>git log</code> to show the commits along with the diffs they introduced. The <code>--merge</code> switch makes the command show all commits containing changes relevant to any unmerged files, on either branch, together with their diffs.</p>
<p>This can help you identify the changes in history that led to the conflicts. So in this example, you'd see:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_p_merge.png" alt="The output of git log -p --merge" width="600" height="400" loading="lazy">
<em>The output of <code>git log -p --merge</code></em></p>
<p>The first commit we see is "Commit 15", as in this commit John modified <code>everyone.md</code>, a file that still has conflicts. Next, Git shows "Commit 13", where Paul changed <code>everyone.md</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_p_merge_2.png" alt="The output of git log -p --merge - continued" width="600" height="400" loading="lazy">
<em>The output of <code>git log -p --merge</code> - continued</em></p>
<p>Notice that <code>git log --merge</code> did not mention previous commits that changed <code>everyone.md</code> before "Commit 13", as they didn't affect the current conflict.</p>
<p>This way, <code>git log</code> tells you all you need to know to understand the process that got you into the current conflicting state. Cool! 😎</p>
<p>Using the command line, you can also ask Git to take only one side of the changes - either "ours" or "theirs", even for a specific file.</p>
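<p>As a minimal sketch of taking one side for a specific file, the following standalone script reproduces a small conflict in a throwaway repository and resolves it with <code>git checkout --theirs</code>. The branch names <code>john_demo</code> and <code>paul_demo</code> and the file contents are assumptions made for illustration. Using <code>--ours</code> instead would keep the current branch's version:</p>
<pre><code class="lang-bash">set -e

# Hypothetical throwaway repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "Everyone had a hard year" &gt; everyone.md
git add everyone.md
git commit -q -m "base"

git checkout -q -b paul_demo
echo "Everyone had a hard year, yeah" &gt; everyone.md
git commit -q -am "Paul's change"

git checkout -q -b john_demo main
echo "Everybody had a hard year" &gt; everyone.md
git commit -q -am "John's change"

# The merge stops with a conflict, so tolerate the non-zero exit status
git merge paul_demo || true

# Take the incoming ("theirs") version of this one file
git checkout --theirs -- everyone.md

# Mark the conflict as resolved and complete the merge
git add everyone.md
git commit -q -m "Merge paul_demo, taking their version of everyone.md"

cat everyone.md
</code></pre>
<p>After the script finishes, <code>everyone.md</code> holds Paul's version, and the new commit is still a merge commit with two parents.</p>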
<p>You can also instruct Git to take some parts of the changes from one side and other parts from the other side. I will provide links that describe how to do that in <a class="post-section-overview" href="#heading-diffs-and-patches">the additional resources of this chapter in the appendix</a>.</p>
<p>For the most part, you can accomplish that pretty easily, either manually or from the UI of your favorite IDE.</p>
<p>For now, it's time for a recap.</p>
<h3 id="heading-recap-understanding-git-merge">Recap - Understanding Git Merge</h3>
<p>In this chapter, you got an extensive overview of merging with Git. You learned that merging is the process of combining the recent changes from several branches into a single new commit. The new commit has two parents - those commits which had been the tips of the branches that were merged.</p>
<p>We considered a simple, fast-forward merge, which is possible when one branch diverged from the base branch, and then just added commits on top of the base branch.</p>
<p>We then considered three-way merges, and explained the three-stage process:</p>
<ul>
<li>First, Git locates the merge base. As a reminder, this is the first commit that is reachable from both branches.</li>
<li>Second, Git calculates two diffs - one diff from the merge base to the <em>first</em> branch, and another diff from the merge base to the <em>second</em> branch. Git generates patches based on those diffs.</li>
<li>Third and last, Git applies both patches to the merge base using a 3-way merge algorithm. The result is the state of the new merge commit.</li>
</ul>
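<p>The three steps above can be traced by hand in a throwaway repository. This is a sketch under assumptions: the branch names <code>john_demo</code> and <code>paul_demo</code>, the file <code>song.md</code>, and its contents are made up for illustration. The two sides change different lines, so the merge completes without conflicts:</p>
<pre><code class="lang-bash">set -e

# Hypothetical throwaway repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' &gt; song.md
git add song.md
git commit -q -m "base"

# John changes the first line
git checkout -q -b john_demo
printf 'JOHN 1\nline 2\nline 3\nline 4\nline 5\n' &gt; song.md
git commit -q -am "John changes the first line"

# Paul changes the last line
git checkout -q -b paul_demo main
printf 'line 1\nline 2\nline 3\nline 4\nPAUL 5\n' &gt; song.md
git commit -q -am "Paul changes the last line"

# Step 1: locate the merge base
base=$(git merge-base john_demo paul_demo)
echo "merge base: $base"

# Step 2: the two diffs, from the merge base to each branch
git --no-pager diff "$base" john_demo
git --no-pager diff "$base" paul_demo

# Step 3: let Git combine both sets of changes in a merge commit
git checkout -q john_demo
git merge -q -m "Merge paul_demo" paul_demo
cat song.md
</code></pre>
<p>The merged <code>song.md</code> should contain both John's first line and Paul's last line.</p>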
<p>We dove deeper into the process of a 3-way merge, whether at a file level or a hunk level. We considered when Git is able to rely on a 3-way merge to automatically resolve conflicts, and when it just can't.</p>
<p>You saw the output of <code>git diff</code> when we are in a conflicting state, and how to resolve conflicts either manually or with VS Code.</p>
<p>There is much more to be said about merges - different merge strategies, recursive merges, and so on. Yet, I believe this chapter covered everything needed so you have a robust understanding of what merge is, and what happens under the hood in the vast majority of cases.</p>
<h3 id="heading-beatles-related-resources">Beatles-Related Resources</h3>
<ul>
<li><a target="_blank" href="https://www.the-paulmccartney-project.com/song/ive-got-a-feeling/">https://www.the-paulmccartney-project.com/song/ive-got-a-feeling/</a></li>
<li><a target="_blank" href="https://www.cheatsheet.com/entertainment/did-john-lennon-or-paul-mccartney-write-the-classic-a-day-in-the-life.html/">https://www.cheatsheet.com/entertainment/did-john-lennon-or-paul-mccartney-write-the-classic-a-day-in-the-life.html/</a></li>
<li><a target="_blank" href="http://lifeofthebeatles.blogspot.com/2009/06/ive-got-feeling-lyrics.html">http://lifeofthebeatles.blogspot.com/2009/06/ive-got-feeling-lyrics.html</a></li>
</ul>
<h2 id="heading-chapter-8-understanding-git-rebase">Chapter 8 - Understanding Git Rebase</h2>
<p>One of the most powerful tools a developer can have in their toolbox is <code>git rebase</code>. Yet it is notorious for being complex and misunderstood.</p>
<p>The truth is, if you understand what it actually does, <code>git rebase</code> is a very elegant and straightforward tool to achieve so many different things in Git.</p>
<p>In the previous chapters in this part, you learned what Git diffs are, what a merge is, and how Git resolves merge conflicts. In this chapter, you will understand what Git rebase is, why it's different from merge, and how to rebase with confidence.</p>
<h3 id="heading-short-recap-what-is-git-merge">Short Recap - What is Git Merge?</h3>
<p>Under the hood, <code>git rebase</code> and <code>git merge</code> are very, very different things. Then why do people compare them all the time?</p>
<p>The reason is their usage. When working with Git, we usually work in different branches and introduce changes to those branches.</p>
<p>In the previous chapter, we considered the example where John and Paul (of the Beatles) were co-authoring a new song. They started from the <code>main</code> branch, and then each diverged, modified the lyrics, and committed their changes.</p>
<p>Then, the two wanted to <em>integrate</em> their changes, which is something that happens very frequently when working with Git.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diverging_history_commit_9.png" alt="A diverging history - paul_branch and john_branch diverged from main" width="600" height="400" loading="lazy">
<em>A diverging history - <code>paul_branch</code> and <code>john_branch</code> diverged from <code>main</code></em></p>
<p>There are two main ways to integrate changes introduced in different branches in Git, or in other words, different commits and commit histories. These are merge and rebase.</p>
<p>In the previous chapter, we got to know <code>git merge</code> pretty well. We saw that when performing a merge, we create a <strong>merge commit</strong> - where the contents of this commit are a combination of the two branches, and it also has two parents, one in each branch.</p>
<p>So, say you are on the branch <code>john_branch</code> (assuming the history depicted in the drawing above), and you run <code>git merge paul_branch</code>. You will get to this state - where on <code>john_branch</code>, there is a new commit with two parents. The first one will be the commit on <code>john_branch</code> that <code>HEAD</code> was pointing to before performing the merge - in this case, "Commit 6". The second will be the commit pointed to by <code>paul_branch</code>, "Commit 9".</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_merge_paul_branch.png" alt="The result of running git merge paul_branch: a new Merge Commit with two parents" width="600" height="400" loading="lazy">
<em>The result of running <code>git merge paul_branch</code>: a new Merge Commit with two parents</em></p>
<p>Look again at the history graph: you created a <strong>diverged</strong> history. You can actually see where it branched and where it merged again.</p>
<p>So when using <code>git merge</code>, you do not rewrite history - but rather, you add a commit to the existing history. And specifically, a commit that creates a diverged history.</p>
<h3 id="heading-how-is-git-rebase-different-than-git-merge">How is <code>git rebase</code> Different than <code>git merge</code>?</h3>
<p>When using <code>git rebase</code>, something different happens.</p>
<p>Let's start with the big picture: if you are on <code>paul_branch</code>, and use <code>git rebase john_branch</code>, Git goes to the common ancestor of John's branch and Paul's branch. Then it takes the patches introduced in the commits on Paul's branch, and applies those changes to John's branch.</p>
<p>So here, you use <code>rebase</code> to take the changes that were committed on one branch - Paul's branch - and replay them on a different branch, <code>john_branch</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_rebase_john_branch.png" alt="The result of running git rebase john_branch: the commits on paul_branch were &quot;replayed&quot; on top of john_branch" width="600" height="400" loading="lazy">
<em>The result of running <code>git rebase john_branch</code>: the commits on <code>paul_branch</code> were "replayed" on top of <code>john_branch</code></em></p>
<p>Wait, what does that mean?</p>
<p>We will now take this bit by bit to make sure you fully understand what's happening under the hood 😎</p>
<h3 id="heading-cherry-pick-as-a-basis-for-rebase"><code>cherry-pick</code> as a Basis for Rebase</h3>
<p>It is useful to think of rebase as performing <code>git cherry-pick</code> - a command that takes a commit, computes the patch this commit introduces by computing the difference between the parent commit and the commit itself, and then "replays" this difference.</p>
<p>Let's do this manually.</p>
<p>Look at the difference introduced by "Commit 5" by running <code>git diff main &lt;SHA_OF_COMMIT_5&gt;</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main_commit_5.png" alt="Running git diff to observe the patch introduced by &quot;Commit 5&quot;" width="600" height="400" loading="lazy">
<em>Running <code>git diff</code> to observe the patch introduced by "Commit 5"</em></p>
<p>As always, you are encouraged to run the commands yourself while reading this chapter. Unless noted otherwise, I will use the following repository:</p>
<p><a target="_blank" href="https://github.com/Omerr/rebase_playground.git">https://github.com/Omerr/rebase_playground.git</a></p>
<p>I recommend you clone it locally and have the same starting point I am using for this chapter.</p>
<p>You can see that in this commit, John started working on a song called "Lucy in the Sky with Diamonds":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main_commit_5_output.png" alt="The output of git diff - the patch introduced by &quot;Commit 5&quot;" width="600" height="400" loading="lazy">
<em>The output of <code>git diff</code> - the patch introduced by "Commit 5"</em></p>
<p>As a reminder, you can also use the command <code>git show</code> to get the same output:</p>
<pre><code class="lang-bash">git show &lt;SHA_OF_COMMIT_5&gt;
</code></pre>
<p>Now, if you <code>cherry-pick</code> this commit, you will introduce <em>this change</em> specifically, on the active branch. Switch to <code>main</code> first:</p>
<pre><code class="lang-bash">git checkout main  # or: git switch main
</code></pre>
<p>And create another branch:</p>
<pre><code class="lang-bash">git checkout -b my_branch  # or: git switch -c my_branch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_my_branch.png" alt="Creating my_branch that branches from main" width="600" height="400" loading="lazy">
<em>Creating <code>my_branch</code> that branches from <code>main</code></em></p>
<p>Next, <code>cherry-pick</code> "Commit 5":</p>
<pre><code class="lang-bash">git cherry-pick &lt;SHA_OF_COMMIT_5&gt;
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/cherry_pick_commit_5.png" alt="Using cherry-pick to apply the changes introduced in &quot;Commit 5&quot; onto main" width="600" height="400" loading="lazy">
<em>Using <code>cherry-pick</code> to apply the changes introduced in "Commit 5" onto <code>main</code></em></p>
<p>Consider the log (output of <code>git lol</code>):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_lol_commit_5.png" alt="The output of git lol" width="600" height="400" loading="lazy">
<em>The output of <code>git lol</code></em></p>
<p>It seems like you <em>copy-pasted</em> "Commit 5". Remember that even though it has the same commit message, and introduces the same changes, and even points to the same tree object as the original "Commit 5" in this case - it is still a different commit object, as it was created with a different timestamp.</p>
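<p>You can check this claim yourself. The following standalone sketch uses a throwaway repository rather than the playground, so the branch names <code>john_demo</code> and <code>my_demo_branch</code> and the file contents are assumptions. It cherry-picks a commit onto its own parent and compares the commit and tree SHAs. The <code>sleep</code> is there only to make sure the copy gets a later committer timestamp than the original:</p>
<pre><code class="lang-bash">set -e

# Hypothetical throwaway repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" &gt; base.md
git add base.md
git commit -q -m "Commit 4"

git checkout -q -b john_demo
echo "Lucy in the Sky with Diamonds" &gt; lucy.md
git add lucy.md
git commit -q -m "Commit 5"
original=$(git rev-parse HEAD)

# Wait so that the copy gets a later committer timestamp than the original
sleep 1

# Replay the commit on a new branch that starts from main
git checkout -q -b my_demo_branch main
git cherry-pick "$original"
copy=$(git rev-parse HEAD)

echo "original commit: $original"
echo "copied commit:   $copy"
echo "original tree:   $(git rev-parse "$original^{tree}")"
echo "copied tree:     $(git rev-parse "$copy^{tree}")"
</code></pre>
<p>The two tree SHAs should be identical, while the two commit SHAs differ.</p>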
<p>Looking at the changes, using <code>git show HEAD</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_show_HEAD-1.png" alt="The output of git show HEAD" width="600" height="400" loading="lazy">
<em>The output of <code>git show HEAD</code></em></p>
<p>They are the same as "Commit 5"'s.</p>
<p>And of course, if you look at the file (say, by using <code>nano lucy_in_the_sky_with_diamonds.md</code>), it will be in the same state as it was after the original "Commit 5".</p>
<p>Cool! 😎</p>
<p>You can now remove the new branch so it doesn't appear on your history every time:</p>
<pre><code class="lang-bash">git checkout main
git branch -D my_branch
</code></pre>
<h3 id="heading-beyond-cherry-pick-how-to-use-git-rebase">Beyond <code>cherry-pick</code> - How to Use <code>git rebase</code></h3>
<p>You can view <code>git rebase</code> as a way to perform multiple <code>cherry-pick</code>s one after the other - that is, to "replay" multiple commits. This is not the only thing you can do with rebase, but it's a good starting point for our explanation.</p>
<p>It's time to play with <code>git rebase</code>!</p>
<p>Before, you merged <code>paul_branch</code> into <code>john_branch</code>. What would happen if you <em>rebased</em> <code>paul_branch</code> on top of <code>john_branch</code>? You would get a very different history.</p>
<p>In essence, it would seem as if we took the changes introduced in the commits on <code>paul_branch</code>, and replayed them on <code>john_branch</code>. The result would be a linear history.</p>
<p>To understand the process, I will provide the high level view, and then dive deeper into each step. The process of rebasing one branch on top of another branch is as follows:</p>
<ol>
<li>Find the common ancestor.</li>
<li>Identify the commits to be "replayed".</li>
<li>For every commit <code>X</code>, compute <code>diff(parent(X), X)</code>, and store it as a <code>patch(X)</code>.</li>
<li>Move <code>HEAD</code> to the new base.</li>
<li>Apply the generated patches in order on the target branch. Each time, create a new commit object with the new state.</li>
</ol>
<p>The process of making new commits with the same change sets as existing ones is also called "<strong>replaying</strong>" those commits, a term we have already used.</p>
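<p>To make the steps concrete, here is a rough sketch that imitates a rebase by hand, using <code>git cherry-pick</code> in a loop. It is not how <code>git rebase</code> is implemented internally. The throwaway repository, the branch names <code>john_demo</code> and <code>paul_demo</code>, and the file contents are assumptions made for illustration:</p>
<pre><code class="lang-bash">set -e

# Hypothetical throwaway repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" &gt; base.md
git add base.md
git commit -q -m "Commit 4"

git checkout -q -b john_demo
echo "John 5" &gt; john.md
git add john.md
git commit -q -m "Commit 5"
echo "John 6" &gt;&gt; john.md
git commit -q -am "Commit 6"

git checkout -q -b paul_demo main
echo "Paul 7" &gt; paul.md
git add paul.md
git commit -q -m "Commit 7"
echo "Paul 8" &gt;&gt; paul.md
git commit -q -am "Commit 8"

# Step 1: find the common ancestor
base=$(git merge-base john_demo paul_demo)

# Step 2: identify the commits to be "replayed", oldest first
commits=$(git rev-list --reverse "$base..paul_demo")

# Step 4: move HEAD to the new base (detached, so no branch moves yet)
git checkout -q --detach john_demo

# Steps 3 and 5: cherry-pick computes each commit's change and replays it
for commit in $commits; do
  git cherry-pick "$commit"
done

# Finally, make paul_demo point to the replayed commits
git checkout -q -B paul_demo
git --no-pager log --oneline --graph
</code></pre>
<p>The resulting log should be a single straight line: Paul's two replayed commits sit on top of John's two commits, which sit on top of "Commit 4".</p>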
<h3 id="heading-time-to-get-hands-on-with-rebase">Time to Get Hands-On with Rebase</h3>
<p>Before running the rebase, make sure you have <code>john_branch</code> locally by running:</p>
<pre><code class="lang-bash">git checkout john_branch
</code></pre>
<p>Start from Paul's branch:</p>
<pre><code class="lang-bash">git checkout paul_branch
</code></pre>
<p>This is the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/diverging_history_commit_9-1.png" alt="Commit history before performing git rebase" width="600" height="400" loading="lazy">
<em>Commit history before performing <code>git rebase</code></em></p>
<p>And now, to the exciting part:</p>
<pre><code class="lang-bash">git rebase john_branch
</code></pre>
<p>And observe the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_rebase.png" alt="The history after rebasing" width="600" height="400" loading="lazy">
<em>The history after rebasing</em></p>
<p>With <code>git merge</code> you added to the history, while with <code>git rebase</code> you <strong>rewrite history</strong>. You create <strong>new</strong> commit objects. In addition, the result is a linear history graph - rather than a diverging graph.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_rebase_2.png" alt="The history after rebasing" width="600" height="400" loading="lazy">
<em>The history after rebasing</em></p>
<p>In essence, you "copied" the commits that were on <code>paul_branch</code> and that were introduced after "Commit 4", and "pasted" them on top of <code>john_branch</code>.</p>
<p>The command is called "rebase", because it changes the base commit of the branch it's run from. That is, in your case, before running <code>git rebase</code>, the base of <code>paul_branch</code> was "Commit 4" - as this is where the branch was "born" (from <code>main</code>). With <code>rebase</code>, you asked Git to give it another base - that is, pretend as if it had been born from "Commit 6".</p>
<p>To do that, Git took what used to be "Commit 7", and "replayed" the changes introduced in this commit onto "Commit 6". Then it created a new commit object. This object differs from the original "Commit 7" in three aspects:</p>
<ol>
<li>It has a different timestamp.</li>
<li>It has a different parent commit - "Commit 6", rather than "Commit 4".</li>
<li>The tree object it is pointing to is different - as the changes were introduced to the tree pointed to by "Commit 6", and not the tree pointed to by "Commit 4".</li>
</ol>
<p>Notice the last commit here, "Commit 9'". The snapshot it represents (that is, the tree that it points to) is exactly the same tree you would get by merging the two branches. The state of the files in your Git repository would be <strong>the same</strong> as if you used <code>git merge</code>. It's only the <em>history</em> that is different, and the commit objects of course.</p>
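<p>One way to convince yourself of this is to compare tree SHAs. The sketch below, again in a throwaway repository with assumed branch names (<code>john_demo</code>, <code>paul_demo</code>, and a scratch branch <code>merge_demo</code>), merges the two branches on the scratch branch, rebases Paul's branch, and then compares the trees of the two results. The branches touch different files here, so neither operation hits a conflict:</p>
<pre><code class="lang-bash">set -e

# Hypothetical throwaway repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" &gt; base.md
git add base.md
git commit -q -m "Commit 4"

git checkout -q -b john_demo
echo "John 5" &gt; john.md
git add john.md
git commit -q -m "Commit 5"

git checkout -q -b paul_demo main
echo "Paul 7" &gt; paul.md
git add paul.md
git commit -q -m "Commit 7"

# Option 1: merge paul_demo into a scratch copy of john_demo
git checkout -q -b merge_demo john_demo
git merge -q -m "Merge paul_demo" paul_demo
merged_tree=$(git rev-parse "HEAD^{tree}")

# Option 2: rebase paul_demo on top of john_demo
git checkout -q paul_demo
git rebase -q john_demo
rebased_tree=$(git rev-parse "HEAD^{tree}")

echo "tree after merge:  $merged_tree"
echo "tree after rebase: $rebased_tree"
</code></pre>
<p>Both lines should show the same SHA: the snapshots are identical, and only the commits leading to them differ.</p>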
<p>Now, you can simply use:</p>
<pre><code class="lang-bash">git checkout main
git merge paul_branch
</code></pre>
<p>Hm.... What would happen if you ran this last command? Consider the commit history again, after checking out <code>main</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_checkout_main.png" alt="The history after rebasing and checking out main" width="600" height="400" loading="lazy">
<em>The history after rebasing and checking out <code>main</code></em></p>
<p>What would it mean to merge <code>main</code> and <code>paul_branch</code>?</p>
<p>Indeed, Git can simply perform a fast-forward merge, as the history is completely linear (if you need a reminder about fast-forward merges, check out the previous chapter). As a result, <code>main</code> and <code>paul_branch</code> now point to the same commit:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/fast_forward_merge_result.png" alt="The result of a fast-forward merge" width="600" height="400" loading="lazy">
<em>The result of a fast-forward merge</em></p>
<h3 id="heading-advanced-rebasing-in-git">Advanced Rebasing in Git</h3>
<p>Now that you understand the basics of rebase, it is time to consider more advanced cases, where additional switches and arguments to the rebase command will come in handy.</p>
<p>In the previous example, when you only used <code>rebase</code> (without additional switches), Git replayed all the commits from the common ancestor to the tip of the current branch.</p>
<p>But rebase is a super-power. It's an almighty command capable of…well, rewriting history. And it can come in handy if you want to modify history to make it your own.</p>
<p>Undo the last merge by making <code>main</code> point to "Commit 4" again:</p>
<pre><code class="lang-bash">git reset --hard &lt;SHA_OF_ORIGINAL_COMMIT_4&gt;
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reset_hard_1.png" alt="&quot;Undoing&quot; the last merge operation" width="600" height="400" loading="lazy">
<em>"Undoing" the last merge operation</em></p>
<p>And undo the rebasing by using:</p>
<pre><code class="lang-bash">git checkout paul_branch
git reset --hard &lt;SHA_OF_ORIGINAL_COMMIT_9&gt;
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reset_hard_2.png" alt="&quot;Undoing&quot; the rebase operation" width="600" height="400" loading="lazy">
<em>"Undoing" the rebase operation</em></p>
<p>Notice that you got to exactly the same history you used to have:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_undoing_rebase.png" alt="Visualizing the history after &quot;undoing&quot; the rebase operation" width="600" height="400" loading="lazy">
<em>Visualizing the history after "undoing" the rebase operation</em></p>
<p>To be clear, "Commit 9" doesn't just disappear when it's not reachable from the current <code>HEAD</code>. Rather, it's still stored in the object database. And as you used <code>git reset</code> now to change <code>HEAD</code> to point to this commit, you were able to retrieve it, and also its parent commits since they are also stored in the database. Pretty cool, huh? 😎 </p>
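<p>Here is a small sketch of that idea in a throwaway repository. The branch names <code>john_demo</code> and <code>paul_demo</code> and the variable <code>old_tip</code> are assumptions for illustration. The script remembers the tip of Paul's branch, rebases, and then shows that the original commit is still in the object database and can be restored. If you did not save the SHA in advance, <code>git reflog</code> lists the previous positions of the branch:</p>
<pre><code class="lang-bash">set -e

# Hypothetical throwaway repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" &gt; base.md
git add base.md
git commit -q -m "Commit 4"

git checkout -q -b john_demo
echo "John 5" &gt; john.md
git add john.md
git commit -q -m "Commit 5"

git checkout -q -b paul_demo main
echo "Paul 7" &gt; paul.md
git add paul.md
git commit -q -m "Commit 7"

# Remember where the branch pointed before rewriting history
old_tip=$(git rev-parse paul_demo)

git rebase -q john_demo

# The original commit is no longer on the branch, but it still exists
git cat-file -t "$old_tip"

# The reflog lists the previous positions of the branch
git --no-pager reflog show paul_demo

# "Undo" the rebase by pointing the branch back to the original commit
git reset -q --hard "$old_tip"
git --no-pager log --oneline
</code></pre>
<p>After the reset, the log of <code>paul_demo</code> shows the original "Commit 7" on top of "Commit 4" again.</p>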
<p>You will learn more about <code>git reset</code> in the next part, where we discuss undoing changes in Git.</p>
<p>View the changes that Paul introduced:</p>
<pre><code class="lang-bash">git show HEAD
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_show_HEAD_2.png" alt="git show HEAD shows the patch introduced by &quot;Commit 9&quot;" width="600" height="400" loading="lazy">
<em><code>git show HEAD</code> shows the patch introduced by "Commit 9"</em></p>
<p>Keep going backwards in the commit graph:</p>
<pre><code class="lang-bash">git show HEAD~
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_show_HEAD-.png" alt="git show HEAD~ (same as git show HEAD~1) shows the patch introduced by &quot;Commit 8&quot;" width="600" height="400" loading="lazy">
<em><code>git show HEAD~</code> (same as <code>git show HEAD~1</code>) shows the patch introduced by "Commit 8"</em></p>
<p>And one commit further:</p>
<pre><code class="lang-bash">git show HEAD~2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_show_HEAD-2.png" alt="git show HEAD~2 shows the patch introduced by &quot;Commit 7&quot;" width="600" height="400" loading="lazy">
<em><code>git show HEAD~2</code> shows the patch introduced by "Commit 7"</em></p>
<p>Perhaps Paul doesn't want this kind of history. Rather, he wants it to seem as if he introduced the changes in "Commit 7" and "Commit 8" as a single commit.</p>
<p>For that, you can use an <strong>interactive rebase</strong>. To do that, we add the <code>-i</code> (or <code>--interactive</code>) switch to the rebase command:</p>
<pre><code class="lang-bash">git rebase -i &lt;SHA_OF_COMMIT_4&gt;
</code></pre>
<p>Or, since <code>main</code> is pointing to "Commit 4", we can run:</p>
<pre><code class="lang-bash">git rebase -i main
</code></pre>
<p>By running this command, you tell Git to use a new base, "Commit 4". So you are asking Git to go back to all commits that were introduced after "Commit 4" and that are reachable from the current <code>HEAD</code>, and replay those commits.</p>
<p>For every commit that is replayed, Git asks us what we'd like to do with it:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/interactive_rebase_1.png" alt="git rebase -i main prompts you to select what to do with each commit" width="600" height="400" loading="lazy">
<em><code>git rebase -i main</code> prompts you to select what to do with each commit</em></p>
<p>In this context it's useful to think of a commit as a patch. That is, read "Commit 7" as "the patch that Commit 7 introduced on top of its parent".</p>
<p>One option is to use <code>pick</code>. This is the default behavior, which tells Git to replay the changes introduced in this commit. In this case, if you just leave it as is - and <code>pick</code> all commits - you will get the same history, and Git won't even create new commit objects.</p>
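<p>You can verify the "no new commit objects" claim without opening an editor. The sketch below assumes a throwaway repository with empty <code>.txt</code> files standing in for the real changes, plus a hypothetical <code>commit</code> helper. It sets the <code>GIT_SEQUENCE_EDITOR</code> environment variable to <code>true</code>, a command that exits successfully without touching the todo list, so every line stays <code>pick</code>:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"                       # throwaway directory for the experiment
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.email "you@example.com" # placeholder identity
git config user.name "Example"

# Hypothetical helper: add an empty file named after the commit and commit it
commit() {
  touch "$1.txt"
  git add "$1.txt"
  git commit -q -m "Commit $1"
}

commit 4
git checkout -q -b paul_branch
commit 7
commit 8
commit 9

before="$(git rev-parse paul_branch)"
# Leave the todo list untouched: all three commits are picked as they are
GIT_SEQUENCE_EDITOR=true git rebase -i main
after="$(git rev-parse paul_branch)"

echo "before: $before"
echo "after:  $after"
</code></pre>
<p>Both lines should show the same SHA-1, since Git had nothing to rewrite.</p>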
<p>Another option is <code>squash</code>. A <em>squashed</em> commit will have its contents "folded" into the contents of the commit preceding it. So in our case, Paul would like to squash "Commit 8" into "Commit 7":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/interactive_rebase_2.png" alt="Squashing &quot;Commit 8&quot; into &quot;Commit 7&quot;" width="600" height="400" loading="lazy">
<em>Squashing "Commit 8" into "Commit 7"</em></p>
<p>As you can see, <code>git rebase -i</code> provides additional options, but we won't go into all of them in this chapter. If you allow the rebase to run, you will get prompted to select a commit message for the newly created commit (that is, the one that introduced the changes of both "Commit 7" and "Commit 8"):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/interactive_rebase_3.png" alt="Providing the commit message: Commits 7+8" width="600" height="400" loading="lazy">
<em>Providing the commit message: Commits 7+8</em></p>
<p>And look at the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_interactive_rebase.png" alt="The history after the interactive rebase" width="600" height="400" loading="lazy">
<em>The history after the interactive rebase</em></p>
<p>Exactly as we wanted! On <code>paul_branch</code>, we have "Commit 9" (of course, it's a different object than the original "Commit 9"). This object points to "Commits 7+8", which is a single commit introducing the changes of both the original "Commit 7" and the original "Commit 8". This commit's parent is "Commit 4", where <code>main</code> is pointing to.</p>
<p>Oh wow, isn't that cool? 😎</p>
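<p>If you'd rather script the same squash than edit the todo list by hand, here is one possible sketch. It is an assumption-laden stand-in for Paul's repository: empty <code>.txt</code> files replace the real changes, <code>commit</code> is a hypothetical helper, and <code>sed</code> plays the role of the editor by turning the second <code>pick</code> into <code>squash</code>. Setting <code>GIT_EDITOR=true</code> accepts the default combined commit message instead of typing "Commits 7+8":</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"                       # throwaway directory for the experiment
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.email "you@example.com" # placeholder identity
git config user.name "Example"

# Hypothetical helper: add an empty file named after the commit and commit it
commit() {
  touch "$1.txt"
  git add "$1.txt"
  git commit -q -m "Commit $1"
}

commit 4
git checkout -q -b paul_branch
commit 7
commit 8
commit 9

# Line 1 of the todo list is "Commit 7", line 2 is "Commit 8":
# change line 2 from "pick" to "squash" so it is folded into line 1
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/squash/'" GIT_EDITOR=true git rebase -i main

# Two commits remain on top of main: the squashed one and the new "Commit 9"
git log --format=%s main..paul_branch
# The squashed commit introduces both files
git show --name-only --format= HEAD~
</code></pre>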
<p><code>git rebase</code> grants you unlimited control over the shape of any branch. You can use it to reorder commits, or to remove incorrect changes, or modify a change in retrospect. Alternatively, you could perhaps move the base of your branch onto another commit, any commit that you wish.</p>
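<p>As one example of removing a change, the todo list also accepts <code>drop</code>. The following sketch again uses a throwaway repository, empty <code>.txt</code> files, and a hypothetical <code>commit</code> helper, and it scripts the todo list with <code>sed</code> rather than an interactive editor:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"                       # throwaway directory for the experiment
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.email "you@example.com" # placeholder identity
git config user.name "Example"

# Hypothetical helper: add an empty file named after the commit and commit it
commit() {
  touch "$1.txt"
  git add "$1.txt"
  git commit -q -m "Commit $1"
}

commit 4
git checkout -q -b paul_branch
commit 7
commit 8
commit 9

# Change the second todo line ("Commit 8") from "pick" to "drop"
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/drop/'" git rebase -i main

# "Commit 8" and the file it introduced are gone from the branch
git log --format=%s main..paul_branch
ls
</code></pre>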
<h3 id="heading-how-to-use-the-onto-switch-of-git-rebase">How to Use the <code>--onto</code> Switch of <code>git rebase</code></h3>
<p>Let's consider one more example. Get to <code>main</code> again:</p>
<pre><code class="lang-bash">git checkout main
</code></pre>
<p>And delete the pointers to <code>paul_branch</code> and <code>john_branch</code> so you don't see them in the commit graph anymore:</p>
<pre><code class="lang-bash">git branch -D paul_branch
git branch -D john_branch
</code></pre>
<p>Next, branch from <code>main</code> to a new branch:</p>
<pre><code class="lang-bash">git checkout -b new_branch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_new_branch.png" alt="Creating new_branch that diverges from main" width="600" height="400" loading="lazy">
<em>Creating <code>new_branch</code> that diverges from <code>main</code></em></p>
<p>This is the clean history you should have:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_new_branch.png" alt="A clean history with new_branch that diverges from main" width="600" height="400" loading="lazy">
<em>A clean history with <code>new_branch</code> that diverges from <code>main</code></em></p>
<p>Now, change the file <code>code.py</code> (for example, add a new function) and commit your changes:</p>
<pre><code class="lang-bash">nano code.py
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_1.png" alt="Adding the function new_branch to code.py" width="600" height="400" loading="lazy">
<em>Adding the function <code>new_branch</code> to <code>code.py</code></em></p>
<pre><code class="lang-bash">git add code.py
git commit -m <span class="hljs-string">"Commit 10"</span>
</code></pre>
<p>Get back to <code>main</code>:</p>
<pre><code class="lang-bash">git checkout main
</code></pre>
<p>And introduce another change - adding a docstring at the beginning of the file:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_2.png" alt="Added a docstring at the beginning of the file" width="600" height="400" loading="lazy">
<em>Added a docstring at the beginning of the file</em></p>
<p>Time to stage and commit these changes:</p>
<pre><code class="lang-bash">git add code.py
git commit -m <span class="hljs-string">"Commit 11"</span>
</code></pre>
<p>And yet another change, perhaps add <code>@Author</code> to the docstring:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_3.png" alt="Added @Author to the docstring" width="600" height="400" loading="lazy">
<em>Added <code>@Author</code> to the docstring</em></p>
<p>Commit this change as well:</p>
<pre><code class="lang-bash">git add code.py
git commit -m <span class="hljs-string">"Commit 12"</span>
</code></pre>
<p>Oh wait, now I realize that I wanted you to make the changes introduced in "Commit 11" as a part of the <code>new_branch</code>. Ugh. What can you do?</p>
<p>Consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_12-2.png" alt="The history after introducing &quot;Commit 12&quot;" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 12"</em></p>
<p>Instead of having "Commit 11" reside only on the <code>main</code> branch, I want it to be on <em>both</em> the <code>main</code> branch as well as <code>new_branch</code>. Visually, I would want to <em>move</em> it down the graph here:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/push_commit_10_down.png" alt="Visually, I want you to &quot;push down&quot; &quot;Commit 10&quot;" width="600" height="400" loading="lazy">
<em>Visually, I want you to "push down" "Commit 10"</em></p>
<p>Can you see where I am going? 😇</p>
<p>Well, <code>rebase</code> allows you to basically replay the changes introduced in <code>new_branch</code>, those introduced in "Commit 10", as if they had been originally conducted on "Commit 11", rather than "Commit 4".</p>
<p>To do that, you can use other arguments of <code>git rebase</code>. Specifically, you can use <code>git rebase --onto</code>, which takes up to three parameters (the last one is optional):</p>
<pre><code class="lang-bash">git rebase --onto &lt;new_parent&gt; &lt;old_parent&gt; &lt;until&gt;
</code></pre>
<p>That is, you take all commits between <code>old_parent</code> and <code>until</code>, and you "cut" and "paste" them <em>onto</em> <code>new_parent</code>.</p>
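<p>In this case, you'd tell Git that you want to take all the history introduced between the common ancestor of <code>main</code> and <code>new_branch</code>, which is "Commit 4", and have the new base for that history be "Commit 11". You can pass <code>main</code> as the old parent even though it points to "Commit 12": Git replays only the commits that are reachable from <code>new_branch</code> but not from <code>main</code>, and that is exactly the history added after "Commit 4". To do that, use:</p>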
<pre><code class="lang-bash">git rebase --onto &lt;SHA_OF_COMMIT_11&gt; main new_branch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/rebase_onto_1.png" alt="The history before and after the rebase, &quot;Commit 10&quot; has been &quot;pushed&quot;" width="600" height="400" loading="lazy">
<em>The history before and after the rebase, "Commit 10" has been "pushed"</em></p>
<p>And look at our beautiful history! 😍</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/rebase_onto_2.png" alt="The history before and after the rebase, &quot;Commit 10&quot; has been &quot;pushed&quot;" width="600" height="400" loading="lazy">
<em>The history before and after the rebase, "Commit 10" has been "pushed"</em></p>
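<p>Here is a self-contained sketch of the same move that you can run in a scratch directory. It is only an approximation of the repository above: each commit adds an empty <code>.txt</code> file instead of editing <code>code.py</code> (so no conflicts can occur), and <code>commit</code> is a hypothetical helper:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"                       # throwaway directory for the experiment
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.email "you@example.com" # placeholder identity
git config user.name "Example"

# Hypothetical helper: add an empty file named after the commit and commit it
commit() {
  touch "$1.txt"
  git add "$1.txt"
  git commit -q -m "Commit $1"
}

commit 4
git checkout -q -b new_branch
commit 10
git checkout -q main
commit 11
c11="$(git rev-parse HEAD)"             # SHA-1 of "Commit 11"
commit 12

# Take the commits on new_branch that are not on main ("Commit 10")
# and replay them onto "Commit 11"
git rebase -q --onto "$c11" main new_branch

git log --format=%s new_branch
</code></pre>
<p>The log of <code>new_branch</code> should now list "Commit 10", "Commit 11" and "Commit 4", while <code>main</code> still points to "Commit 12".</p>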
<p>Let's consider another case.</p>
<p>Say I started working on a new feature, and by mistake I started working from <code>feature_branch_1</code>, rather than from <code>main</code>.</p>
<p>So to emulate this, create <code>feature_branch_1</code>:</p>
<pre><code class="lang-bash">git checkout main
git checkout -b feature_branch_1
</code></pre>
<p>And erase <code>new_branch</code> so you don't see it in the graph anymore:</p>
<pre><code class="lang-bash">git branch -D new_branch
</code></pre>
<p>Create a simple Python file called <code>1.py</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/1_py_1.png" alt="A new file, 1.py, with print('Hello world!')" width="600" height="400" loading="lazy">
<em>A new file, <code>1.py</code>, with <code>print('Hello world!')</code></em></p>
<p>Stage and commit this file:</p>
<pre><code class="lang-bash">git add 1.py
git commit -m <span class="hljs-string">"Commit 13"</span>
</code></pre>
<p>Now branch out from <code>feature_branch_1</code> (this is the mistake you will later fix):</p>
<pre><code class="lang-bash">git checkout -b feature_branch_2
</code></pre>
<p>And create another file, <code>2.py</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/2_py_1.png" alt="Creating 2.py" width="600" height="400" loading="lazy">
<em>Creating <code>2.py</code></em></p>
<p>Stage and commit this file as well:</p>
<pre><code class="lang-bash">git add 2.py
git commit -m <span class="hljs-string">"Commit 14"</span>
</code></pre>
<p>And introduce some more code to <code>2.py</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/2_py_2.png" alt="Modifying 2.py" width="600" height="400" loading="lazy">
<em>Modifying <code>2.py</code></em></p>
<p>Stage and commit these changes too:</p>
<pre><code class="lang-bash">git add 2.py
git commit -m <span class="hljs-string">"Commit 15"</span>
</code></pre>
<p>So far you should have this history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_15.png" alt="The history after introducing &quot;Commit 15&quot;" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 15"</em></p>
<p>Get back to <code>feature_branch_1</code> and edit <code>1.py</code>:</p>
<pre><code class="lang-bash">git checkout feature_branch_1
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/1_py_2.png" alt="Modifying 1.py" width="600" height="400" loading="lazy">
<em>Modifying <code>1.py</code></em></p>
<p>Now stage and commit:</p>
<pre><code class="lang-bash">git add 1.py
git commit -m <span class="hljs-string">"Commit 16"</span>
</code></pre>
<p>Your history should look like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_16-1.png" alt="The history after introducing &quot;Commit 16&quot;" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 16"</em></p>
<p>Say now you realize that you've made a mistake. You actually wanted <code>feature_branch_2</code> to be born from the <code>main</code> branch, rather than from <code>feature_branch_1</code>.</p>
<p>How can you achieve that?</p>
<p>Try to think about it given the history graph and what you've learned about the <code>--onto</code> flag for the <code>rebase</code> command.</p>
<p>Well, you want to "replace" the parent of your first commit on <code>feature_branch_2</code>, which is "Commit 14", so that it's on top of <code>main</code> branch - in this case, "Commit 12" - rather than the beginning of <code>feature_branch_1</code> - in this case, "Commit 13". So again, you will be creating a <em>new base</em>, this time for the first commit on <code>feature_branch_2</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/plan_commit14_15.png" alt="You want to move around &quot;Commit 14&quot; and &quot;Commit 15&quot;" width="600" height="400" loading="lazy">
<em>You want to move around "Commit 14" and "Commit 15"</em></p>
<p>How would you do that?</p>
<p>First, switch to <code>feature_branch_2</code>:</p>
<pre><code class="lang-bash">git checkout feature_branch_2
</code></pre>
<p>And now you can use:</p>
<pre><code class="lang-bash">git rebase --onto main &lt;SHA_OF_COMMIT_13&gt;
</code></pre>
<p>This tells Git to take the history with "Commit 13" as a base, and change that base to be "Commit 12" (pointed to by <code>main</code>) instead.</p>
<p>As a result, you have <code>feature_branch_2</code> based on <code>main</code> rather than <code>feature_branch_1</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/rebase_onto_3.png" alt="The commit history after performing rebase" width="600" height="400" loading="lazy">
<em>The commit history after performing rebase</em></p>
<p>The syntax of the command is:</p>
<pre><code class="lang-bash">git rebase --onto &lt;new_parent&gt; &lt;old_parent&gt;
</code></pre>
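<p>To try the two-parameter form end to end, you could use a sketch like the following. It rebuilds a simplified version of the history above in a scratch directory: every commit adds an empty <code>.txt</code> file rather than <code>1.py</code> or <code>2.py</code>, and <code>commit</code> is a hypothetical helper:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"                       # throwaway directory for the experiment
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.email "you@example.com" # placeholder identity
git config user.name "Example"

# Hypothetical helper: add an empty file named after the commit and commit it
commit() {
  touch "$1.txt"
  git add "$1.txt"
  git commit -q -m "Commit $1"
}

commit 12
git checkout -q -b feature_branch_1
commit 13
c13="$(git rev-parse HEAD)"             # SHA-1 of "Commit 13"
git checkout -q -b feature_branch_2     # the "mistake": branching from feature_branch_1
commit 14
commit 15
git checkout -q feature_branch_1
commit 16

git checkout -q feature_branch_2
# Replay everything after "Commit 13" on the current branch onto main
git rebase -q --onto main "$c13"

git log --format=%s feature_branch_2
</code></pre>
<p>After the rebase, "Commit 13" is no longer part of the history of <code>feature_branch_2</code>, while <code>feature_branch_1</code> is left as it was.</p>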
<h3 id="heading-how-to-rebase-on-a-single-branch">How to Rebase on a Single Branch</h3>
<p>You can also use <code>git rebase</code> while looking at the history of a single branch.</p>
<p>Let's see if you can help me here.</p>
<p>Say I worked from <code>feature_branch_2</code>, and specifically edited the file <code>code.py</code>. I started by changing all strings to be wrapped by double quotes rather than single quotes:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_4.png" alt="Changing ' into &quot; in code.py" width="600" height="400" loading="lazy">
<em>Changing <code>'</code> into <code>"</code> in <code>code.py</code></em></p>
<p>Then, I staged and committed:</p>
<pre><code class="lang-bash">git add code.py
git commit -m <span class="hljs-string">"Commit 17"</span>
</code></pre>
<p>I then decided to add a new function at the beginning of the file:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_5.png" alt="Adding the function another_feature" width="600" height="400" loading="lazy">
<em>Adding the function <code>another_feature</code></em></p>
<p>Again, I staged and committed:</p>
<pre><code class="lang-bash">git add code.py
git commit -m <span class="hljs-string">"Commit 18"</span>
</code></pre>
<p>And now I realized that I actually forgot to change the single quotes to double quotes wrapping <code>__main__</code> (as you might have noticed), so I did that too:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_6.png" alt="Changing '__main__' into &quot;__main__&quot;" width="600" height="400" loading="lazy">
<em>Changing <code>'__main__'</code> into <code>"__main__"</code></em></p>
<p>Of course, I staged and committed this change:</p>
<pre><code class="lang-bash">git add code.py
git commit -m <span class="hljs-string">"Commit 19"</span>
</code></pre>
<p>Now, consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_19.png" alt="The commit history after introducing &quot;Commit 19&quot;" width="600" height="400" loading="lazy">
<em>The commit history after introducing "Commit 19"</em></p>
<p>It isn't really nice, is it? I mean, I have two commits that are related to one another, "Commit 17" and "Commit 19" (turning <code>'</code>s into <code>"</code>s), but they are split by the unrelated "Commit 18" (where I added a new function). What can we do? Can you help me?</p>
<p>Intuitively, I want to edit the history here:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/plan_edit_commits_17_18.png" alt="These are the commits I want to edit" width="600" height="400" loading="lazy">
<em>These are the commits I want to edit</em></p>
<p>So, what would you do?</p>
<p>You are right!</p>
<p>I can <code>rebase</code> the history from "Commit 17" to "Commit 19", on top of "Commit 15". To do that:</p>
<pre><code class="lang-bash">git rebase --interactive --onto &lt;SHA_OF_COMMIT_15&gt; &lt;SHA_OF_COMMIT_15&gt;
</code></pre>
<p>Notice I specified "Commit 15" as the beginning of the range of commits, excluding this commit. And I didn't need to explicitly specify <code>HEAD</code> as the last parameter.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/rebase_onto_4.png" alt="Using rebase --onto on a single branch" width="600" height="400" loading="lazy">
<em>Using <code>rebase --onto</code> on a single branch</em></p>
<p>(Note: If you follow the steps above with my repository and get a merge conflict, you may have a different configuration than on my machine with regards to whitespace characters at line endings. In that case, you can add the <code>--ignore-whitespace</code> switch to the <code>rebase</code> command, resulting in the following command: <code>git rebase --ignore-whitespace --interactive --onto &lt;SHA_OF_COMMIT_15&gt; &lt;SHA_OF_COMMIT_15&gt;</code>. If you are curious to find out more about this issue, search for <code>autocrlf</code>.)</p>
<p>After following your advice and running the <code>rebase</code> command (thanks! 😇) I get the following screen:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/interactive_rebase_4.png" alt="Interactive rebase" width="600" height="400" loading="lazy">
<em>Interactive rebase</em></p>
<p>So what would I do? I want to put "Commit 19" before "Commit 18", so it comes right after "Commit 17". I can go further and <code>squash</code> them together, like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/interactive_rebase_5.png" alt="Interactive rebase - changing the order of commit and squashing" width="600" height="400" loading="lazy">
<em>Interactive rebase - changing the order of commit and squashing</em></p>
<p>Now when I get prompted for a commit message, I can provide the message "Commit 17+19":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/interactive_rebase_6.png" alt="Providing a commit message" width="600" height="400" loading="lazy">
<em>Providing a commit message</em></p>
<p>And now, see our beautiful history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/rebase_onto_5.png" alt="The resulting history" width="600" height="400" loading="lazy">
<em>The resulting history</em></p>
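<p>For reference, the reorder-and-squash can also be scripted. This is a rough sketch under a few assumptions: a scratch repository whose only branch is named <code>feature_branch_2</code>, empty <code>.txt</code> files instead of edits to <code>code.py</code> (so reordering cannot conflict), a hypothetical <code>commit</code> helper, and a <code>sed</code> script standing in for the editor. The script moves the third todo line above the second and marks it as <code>squash</code>:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"                       # throwaway directory for the experiment
git init -q
git symbolic-ref HEAD refs/heads/feature_branch_2
git config user.email "you@example.com" # placeholder identity
git config user.name "Example"

# Hypothetical helper: add an empty file named after the commit and commit it
commit() {
  touch "$1.txt"
  git add "$1.txt"
  git commit -q -m "Commit $1"
}

commit 15
c15="$(git rev-parse HEAD)"             # SHA-1 of "Commit 15"
commit 17
commit 18
commit 19

# Todo list: line 1 = "Commit 17", line 2 = "Commit 18", line 3 = "Commit 19".
# Hold line 2 back, turn line 3 into "squash", then re-emit line 2 after it.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2{h;d;}' -e '3{s/^pick/squash/;G;}'" \
  GIT_EDITOR=true git rebase -i --onto "$c15" "$c15"

git log --format=%s
# The squashed commit introduces the files of both "Commit 17" and "Commit 19"
git show --name-only --format= HEAD~
</code></pre>
<p>Because <code>GIT_EDITOR=true</code> accepts the default message, the squashed commit keeps "Commit 17" as its subject rather than "Commit 17+19".</p>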
<p>Thanks again!</p>
<h3 id="heading-more-rebase-use-cases-more-practice">More Rebase Use Cases + More Practice</h3>
<p>By now I hope you feel comfortable with the syntax of rebase. The best way to actually understand it is to consider various cases and figure out how to solve them yourself.</p>
<p>With the upcoming use cases, I strongly suggest you stop reading after I've introduced each use case, and then try to solve it on your own.</p>
<h4 id="heading-how-to-exclude-commits">How to Exclude Commits</h4>
<p>Say you have this history on another repo:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/another_history_1.png" alt="Another commit history" width="600" height="400" loading="lazy">
<em>Another commit history</em></p>
<p>Before playing around with it, store a tag to "Commit F" so you can get back to it later:</p>
<pre><code class="lang-bash">git tag original_commit_f
</code></pre>
<p>(A tag is a named reference to a commit, just like a branch - but it doesn't change when you add additional commits. It is like a constant named reference.)</p>
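<p>A quick way to see the difference between a tag and a branch is to add a commit after tagging. The sketch below uses a throwaway repository, a hypothetical tag name <code>original_commit_b</code>, and a hypothetical <code>commit</code> helper:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"                       # throwaway directory for the experiment
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.email "you@example.com" # placeholder identity
git config user.name "Example"

# Hypothetical helper: add an empty file named after the commit and commit it
commit() {
  touch "$1.txt"
  git add "$1.txt"
  git commit -q -m "Commit $1"
}

commit A
commit B
git tag original_commit_b               # tag the current commit
tagged="$(git rev-parse original_commit_b)"

commit C                                # main moves forward, the tag does not

git log -1 --format=%s main
git log -1 --format=%s original_commit_b
</code></pre>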
<p>Now, you actually don't want the changes in "Commit C" and "Commit D" to be included. You could use an interactive rebase like before and remove their changes. Or, you could use <code>git rebase --onto</code> again. How would you use <code>--onto</code> in order to "remove" these two commits?</p>
<p>You can rebase <code>HEAD</code> on top of "Commit B", where the old parent was actually "Commit D", and now it should be "Commit B". Consider the history again:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/another_history_1-1.png" alt="The history again" width="600" height="400" loading="lazy">
<em>The history again</em></p>
<p>Rebasing so that "Commit B" is the base of "Commit E" means "moving" both "Commit E" and "Commit F", and giving them another base - "Commit B". Can you come up with the command yourself?</p>
<pre><code class="lang-bash">git rebase --onto &lt;SHA_OF_COMMIT_B&gt; &lt;SHA_OF_COMMIT_D&gt; HEAD
</code></pre>
<p>Notice that using the syntax above (exactly as provided) would <em>not</em> move <code>main</code> to point to the new commit, so the result is a "detached" <code>HEAD</code>. If you use <code>gg</code> or another tool that displays the history reachable from branches, it might confuse you:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/rebase_onto_6.png" alt="Rebasing with --onto results in a detached HEAD" width="600" height="400" loading="lazy">
<em>Rebasing with <code>--onto</code> results in a detached <code>HEAD</code></em></p>
<p>But if you simply use <code>git log</code> (or my alias <code>git lol</code>), you will see the desired history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_lol.png" alt="The resulting history" width="600" height="400" loading="lazy">
<em>The resulting history</em></p>
<p>I don't know about you, but these kinds of things make me really happy. 😊😇</p>
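<p>By the way, you could omit <code>HEAD</code> from the previous command, as the third parameter defaults to whatever is currently checked out. So just using:</p>
<pre><code class="lang-bash">git rebase --onto &lt;SHA_OF_COMMIT_B&gt; &lt;SHA_OF_COMMIT_D&gt;
</code></pre>
<p>Would produce the same commits. One difference is worth knowing: with the third parameter omitted, Git rebases the branch you have checked out, so if you are on <code>main</code>, <code>main</code> moves to the new commits and <code>HEAD</code> stays attached to it. The last parameter actually tells Git where the end of the current sequence of commits to rebase is. So the syntax of <code>git rebase --onto</code> with three arguments is:</p>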
<pre><code class="lang-bash">git rebase --onto &lt;new_parent&gt; &lt;old_parent&gt; &lt;until&gt;
</code></pre>
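<p>The following sketch walks through this use case on a scratch repository so you can inspect the detached <code>HEAD</code> yourself. The six commits add empty <code>.txt</code> files, <code>commit</code> is a hypothetical helper, and the final command shows one way to make <code>main</code> follow along, by naming the branch as the last parameter:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"                       # throwaway directory for the experiment
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.email "you@example.com" # placeholder identity
git config user.name "Example"

# Hypothetical helper: add an empty file named after the commit and commit it
commit() {
  touch "$1.txt"
  git add "$1.txt"
  git commit -q -m "Commit $1"
}

for n in A B C D E F; do commit "$n"; done
git tag original_commit_f
b="$(git rev-parse HEAD~4)"             # "Commit B"
d="$(git rev-parse HEAD~2)"             # "Commit D"

# Replay everything after "Commit D" up to HEAD onto "Commit B"
git rebase -q --onto "$b" "$d" HEAD

git log --format=%s                     # history reachable from HEAD
git symbolic-ref -q HEAD || echo "HEAD is detached"
git log -1 --format=%s main             # main was not moved

# Naming the branch as the last parameter rewrites main itself
git rebase -q --onto "$b" "$d" main
git log --format=%s main
</code></pre>
<p>The tag <code>original_commit_f</code> still points to the original "Commit F", so the untouched history stays available.</p>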
<h4 id="heading-how-to-move-commits-across-branches">How to Move Commits Across Branches</h4>
<p>So let's say we get to the same history as before:</p>
<pre><code class="lang-bash">git checkout original_commit_f
</code></pre>
<p>And now I want only "Commit E" to be on a branch based on "Commit B". That is, I want to have a new branch, branching from "Commit B", with only "Commit E".</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/another_history_2.png" alt="The current history, considering &quot;Commit E&quot;" width="600" height="400" loading="lazy">
<em>The current history, considering "Commit E"</em></p>
<p>So, what does this mean in terms of <code>rebase</code>? Consider the image above. What commit (or commits) should I rebase, and which commit would be the new base?</p>
<p>I know I can count on you here 😉</p>
<p>What I want is to take "Commit E", and this commit only, and change its base to be "Commit B". In other words, to replay the changes introduced in "Commit E" onto "Commit B".</p>
<p>Can you apply that logic to the syntax of <code>git rebase</code>?</p>
<p>Here it is (this time I'm writing <code>&lt;COMMIT_X&gt;</code> instead of <code>&lt;SHA_OF_COMMIT_X&gt;</code>, for brevity):</p>
<pre><code class="lang-bash">git rebase --onto &lt;COMMIT_B&gt; &lt;COMMIT_D&gt; &lt;COMMIT_E&gt;
</code></pre>
<p>Now the history looks like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_rebase_3.png" alt="The history after rebase" width="600" height="400" loading="lazy">
<em>The history after rebase</em></p>
<p>Notice that <code>rebase</code> moved <code>HEAD</code>, but not any other named reference (such as a branch or a tag). In other words, you are in a detached <code>HEAD</code> state. So here too, using <code>gg</code> or another tool that displays the history reachable from branches and tags might confuse you. You can use <code>git log</code> (or my alias <code>git lol</code>) to display the reachable history from <code>HEAD</code>.</p>
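<p>Since the result only lives on a detached <code>HEAD</code>, you may want to give it a branch name right away. Here is a sketch of the whole flow on a scratch repository, where the empty <code>.txt</code> files, the <code>commit</code> helper, and the branch name <code>only_commit_e</code> are all hypothetical:</p>
<pre><code class="lang-bash">set -e
cd "$(mktemp -d)"                       # throwaway directory for the experiment
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.email "you@example.com" # placeholder identity
git config user.name "Example"

# Hypothetical helper: add an empty file named after the commit and commit it
commit() {
  touch "$1.txt"
  git add "$1.txt"
  git commit -q -m "Commit $1"
}

for n in A B C D E F; do commit "$n"; done
b="$(git rev-parse HEAD~4)"             # "Commit B"
d="$(git rev-parse HEAD~2)"             # "Commit D"
e="$(git rev-parse HEAD~1)"             # "Commit E"

# Replay only "Commit E" (everything after D, up to E) onto "Commit B"
git rebase -q --onto "$b" "$d" "$e"

# Keep the result by creating a branch where the detached HEAD is
git checkout -q -b only_commit_e

git log --format=%s only_commit_e
</code></pre>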
<p>Awesome!</p>
<h3 id="heading-a-note-about-conflicts">A Note About Conflicts</h3>
<p>Note that when performing a rebase, you may run into conflicts just as when merging. You may have conflicts because, when rebasing, you are trying to apply patches on a different base, perhaps where the patches do not apply.</p>
<p>For example, consider the previous repository again, and specifically, consider the change introduced in "Commit 12", pointed to by <code>main</code>:</p>
<pre><code class="lang-bash">git show main
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/patch_commit_12.png" alt="The patch introduced in &quot;Commit 12&quot;" width="600" height="400" loading="lazy">
<em>The patch introduced in "Commit 12"</em></p>
<p>I already covered the format of <code>git diff</code> in detail in <a class="post-section-overview" href="#heading-chapter-6-diffs-and-patches">chapter 6</a>, but as a quick reminder, this commit instructs Git to add a line after the two lines of context:</p>
<pre><code class="lang-patch">
</code></pre>
<p>This is a sample file</p>
<pre><code>
And before these three lines <span class="hljs-keyword">of</span> context:

<span class="hljs-string">``</span><span class="hljs-string">`patch</span>
</code></pre><p>def new_feature():
  print('new feature')</p>
<pre><code>
Say you are trying to rebase <span class="hljs-string">"Commit 12"</span> onto another commit. If, <span class="hljs-keyword">for</span> some reason, these context lines don<span class="hljs-string">'t exist as they do in the patch on the commit you are rebasing onto, then you will have a conflict.

### Zooming Out for the Big Picture

![Comparing rebase and merge](https://www.freecodecamp.org/news/content/images/2023/12/compare_rebase_merge.png)
_Comparing rebase and merge_

In the beginning of this chapter, I started by mentioning the similarity between `git merge` and `git rebase`: both are used to integrate changes introduced in different histories.

But, as you now know, they are very different in how they operate. While merging results in a _diverged_ history, rebasing results in a _linear_ history. Conflicts are possible in both cases. And there is one more column described in the table above that requires some close attention.

Now that you know what "Git rebase" is, and how to use interactive rebase or rebase `--onto`, as I hope you agree, `git rebase` is a super powerful tool. Yet, it has one huge drawback when compared with merging.

**Git rebase changes the history.**

This means that you should **not** rebase commits that exist outside your local copy of the repository, and that other people may have based their commits on.

In other words, if the only commits in question are those you created locally - go ahead, use rebase, go wild.

But if the commits have been pushed, this can lead to a huge problem - as someone else may rely on these commits that you later overwrite, and then you and they will have different versions of the repository.

This is unlike `merge` which, as we have seen, does not modify history.

For example, consider the last case where we rebased and resulted in this history:

![The history after rebase](https://www.freecodecamp.org/news/content/images/2023/12/history_after_rebase_3-1.png)
_The history after rebase_

Now, assume that I have already pushed this branch to the remote. And after I had pushed the branch, another developer pulled it and branched out from "Commit C". The other developer didn'</span>t know that meanwhile, I was locally rebasing my branch, and would later push it again.

This results <span class="hljs-keyword">in</span> an inconsistency: the other developer works <span class="hljs-keyword">from</span> a commit that is no longer available on my copy <span class="hljs-keyword">of</span> the repository.

I will not elaborate on what exactly <span class="hljs-built_in">this</span> causes <span class="hljs-keyword">in</span> <span class="hljs-built_in">this</span> book, <span class="hljs-keyword">as</span> my main message is that you should definitely avoid such cases. If you<span class="hljs-string">'re interested in what would actually happen, I'</span>ll leave a link to a useful resource <span class="hljs-keyword">in</span> the [additional references](#heading-additional-references-by-part). For now, <span class="hljs-keyword">let</span><span class="hljs-string">'s summarize what we have covered.

### Recap - Understanding Git Rebase

In this chapter, you learned about `git rebase`, a super-powerful tool to rewrite history in Git. You considered a few use cases where git rebase can be helpful, and how to use it with one, two, or three parameters, with and without the `--onto` switch.

I hope I was able to convince you that `git rebase` is powerful - but also that it is quite simple once you get the gist. It is a tool you can use to "copy-paste" commits (or, more accurately, patches). And it'</span>s a useful tool to have under your belt. In essence, <span class="hljs-string">`git rebase`</span> takes the patches introduced by commits, and replays them on another commit. As described <span class="hljs-keyword">in</span> <span class="hljs-built_in">this</span> chapter, <span class="hljs-built_in">this</span> is useful <span class="hljs-keyword">in</span> many different scenarios.

## Part <span class="hljs-number">2</span> - Summary

In <span class="hljs-built_in">this</span> part you learned about branching and integrating changes <span class="hljs-keyword">in</span> Git.

You learned what a **diff** is, and the difference between a diff and a **patch**. You also learned how the output <span class="hljs-keyword">of</span> <span class="hljs-string">`git diff`</span> is constructed.

Understanding diffs is a major milestone <span class="hljs-keyword">for</span> understanding many other processes within Git such <span class="hljs-keyword">as</span> merging or rebasing.

Then, you got an extensive overview <span class="hljs-keyword">of</span> merging <span class="hljs-keyword">with</span> Git. You learned that **merging** is the process <span class="hljs-keyword">of</span> **combining the recent changes <span class="hljs-keyword">from</span> several branches into a single <span class="hljs-keyword">new</span> commit**. The <span class="hljs-keyword">new</span> commit has multiple parents - those commits which had been the tips <span class="hljs-keyword">of</span> the branches that were merged. In most cases, merging combines the changes <span class="hljs-keyword">from</span> two branches, and the resulting merge commit then has two parents - one <span class="hljs-keyword">from</span> each branch.

We considered a simple, fast-forward merge, which is possible when one branch diverged <span class="hljs-keyword">from</span> the base branch, and then just added commits on top <span class="hljs-keyword">of</span> the base branch.

We then considered three-way merges, and explained the three-stage process:

* First, Git locates the merge base. As a reminder, <span class="hljs-built_in">this</span> is the first commit that is reachable <span class="hljs-keyword">from</span> both branches.
* Second, Git calculates two diffs - one diff <span class="hljs-keyword">from</span> the merge base to the _first_ branch, and another diff <span class="hljs-keyword">from</span> the merge base to the _second_ branch. Git generates patches based on those diffs.
* Third and last, Git applies both patches to the merge base using a <span class="hljs-number">3</span>-way merge algorithm. The result is the state <span class="hljs-keyword">of</span> the <span class="hljs-keyword">new</span> merge commit.

You saw the output <span class="hljs-keyword">of</span> <span class="hljs-string">`git diff`</span> when we are <span class="hljs-keyword">in</span> a conflicting state, and how to resolve conflicts either manually or <span class="hljs-keyword">with</span> VS Code.

Ultimately, you got to know Git rebase. You saw that <span class="hljs-string">`git rebase`</span> is powerful - but also that it is quite simple once you understand what it does. It is a tool to <span class="hljs-string">"copy-paste"</span> commits (or, more accurately, patches).

![Comparing rebase and merge](https:<span class="hljs-comment">//www.freecodecamp.org/news/content/images/2023/12/compare_rebase_merge-1.png)</span>
_Comparing rebase and merge_

Both <span class="hljs-string">`git merge`</span> and <span class="hljs-string">`git rebase`</span> are used to integrate changes introduced <span class="hljs-keyword">in</span> different histories.

Yet, they differ <span class="hljs-keyword">in</span> how they operate. While merging results <span class="hljs-keyword">in</span> a _diverged_ history, rebasing results <span class="hljs-keyword">in</span> a _linear_ history. <span class="hljs-string">`git rebase`</span> _changes_ the history, whereas <span class="hljs-string">`git merge`</span> adds to the existing history.

With <span class="hljs-built_in">this</span> deep understanding <span class="hljs-keyword">of</span> diffs, patches, merge and rebase, you should feel confident introducing changes to a Git repository.

The next part will focus on what happens when things go wrong - how you can change history (<span class="hljs-keyword">with</span> or without <span class="hljs-string">`git rebase`</span>), or find <span class="hljs-string">"lost"</span> commits.

# Part <span class="hljs-number">3</span> - Undoing Changes

Did you ever get to a point where you said: <span class="hljs-string">"Uh-oh, what did I just do?"</span> I guess you have, just like almost anyone who uses Git.

Perhaps you committed to the wrong branch. Perhaps you lost some code that you had written. Perhaps you committed something that you didn<span class="hljs-string">'t mean to.

This part will give you the tools to rewrite history with confidence, thereby "undoing" all kinds of changes in Git. 

Just like the other parts of the book, this part will be practical yet in-depth. Instead of giving you a list of things to do when things go wrong, we will look at the underlying mechanisms, so that you feel confident whenever you reach an "uh-oh" moment. In fact, you will come to see these moments as opportunities for an interesting challenge, rather than a dreadful scenario.

## Chapter 9 - Git Reset

Our journey starts with a powerful command that can be used to undo many different actions with Git - `git reset`.

### A Short Reminder - Recording Changes

In [chapter 3](#heading-chapter-3-how-to-record-changes-in-git), you learned how to record changes in Git. If you remember everything from that chapter, feel free to jump to the next section.

It is very useful to think about Git as a system for recording snapshots of a filesystem in time. A Git repository has three "states" or "trees":

1. The **working directory**, a directory that has a repository associated with it.
2. The **staging area (index)** which holds the tree for the next commit.
3. The **repository**, which is a collection of commits and references.

![The three "trees" of a Git repo](https://www.freecodecamp.org/news/content/images/2023/12/3_trees.png)
_The three "trees" of a Git repo_

Note regarding the drawing conventions I use: I include `.git` within the working directory, to remind you that it is a folder within the project'</span>s folder on the filesystem. The <span class="hljs-string">`.git`</span> folder actually contains the objects and references <span class="hljs-keyword">of</span> the repository, <span class="hljs-keyword">as</span> explained <span class="hljs-keyword">in</span> [chapter <span class="hljs-number">4</span>](#heading-chapter<span class="hljs-number">-4</span>-how-to-create-a-repo-<span class="hljs-keyword">from</span>-scratch).

#### Hands-on Demonstration

Use <span class="hljs-string">`git init`</span> to initialize a <span class="hljs-keyword">new</span> repository. Write some text into a file called <span class="hljs-string">`1.txt`</span>:

<span class="hljs-string">``</span><span class="hljs-string">`bash
mkdir my_repo
cd my_repo
git init
echo Hello world &gt; 1.txt</span>
</code></pre><p>Out of the three tree states described above, where is <code>1.txt</code> now?</p>
<p>In the working tree, as it hasn't yet been introduced to the index.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/1_txt_working_dir.png" alt="The file 1.txt is now a part of the working dir only" width="600" height="400" loading="lazy">
<em>The file <code>1.txt</code> is now a part of the working dir only</em></p>
<p>In order to <em>stage</em> it, to <em>add</em> it to the index, use:</p>
<pre><code class="lang-bash">git add 1.txt
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/1_txt_index.png" alt="Using git add stages the file so it is now in the index as well" width="600" height="400" loading="lazy">
<em>Using <code>git add</code> stages the file so it is now in the index as well</em></p>
<p>Notice that once you stage <code>1.txt</code>, Git creates a blob object with the content of this file, and adds it to the internal object database (within <code>.git</code> folder), as covered in <a class="post-section-overview" href="#heading-chapter-3-how-to-record-changes-in-git">chapter 3</a> and <a class="post-section-overview" href="#heading-chapter-4-how-to-create-a-repo-from-scratch">chapter 4</a>. I do not draw it as part of the "repository" as in this representation, the "repository" refers to a tree of commits and their references, and this blob has not been a part of any commit.</p>
<p>Now, use <code>git commit</code> to commit your changes to the repository:</p>
<pre><code class="lang-bash">git commit -m <span class="hljs-string">"Commit 1"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_1.png" alt="Using git commit creates a commit object in the repository" width="600" height="400" loading="lazy">
<em>Using <code>git commit</code> creates a commit object in the repository</em></p>
<p>You created a new <strong>commit</strong> object, which includes a pointer to a <strong>tree</strong> describing the entire <strong>working tree</strong>. In this case, this tree consists only of <code>1.txt</code> within the root folder. In addition to a pointer to the tree, the commit object includes metadata, such as timestamps and author information.</p>
<p>When considering the diagrams, notice that we only have a single copy of the file <code>1.txt</code> on disk, and a corresponding blob object in Git's object database. The "repository" tree now shows this file as it is part of the active commit - that is, the commit object "Commit 1" points to a tree that points to the blob with the contents of <code>1.txt</code>, the same blob that the index is pointing to.</p>
<p>For more information about the objects in Git (such as commits and trees), refer to <a class="post-section-overview" href="#heading-chapter-1-git-objects">chapter 1</a>.</p>
<p>Next, create a new file, and add it to the index, as before:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> second file &gt; 2.txt
git add 2.txt
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/2_txt_index.png" alt="The file 2.txt is in the working dir and the index after staging it with git add" width="600" height="400" loading="lazy">
<em>The file <code>2.txt</code> is in the working dir and the index after staging it with <code>git add</code></em></p>
<p>Next, commit:</p>
<pre><code class="lang-bash">git commit -m <span class="hljs-string">"Commit 2"</span>
</code></pre>
<p>Importantly, <code>git commit</code> does two things:</p>
<p>First, it creates a <strong>commit object</strong>, so there is an object within Git's internal object database with a corresponding SHA-1 value. This new commit object also points to the parent commit. That is the commit that <code>HEAD</code> was pointing to when you ran the <code>git commit</code> command.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/new_commit_object.png" alt="A new commit object has been created; at first, main still points to the previous commit" width="600" height="400" loading="lazy">
<em>A new commit object has been created, at first - <code>main</code> still points to the previous commit</em></p>
<p>Second, <code>git commit</code> <strong>moves the pointer of the active branch</strong> — in our case, that would be <code>main</code>, to point to the newly created commit object.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_updates_active_branch.png" alt="git commit also updates the active branch to point to the newly created commit object" width="600" height="400" loading="lazy">
<em><code>git commit</code> also updates the active branch to point to the newly created commit object</em></p>
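<p>If you would like to observe these two effects from the command line, the following is a minimal sketch. It assumes a throwaway repository created with <code>mktemp</code>; the variable names <code>before</code>, <code>after</code> and <code>parent</code> are only illustrative:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo Hello world > 1.txt
git add 1.txt
git commit -q -m "Commit 1"
before=$(git rev-parse main)   # where main points before the second commit

echo second file > 2.txt
git add 2.txt
git commit -q -m "Commit 2"
after=$(git rev-parse main)    # where main points afterwards

# Effect 1: a new commit object exists, and it records its parent
git cat-file -p "$after"
parent=$(git rev-parse "$after^")

# Effect 2: the active branch now points to the new object
echo "main moved from $before to $after"
```

<p>The <code>parent</code> line printed by <code>git cat-file -p</code> should show the same value that <code>main</code> had before the second commit.</p>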
<h3 id="heading-introducing-git-reset">Introducing <code>git reset</code></h3>
<p>You will now learn how to reverse the process of introducing a commit. For that, you will get to know the command <code>git reset</code>.</p>
<h4 id="heading-git-reset-soft"><code>git reset --soft</code></h4>
<p>The very last step you did before was to <code>git commit</code>, which actually means two things — Git created a commit object and moved <code>main</code>, the active branch. To undo this step, use the following command:</p>
<pre><code class="lang-bash">git reset --soft HEAD~1
</code></pre>
<p>The syntax <code>HEAD~1</code> refers to the first parent of <code>HEAD</code>. Consider a case where I had more than one commit in the commit-graph, say "Commit 3" pointing to "Commit 2", which is, in turn, pointing to "Commit 1". And consider <code>HEAD</code> was pointing to "Commit 3". You could use <code>HEAD~1</code> to refer to "Commit 2", and <code>HEAD~2</code> would refer to "Commit 1".</p>
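<p>To try the <code>HEAD~N</code> syntax on exactly that three-commit example, here is a small sketch. The repository and file contents are assumptions invented for the illustration:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Build a linear history: Commit 1 <- Commit 2 <- Commit 3
for n in 1 2 3; do
  echo "content $n" > "$n.txt"
  git add "$n.txt"
  git commit -q -m "Commit $n"
done

# Print the subject line of each revision
git log -1 --format=%s HEAD      # Commit 3
git log -1 --format=%s HEAD~1    # Commit 2
git log -1 --format=%s HEAD~2    # Commit 1
```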
<p>So, back to the command: <code>git reset --soft HEAD~1</code></p>
<p>This command asks Git to change whatever <code>HEAD</code> is pointing to. (Note: In the diagrams below, I use <code>*HEAD</code> for "whatever <code>HEAD</code> is pointing to".) In our example, <code>HEAD</code> is pointing to <code>main</code>. So Git will only change the pointer of <code>main</code> to point to <code>HEAD~1</code>. That is, <code>main</code> will point to "Commit 1".</p>
<p>However, this command did <strong>not</strong> affect the state of the index or the working tree. So if you use <code>git status</code> you will see that <code>2.txt</code> is staged, just like before you ran <code>git commit</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_status_after_reset_soft.png" alt="git status shows that 2.txt is in the index, but not in the active commit" width="600" height="400" loading="lazy">
<em><code>git status</code> shows that <code>2.txt</code> is in the index, but not in the active commit</em></p>
<p>The state is now:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/reset_soft_1.png" alt="Resetting main to &quot;Commit 1&quot;" width="600" height="400" loading="lazy">
<em>Resetting <code>main</code> to "Commit 1"</em></p>
<p>(Note: I removed <code>2.txt</code> from the "repository" in the diagram as it is not part of the active commit - that is, the tree pointed to by "Commit 1" does not reference this file. However, it has not been removed from the file system - as it still exists in the working tree and the index.)</p>
<p>What about <code>git log</code>? It will start from <code>HEAD</code>, go to <code>main</code>, and then to "Commit 1":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_after_reset_soft.png" alt="The output of git log" width="600" height="400" loading="lazy">
<em>The output of <code>git log</code></em></p>
<p>Notice that this means that "Commit 2" is no longer reachable from our history.</p>
<p>Does that mean the commit object of "Commit 2" is deleted?</p>
<p>No, it's not deleted. It still resides within Git's internal object database.</p>
<p>If you push the current history now, by using <code>git push</code>, Git will not push "Commit 2" to the remote server (as it is not reachable from the current <code>HEAD</code>), but the commit object <em>still exists</em> on your local copy of the repository.</p>
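<p>You can check both claims yourself: that the index is untouched, and that the "Commit 2" object is still around. This is a sketch under the assumption that you save the commit's SHA-1 before resetting; the variable name <code>commit_2</code> is made up for this example:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo Hello world > 1.txt
git add 1.txt
git commit -q -m "Commit 1"
echo second file > 2.txt
git add 2.txt
git commit -q -m "Commit 2"

commit_2=$(git rev-parse HEAD)   # remember "Commit 2" before moving main
git reset -q --soft HEAD~1

git status --short               # 2.txt is still staged
git log --format=%s              # only "Commit 1" is reachable from HEAD
git cat-file -t "$commit_2"      # the object still exists in the database
```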
<p>Now, commit again - and use the commit message of "Commit 2.1" to differentiate this new object from the original "Commit 2":</p>
<pre><code class="lang-bash">git commit -m <span class="hljs-string">"Commit 2.1"</span>
</code></pre>
<p>This is the resulting state:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_2_1.png" alt="Creating a new commit" width="600" height="400" loading="lazy">
<em>Creating a new commit</em></p>
<p>I omitted "Commit 2" as it is not reachable from <code>HEAD</code>, even though its object exists in Git's internal object database.</p>
<p>Why are "Commit 2" and "Commit 2.1" different? Even if we used the same commit message, and even though they point to the same tree object (of the root folder consisting of <code>1.txt</code> and <code>2.txt</code>), they still have different timestamps, as they were created at different times. Both "Commit 2" and "Commit 2.1" now point to "Commit 1", but only "Commit 2.1" is reachable from <code>HEAD</code>.</p>
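<p>The "same tree, different commit" relationship can be inspected with <code>git rev-parse</code>. The following sketch recreates the steps in a throwaway repository; the variable names are assumptions for illustration only:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo Hello world > 1.txt
git add 1.txt
git commit -q -m "Commit 1"
echo second file > 2.txt
git add 2.txt
git commit -q -m "Commit 2"
commit_2=$(git rev-parse HEAD)

# Undo the commit step only, then commit the same index again
git reset -q --soft HEAD~1
git commit -q -m "Commit 2.1"
commit_2_1=$(git rev-parse HEAD)

# Two distinct commit objects...
echo "Commit 2:   $commit_2"
echo "Commit 2.1: $commit_2_1"

# ...pointing to the same tree object and the same parent
git rev-parse "$commit_2^{tree}" "$commit_2_1^{tree}"
git rev-parse "$commit_2^" "$commit_2_1^"
```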
<h4 id="heading-git-reset-mixed"><code>git reset --mixed</code></h4>
<p>It's time to undo even further. This time, use:</p>
<pre><code class="lang-bash">git reset --mixed HEAD~1
</code></pre>
<p>(Note: <code>--mixed</code> is the default switch for <code>git reset</code>.)</p>
<p>This command starts the same as <code>git reset --soft HEAD~1</code>. That is, the command takes the pointer of whatever <code>HEAD</code> is pointing to now, which is the <code>main</code> branch, and sets it to <code>HEAD~1</code>, in our example - "Commit 1".</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reset_mixed_1.png" alt="The first step of git reset --mixed is the same as git reset --soft" width="600" height="400" loading="lazy">
<em>The first step of <code>git reset --mixed</code> is the same as <code>git reset --soft</code></em></p>
<p>Next, Git goes further and effectively undoes the changes we made to the index. That is, it changes the index so that it matches the new <code>HEAD</code>, the one set in the first step.</p>
<p>If we ran <code>git reset --mixed HEAD~1</code>, then <code>HEAD</code> (<code>main</code>) would be set to <code>HEAD~1</code> ("Commit 1"), and then Git would match the index to the state of "Commit 1" - in this case, it means that <code>2.txt</code> would no longer be part of the index.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reset_mixed_2.png" alt="The second step of git reset --mixed is to match the index with the new HEAD" width="600" height="400" loading="lazy">
<em>The second step of <code>git reset --mixed</code> is to match the index with the new <code>HEAD</code></em></p>
<p>It's time to create a new commit with the state of the original "Commit 2". This time you need to stage <code>2.txt</code> again before creating it:</p>
<pre><code class="lang-bash">git add 2.txt
git commit -m <span class="hljs-string">"Commit 2.2"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_2_2.png" alt="Creating &quot;Commit 2.2&quot;" width="600" height="400" loading="lazy">
<em>Creating "Commit 2.2"</em></p>
<p>Similarly to "Commit 2.1", I "name" this commit "Commit 2.2" to differentiate it from the original "Commit 2" or "Commit 2.1" - these commits result in the same state as the original "Commit 2", but they are different commit objects.</p>
<h4 id="heading-git-reset-hard"><code>git reset --hard</code></h4>
<p>Go on, undo even more!</p>
<p>This time, use the <code>--hard</code> switch, and run:</p>
<pre><code class="lang-bash">git reset --hard HEAD~1
</code></pre>
<p>Again, Git starts with the <code>--soft</code> stage, setting whatever <code>HEAD</code> is pointing to (<code>main</code>), to <code>HEAD~1</code> ("Commit 1").</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reset_hard_1-1.png" alt="The first step of git reset --hard is the same as git reset --soft" width="600" height="400" loading="lazy">
<em>The first step of <code>git reset --hard</code> is the same as <code>git reset --soft</code></em></p>
<p>Next, moving on to the <code>--mixed</code> stage, matching the index with <code>HEAD</code>. That is, Git undoes the staging of <code>2.txt</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reset_hard_2-1.png" alt="The second step of git reset --hard is the same as git reset --mixed" width="600" height="400" loading="lazy">
<em>The second step of <code>git reset --hard</code> is the same as <code>git reset --mixed</code></em></p>
<p>Next comes the <code>--hard</code> step, where Git goes even further and matches the working dir with the state of the index. In this case, it means removing <code>2.txt</code> from the working dir as well.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reset_hard_3.png" alt="The third step of git reset --hard matches the state of the working dir with that of the index" width="600" height="400" loading="lazy">
<em>The third step of <code>git reset --hard</code> matches the state of the working dir with that of the index</em></p>
<p>So to introduce a change to Git, you have three steps: you change the working dir, then you stage the change in the index (the staging area), and then you commit a new snapshot with those changes. To undo these changes:</p>
<ul>
<li>If we use <code>git reset --soft</code>, we undo the commit step.</li>
<li>If we use <code>git reset --mixed</code>, we also undo the staging step.</li>
<li>If we use <code>git reset --hard</code>, we also undo the changes to the working dir.</li>
</ul>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reset_switches.png" alt="The three main switches of git reset" width="600" height="400" loading="lazy">
<em>The three main switches of <code>git reset</code></em></p>
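<p>To compare the three switches side by side, here is a minimal sketch. It rebuilds the two-commit history in a fresh throwaway repository for each mode and reports what <code>git status --short</code> shows afterwards. The helper name <code>status_after_reset</code> is hypothetical, invented for this example:</p>

```shell
set -e

# Build "Commit 1" and "Commit 2" in a fresh repo, run
# `git reset --<mode> HEAD~1`, and print the resulting short status.
status_after_reset() {
  mode=$1
  repo=$(mktemp -d)
  (
    cd "$repo"
    git init -q
    git symbolic-ref HEAD refs/heads/main
    git config user.name "Demo"
    git config user.email "demo@example.com"
    git config commit.gpgsign false
    echo Hello world > 1.txt
    git add 1.txt
    git commit -q -m "Commit 1"
    echo second file > 2.txt
    git add 2.txt
    git commit -q -m "Commit 2"
    git reset -q "--$mode" HEAD~1
    git status --short
  )
}

soft_status=$(status_after_reset soft)
mixed_status=$(status_after_reset mixed)
hard_status=$(status_after_reset hard)

echo "--soft:  [$soft_status]"    # 2.txt is staged
echo "--mixed: [$mixed_status]"   # 2.txt is untracked
echo "--hard:  [$hard_status]"    # nothing to report, 2.txt is gone
```

<p>In the short status format, <code>A</code> marks a file that is staged as new, and <code>??</code> marks an untracked file that exists only in the working dir.</p>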
<h3 id="heading-real-life-scenarios">Real-Life Scenarios</h3>
<h4 id="heading-scenario-1">Scenario #1</h4>
<p>So in a real-life scenario, write "I love Git" into a file (<code>love.txt</code>), as we all love Git 😍. Go ahead, stage and commit this as well:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> I love Git &gt; love.txt
git add love.txt
git commit -m <span class="hljs-string">"Commit 2.3"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_2_3.png" alt="Creating &quot;Commit 2.3&quot;" width="600" height="400" loading="lazy">
<em>Creating "Commit 2.3"</em></p>
<p>Also, save a tag so that you can get back to this commit later if needed:</p>
<pre><code class="lang-bash">git tag scenario-1
</code></pre>
<p>Oh, oops!</p>
<p>Actually, I didn't want you to commit it.</p>
<p>What I actually wanted you to do is write some more love words in this file before committing it.</p>
<p>What can you do?</p>
<p>Well, one way to overcome this would be to use <code>git reset --mixed HEAD~1</code>, effectively undoing both the committing and the staging actions you took:</p>
<pre><code class="lang-bash">git reset --mixed HEAD~1
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/reset_commit_2_3.png" alt="Undoing the staging and committing steps" width="600" height="400" loading="lazy">
<em>Undoing the staging and committing steps</em></p>
<p>So <code>main</code> points to "Commit 1" again, and <code>love.txt</code> is no longer a part of the index. However, the file remains in the working dir. You can now add more content to it:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> and Gitting Things Done &gt;&gt; love.txt
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/adding_love_lyrics.png" alt="Adding more love lyrics" width="600" height="400" loading="lazy">
<em>Adding more love lyrics</em></p>
<p>Stage and commit your file:</p>
<pre><code class="lang-bash">git add love.txt
git commit -m <span class="hljs-string">"Commit 2.4"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_2_4.png" alt="Introducing &quot;Commit 2.4&quot;" width="600" height="400" loading="lazy">
<em>Introducing "Commit 2.4"</em></p>
<p>Well done!</p>
<p>You got this clear, nice history of "Commit 2.4" pointing to "Commit 1".</p>
<p>You now have a new tool in your toolbox, <code>git reset</code>.</p>
<p>This tool is super, super useful, and you can accomplish almost anything with it. It's not always the most convenient tool to use, but it's capable of solving almost any rewriting-history scenario if you use it carefully.</p>
<p>For beginners, I recommend using only <code>git reset</code> for almost anything you want to undo in Git. Once you feel comfortable with it, move on to other tools.</p>
<h4 id="heading-scenario-2">Scenario #2</h4>
<p>Let us consider another case.</p>
<p>Create a new file called <code>new.txt</code>; stage and commit:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> this is a new file &gt; new.txt
git add new.txt
git commit -m <span class="hljs-string">"Commit 3"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_3.png" alt="Creating new.txt and &quot;Commit 3&quot;" width="600" height="400" loading="lazy">
<em>Creating <code>new.txt</code> and "Commit 3"</em></p>
<p>(Note: In the drawing I omitted the files from the repository to avoid clutter. Commit 3 includes <code>1.txt</code>, <code>love.txt</code> and <code>new.txt</code> at this stage).</p>
<p>Oops. Actually, that's a mistake. You were on <code>main</code>, and I wanted you to create this commit on a feature branch. My bad 😇</p>
<p>There are two important tools I want you to take from this chapter. The <em>second</em> is <code>git reset</code>. The first, and by far the more important one, is to whiteboard the current state versus the state you want to be in.</p>
<p>For this scenario, the current state and the desired state look like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/scenario_2.png" alt="Scenario #2: current-vs-desired states" width="600" height="400" loading="lazy">
<em>Scenario #2: current-vs-desired states</em></p>
<p>(Note: In following diagrams, I will refer to the current state as the "original" state - before starting the process of rewriting history.)</p>
<p>You will notice three changes:</p>
<ol>
<li><code>main</code> points to "Commit 3" (the blue one) in the current state, but to "Commit 2.4" in the desired state.</li>
<li><code>feature_branch</code> doesn't exist in the current state, yet it exists and points to "Commit 3" in the desired state.</li>
<li><code>HEAD</code> points to <code>main</code> in the current state, and to <code>feature_branch</code> in the desired state.</li>
</ol>
<p>If you can draw this and you know how to use <code>git reset</code>, you can definitely get yourself out of this situation.</p>
<p>So again, the most important thing is to take a breath and draw this out.</p>
<p>Observing the drawing above, how do you get from the current state to the desired one?</p>
<p>There are a few different ways of course, but I will present one option only for each scenario. Feel free to play around with other options as well.</p>
<p>You can start by using <code>git reset --soft HEAD~1</code>. This would set <code>main</code> to point to the previous commit, "Commit 2.4":</p>
<pre><code class="lang-bash">git reset --soft HEAD~1
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/scenario_2_1.png" alt="Changing main: &quot;Commit 3&quot; is still there, just not reachable from HEAD" width="600" height="400" loading="lazy">
<em>Changing <code>main</code>: "Commit 3" is still there, just not reachable from <code>HEAD</code></em></p>
<p>Peeking at the current-vs-desired diagram again, you can see that you need a new branch, right? You can use <code>git switch -c feature_branch</code> for it, or <code>git checkout -b feature_branch</code> (which does the same thing):</p>
<pre><code class="lang-bash">git switch -c feature_branch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/scenario_2_2.png" alt="Creating the feature_branch branch" width="600" height="400" loading="lazy">
<em>Creating the <code>feature_branch</code> branch</em></p>
<p>This command also updates <code>HEAD</code> to point to the new branch.</p>
<p>Since you used <code>git reset --soft</code>, you didn't change the index, so it currently has exactly the state you want to commit - how convenient! You can simply commit to <code>feature_branch</code>:</p>
<pre><code class="lang-bash">git commit -m <span class="hljs-string">"Commit 3.1"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_3_1.png" alt="Committing to the feature_branch branch" width="600" height="400" loading="lazy">
<em>Committing to the <code>feature_branch</code> branch</em></p>
<p>And you got to the desired state.</p>
<h4 id="heading-scenario-3">Scenario #3</h4>
<p>Ready to apply your knowledge to additional cases?</p>
<p>Still on <code>feature_branch</code>, add some changes to <code>love.txt</code>, and create a new file called <code>cool.txt</code>. Stage them and commit:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> Some changes &gt;&gt; love.txt
<span class="hljs-built_in">echo</span> Git is cool &gt; cool.txt
git add love.txt
git add cool.txt
git commit -m <span class="hljs-string">"Commit 4"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_4.png" alt="The history, as well as the state of the index and the working dir after creating &quot;Commit 4&quot;" width="600" height="400" loading="lazy">
<em>The history, as well as the state of the index and the working dir after creating "Commit 4"</em></p>
<p>Oh, oops, actually I wanted you to create two <em>separate</em> commits, one with each change...</p>
<p>Want to try this one yourself (before reading on)?</p>
<p>You can undo the committing and staging steps:</p>
<pre><code class="lang-bash">git reset --mixed HEAD~1
</code></pre>
<p>Following this command, the index no longer includes those two changes, but they're both still in your file system:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/reset_commit_4.png" alt="Resulting state after using git reset --mixed HEAD~1" width="600" height="400" loading="lazy">
<em>Resulting state after using <code>git reset --mixed HEAD~1</code></em></p>
<p>So now, if you only stage <code>love.txt</code>, you can commit it separately:</p>
<pre><code class="lang-bash">git add love.txt
git commit -m <span class="hljs-string">"Love"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_love.png" alt="Resulting state after committing the changes to love.txt" width="600" height="400" loading="lazy">
<em>Resulting state after committing the changes to <code>love.txt</code></em></p>
<p>Then, do the same for <code>cool.txt</code>:</p>
<pre><code class="lang-bash">git add cool.txt
git commit -m <span class="hljs-string">"Cool"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_separately.png" alt="Committing separately" width="600" height="400" loading="lazy">
<em>Committing separately</em></p>
<p>Nice!</p>
<h4 id="heading-scenario-4">Scenario #4</h4>
<p>To clear up the state, switch to <code>main</code> and use <code>reset --hard</code> to make it point to "Commit 3.1", while setting the index and the working dir to the state of "Commit 3.1":</p>
<pre><code class="lang-bash">git checkout main
git reset --hard &lt;SHA_OF_COMMIT_3_1&gt;
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/reset_main_commit_3_1.png" alt="Resetting main to &quot;Commit 3.1&quot;" width="600" height="400" loading="lazy">
<em>Resetting <code>main</code> to "Commit 3.1"</em></p>
<p>Create another file (<code>another.txt</code>) with some text, and add some text to <code>love.txt</code>. Stage both changes, and commit them:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> Another file &gt; another.txt
<span class="hljs-built_in">echo</span> More love &gt;&gt; love.txt
git add another.txt
git add love.txt
git commit -m <span class="hljs-string">"Commit 4.1"</span>
</code></pre>
<p>This should be the result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_more_changes.png" alt="A new commit" width="600" height="400" loading="lazy">
<em>A new commit</em></p>
<p>Oops...</p>
<p>This time, I wanted the commit to be on another branch: not a new branch, but an already-existing one.</p>
<p>So what can you do?</p>
<p>I'll give you a hint. The answer is really short and really easy. What do we do first?</p>
<p>No, not <code>reset</code>. We <em>draw</em>. That's the first thing to do, as it would make everything else so much easier. So this is the current state:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/scenario_4.png" alt="The new commit on main appears blue" width="600" height="400" loading="lazy">
<em>The new commit on <code>main</code> appears blue</em></p>
<p>And the desired state?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/scenario_4_1-1.png" alt="We want the &quot;blue&quot; commit to be on another, existing, branch" width="600" height="400" loading="lazy">
<em>We want the "blue" commit to be on another, <code>existing</code>, branch</em></p>
<p>How do you get from the current state to the desired state? What would be the easiest way?</p>
<p>One way would be to use <code>git reset</code> as you did before, but there is another way that I would like you to try.</p>
<p>Note that the following commands assume that the branch <code>existing</code> already exists in your repository, but you haven't created it yet. To reach a state where this branch actually exists, you can use the following commands:</p>
<pre><code class="lang-bash">git checkout &lt;SHA_OF_COMMIT_1&gt;
git checkout -b existing
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Hello"</span> &gt; x.txt
git add x.txt
git commit -m <span class="hljs-string">"Commit X"</span>
git checkout &lt;SHA_OF_COMMIT_3_1&gt; -- love.txt
git commit -m <span class="hljs-string">"Commit Y"</span>
git checkout main
</code></pre>
<p>(The command <code>git checkout &lt;SHA_OF_COMMIT_3_1&gt; -- love.txt</code> copies the contents of <code>love.txt</code> from "Commit 3.1" to the index and the working dir, so that you can commit it on the <code>existing</code> branch. We need the state of <code>love.txt</code> on "Commit Y" to be the same as of "Commit 3.1" to avoid conflicts.)</p>
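<p>To see what <code>git checkout &lt;commit&gt; -- &lt;path&gt;</code> does in isolation, consider this sketch. It uses a made-up two-commit history and a hypothetical branch called <code>other</code>, not the repository from this chapter:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo v1 > love.txt
git add love.txt
git commit -q -m "Old"
echo v2 > love.txt
git commit -q -am "New"
new=$(git rev-parse HEAD)

# Start a branch from the older commit, where love.txt contains "v1"
git checkout -q -b other HEAD~1
head_before=$(git rev-parse HEAD)

# Copy love.txt as it is in "New" into the index and the working dir
git checkout "$new" -- love.txt

cat love.txt             # v2
git status --short       # the change is already staged
head_after=$(git rev-parse HEAD)
```

<p>Notice that <code>HEAD</code> and the branch do not move. Only the file in the index and in the working dir changes, so the new content is ready to be committed.</p>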
<p>Now your history should match the one shown in the picture with the caption "We want the "blue" commit to be on another, <code>existing</code>, branch".</p>
<p>First, move <code>HEAD</code> to point to the <code>existing</code> branch:</p>
<pre><code class="lang-bash">git switch existing
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/switch_existing.png" alt="Switch to the existing branch" width="600" height="400" loading="lazy">
<em>Switch to the <code>existing</code> branch</em></p>
<p>Intuitively, what you want to do is take the changes introduced in "Commit 4.1", and apply these changes ("copy-paste") on top of the <code>existing</code> branch. And Git has a tool just for that.</p>
<p>To ask Git to take the changes introduced between a commit and its parent commit and just apply these changes on the active branch, you can use <code>git cherry-pick</code>, a command we introduced in <a class="post-section-overview" href="#heading-chapter-8-understanding-git-rebase">chapter 8</a>. This command takes the changes introduced in the specified revision and applies them to the state of the active commit. Run:</p>
<pre><code class="lang-bash">git cherry-pick &lt;SHA_OF_COMMIT_4_1&gt;
</code></pre>
<p>You can specify the SHA-1 identifier of the desired commit, but you can also use <code>git cherry-pick main</code>, as the commit whose changes you are applying is the one <code>main</code> is pointing to.</p>
<p><code>git cherry-pick</code> also creates a new commit object, and updates the active branch to point to this new object, so the resulting state would be:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/cherry_pick.png" alt="The result after using git cherry-pick" width="600" height="400" loading="lazy">
<em>The result after using <code>git cherry-pick</code></em></p>
<p>I mark the commit as "Commit 4.2" since it has a different timestamp, parent and SHA-1 value than "Commit 4.1", though the changes it introduces are the same.</p>
<p>You made good progress - the desired commit is now on the <code>existing</code> branch! But we don't want these changes to exist on the <code>main</code> branch. <code>git cherry-pick</code> only applied the changes to the <code>existing</code> branch. How can you remove them from <code>main</code>?</p>
<p>One way would be to switch back to <code>main</code>, and then <code>reset</code> it:</p>
<pre><code class="lang-bash">git switch main
git reset --hard HEAD~1
</code></pre>
<p>And the result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/reset_cherry_pick.png" alt="The resulting state after resetting main" width="600" height="400" loading="lazy">
<em>The resulting state after resetting <code>main</code></em></p>
<p>You did it!</p>
<p>Note that <code>git cherry-pick</code> actually computes the difference between the specified commit and its parent, and then applies the difference to the active commit. This means that sometimes, Git won't be able to apply those changes due to a conflict.</p>
<p>Also, note that you can ask Git to <code>cherry-pick</code> the changes introduced in any commit, not only commits referenced by a branch.</p>
<h3 id="heading-recap-git-reset">Recap - Git Reset</h3>
<p>In this chapter, we learned how <code>git reset</code> operates, and clarified its three main modes of operation:</p>
<ul>
<li><code>git reset --soft &lt;commit&gt;</code>, which changes whatever <code>HEAD</code> is pointing to - to <code>&lt;commit&gt;</code>.</li>
<li><code>git reset --mixed &lt;commit&gt;</code>, which goes through the <code>--soft</code> stage, and also sets the state of the index to match that of <code>HEAD</code>.</li>
<li><code>git reset --hard &lt;commit&gt;</code>, which goes through the <code>--soft</code> and <code>--mixed</code> stages, and then sets the state of the working dir to match that of the index.</li>
</ul>
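<p>To compare the three modes side by side, you can run a small experiment like the one below. This is only a sketch in a scratch repository. The file name <code>file.txt</code> and the variables holding the output of <code>git status --porcelain</code> are assumptions made for the demonstration:</p>

```bash
set -eu
# Scratch repository (hypothetical setup, safe to delete afterwards).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Commit 1"
echo "two" >> file.txt
git add file.txt
git commit -q -m "Commit 2"
tip=$(git rev-parse HEAD)

# --soft: only the branch moves, so the change is still staged.
git reset -q --soft HEAD~1
soft_status=$(git status --porcelain)    # "M  file.txt"

# --mixed: the index is reset too, so the change is only in the working dir.
git reset -q --hard "$tip"
git reset -q --mixed HEAD~1
mixed_status=$(git status --porcelain)   # " M file.txt"

# --hard: the working dir is reset as well, so nothing is left to report.
git reset -q --hard "$tip"
git reset -q --hard HEAD~1
hard_status=$(git status --porcelain)    # empty

printf 'soft:  [%s]\nmixed: [%s]\nhard:  [%s]\n' \
  "$soft_status" "$mixed_status" "$hard_status"
```

<p>In the porcelain format, the first column describes the index and the second column describes the working dir, which is why the position of the <code>M</code> tells the modes apart.</p>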
<p>You then applied your knowledge about <code>git reset</code> to solve some real-life issues that arise when using Git.</p>
<p>By understanding the way Git operates, and by whiteboarding the current state versus the desired state, you can confidently tackle all kinds of scenarios.</p>
<p>In the future chapters, we will cover additional Git commands and how they can help us solve all kinds of undesired situations.</p>
<h2 id="heading-chapter-10-additional-tools-for-undoing-changes">Chapter 10 - Additional Tools for Undoing Changes</h2>
<p>In the previous chapter, you met <code>git reset</code>. Indeed, <code>git reset</code> is a super powerful tool, and I highly recommend using it until you feel completely comfortable with it.</p>
<p>Yet, <code>git reset</code> is not the only tool at our disposal. Sometimes, it is not the most convenient tool to use. At other times, it's just not enough. This short chapter touches on a few tools that are helpful for undoing changes in Git.</p>
<h3 id="heading-git-commit-amend"><code>git commit --amend</code></h3>
<p>Consider <a target="_blank" href="https://www.freecodecamp.org/news/p/f7b355ea-3f22-4613-8218-e95c67779d9f/scenario-1">Scenario #1</a> from the previous chapter again. As a reminder, you wrote "I love Git" into a file (<code>love.txt</code>), staged and committed this file:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/image-52.png" alt="Image" width="600" height="400" loading="lazy">
<em>The state after creating "Commit 2.3"</em></p>
<p>And then I realized I didn't want you to commit it at that state, but rather - write some more love words in this file before committing it.</p>
<p>To match this state, simply checkout the tag you created, which points to "Commit 2.3":</p>
<pre><code class="lang-bash">git checkout scenario-1
</code></pre>
<p>In the previous chapter, when we introduced <code>git reset</code>, you solved this issue by using <code>git reset --mixed HEAD~1</code>, effectively undoing both the committing and the staging actions you took.</p>
<p>Now I would like you to consider another approach. Keep working at the state of the last introduced commit ("Commit 2.3", referenced by the tag "scenario-1"), and make the changes you want:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> And I love this book &gt;&gt; love.txt
</code></pre>
<p>Add this change to the index:</p>
<pre><code class="lang-bash">git add love.txt
</code></pre>
<p>Now, you can use <code>git commit</code> with the <code>--amend</code> switch, which tells it to override the commit <code>HEAD</code> is pointing to. Actually, it will create another, new commit, pointing to <code>HEAD~1</code> ("Commit 1" in our example), and make <code>HEAD</code> point to this newly created commit. By providing the <code>-m</code> argument you can specify a new commit message as well:</p>
<pre><code class="lang-bash">git commit --amend -m <span class="hljs-string">"Commit 2.4"</span>
</code></pre>
<p>After running this command, <code>HEAD</code> points to <code>main</code>, which points to "Commit 2.4", which in turn points to "Commit 1". The previous "Commit 2.3" is no longer reachable from the history.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_amend-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>The state after using <code>git commit --amend</code> (Commit "2.3" is unreachable and thus not included in the drawing)</em></p>
<p>This tool is useful when you want to quickly override the last commit you created. Indeed, you could use <code>git reset</code> to accomplish the same thing, but you can view <code>git commit --amend</code> as a more convenient shortcut.</p>
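<p>You can verify for yourself that <code>--amend</code> creates a new commit rather than editing the old one. The following is a sketch under the assumption of a fresh scratch repository. The variable names (<code>before</code>, <code>after</code> and so on) are mine, not Git's:</p>

```bash
set -eu
# Scratch repository (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Commit 1"

echo "I love Git" > love.txt
git add love.txt
git commit -q -m "Commit 2.3"
before=$(git rev-parse HEAD)
parent_before=$(git rev-parse HEAD~1)

echo "And I love this book" >> love.txt
git add love.txt
git commit -q --amend -m "Commit 2.4"
after=$(git rev-parse HEAD)
parent_after=$(git rev-parse HEAD~1)

echo "old commit: $before"
echo "new commit: $after"
# The old object is unreachable from the branch, yet it still exists:
old_type=$(git cat-file -t "$before")
echo "type of old object: $old_type"
```

<p>The SHA-1 changes while the parent stays the same, and the old commit object is still stored in Git's database even though no branch leads to it.</p>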
<h3 id="heading-git-revert"><code>git revert</code></h3>
<p>Okay, so another day, another problem.</p>
<p>Add the following text to <code>love.txt</code>, stage and commit as follows:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> This is more tezt &gt;&gt; love.txt
git add love.txt
git commit -m <span class="hljs-string">"Commit 3"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_revert_1-1.png" alt="Committing &quot;More changes&quot;" width="600" height="400" loading="lazy">
<em>The state after committing "Commit 3"</em></p>
<p>And push it to the remote server:</p>
<pre><code class="lang-bash">git push origin HEAD
</code></pre>
<p>Um, oops 😓…</p>
<p>I just noticed something. I had a typo there. I wrote "This is more tezt" instead of "This is more text". Whoops. So what's the big problem now? I <code>push</code>ed, which means that someone else might have already <code>pull</code>ed those changes.</p>
<p>If I override those changes by using <code>git reset</code>, we will have different histories, and all hell might break loose. You can rewrite your own copy of the repo as much as you like until you <code>push</code> it.</p>
<p>Once you <code>push</code> the change, you need to be certain no one else has fetched those changes if you are going to rewrite history.</p>
<p>Alternatively, you can use another tool called <code>git revert</code>. This command takes the commit you're providing it with and computes the diff from its parent commit, just like <code>git cherry-pick</code>, but this time, it computes the <em>reverse</em> changes. That is, if in the specified commit you added a line, the reverse would delete the line, and vice versa. </p>
<p>In our case we are reverting "Commit 3", so the reverse would be to delete the line "This is more tezt" from <code>love.txt</code>. Since "Commit 3" is referenced by <code>main</code> and <code>HEAD</code>, we can use any of these named references in this command:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_revert_2.png" alt="Using git revert to undo the changes" width="600" height="400" loading="lazy">
<em>Using <code>git revert</code> to undo the changes</em></p>
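<p>Since the command itself only appears in the screenshot, here is a sketch of the same step that you can run in a scratch repository. The setup lines are assumptions made for the demonstration, and I added <code>--no-edit</code> so that Git accepts the default commit message instead of opening an editor:</p>

```bash
set -eu
# Scratch repository (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "I love Git" > love.txt
git add love.txt
git commit -q -m "Commit 2.4"

echo "This is more tezt" >> love.txt
git add love.txt
git commit -q -m "Commit 3"

# Revert the commit HEAD points to; "main" would work just as well here.
git revert --no-edit HEAD > /dev/null

git log --oneline    # three commits: the history only grew
cat love.txt         # the faulty line is gone
```

<p>Note that "Commit 3" is still part of the history. The revert commit sits on top of it and cancels its changes.</p>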
<p><code>git revert</code> created a new commit object, which means it's an addition to the history. By using <code>git revert</code>, you didn't rewrite history. You admitted your past mistake, and this commit is an acknowledgment that you made a mistake and now you fixed it.</p>
<p>Some would say it's the more mature way. Some would say it's not as clean a history as you would get if you used <code>git reset</code> to rewrite the previous commit. But this is a way to avoid rewriting history.</p>
<p>You can now fix the typo and commit again:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> This is more text &gt;&gt; love.txt
git add love.txt
git commit -m <span class="hljs-string">"Commit 3.1"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_revert_3.png" alt="Redoing the changes" width="600" height="400" loading="lazy">
<em>The resulting state after redoing the changes</em></p>
<p>You can use <code>git revert</code> to revert a commit other than <code>HEAD</code>. Say that you want to reverse the parent of <code>HEAD</code>. You can use:</p>
<pre><code class="lang-bash">git revert HEAD~1
</code></pre>
<p>Or you could provide the SHA-1 of the commit to revert.</p>
<p>Notice that since Git applies the reverse of the patch introduced by that commit, this operation might fail: the patch may no longer apply cleanly, and you might get a conflict.</p>
<h3 id="heading-git-rebase-as-a-tool-for-undoing-things">Git Rebase as a Tool for Undoing Things</h3>
<p>In <a class="post-section-overview" href="#heading-chapter-8-understanding-git-rebase">chapter 8</a>, you learned about Git rebase. We then considered it mainly as a tool to combine changes introduced in different branches. Yet, as long as you haven't <code>push</code>ed your changes, using <code>rebase</code> on your own branch can be a very convenient way to rearrange your commit history.</p>
<p>For that, you would usually <a class="post-section-overview" href="#heading-how-to-rebase-on-a-single-branch">rebase on a single branch</a>, and use interactive rebase. Consider again this example covered in <a class="post-section-overview" href="#heading-chapter-8-understanding-git-rebase">chapter 8</a>, where I worked from <code>feature_branch_2</code>, and specifically edited the file <code>code.py</code>. I started by changing all strings to be wrapped by double quotes rather than single quotes:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_4-1.png" alt="Changing ' into &quot; in code.py" width="600" height="400" loading="lazy">
<em>Changing <code>'</code> into <code>"</code> in <code>code.py</code></em></p>
<p>Then, I staged and committed:</p>
<pre><code class="lang-bash">git add code.py
git commit -m <span class="hljs-string">"Commit 17"</span>
</code></pre>
<p>I then decided to add a new function at the beginning of the file:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_5-1.png" alt="Adding the function another_feature" width="600" height="400" loading="lazy">
<em>Adding the function <code>another_feature</code></em></p>
<p>Again, I staged and committed:</p>
<pre><code class="lang-bash">git add code.py
git commit -m <span class="hljs-string">"Commit 18"</span>
</code></pre>
<p>And now I realized I actually forgot to change the single quotes to double quotes wrapping the <code>__main__</code> (as you might have noticed), so I did that too:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/code_py_6-1.png" alt="Changing '__main__' into &quot;__main__&quot;" width="600" height="400" loading="lazy">
<em>Changing <code>'__main__'</code> into <code>"__main__"</code></em></p>
<p>Of course, I staged and committed this change:</p>
<pre><code class="lang-bash">git add code.py
git commit -m <span class="hljs-string">"Commit 19"</span>
</code></pre>
<p>Now, consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/history_after_commit_19-1.png" alt="The commit history after introducing &quot;Commit 19&quot;" width="600" height="400" loading="lazy">
<em>The commit history after introducing "Commit 19"</em></p>
<p>As explained in <a class="post-section-overview" href="#heading-chapter-8-understanding-git-rebase">chapter 8</a>, I got to a state with two commits that are related to one another, "Commit 17" and "Commit 19" (turning <code>'</code>s into <code>"</code>s), but they are split by the unrelated "Commit 18" (where I added a new function).</p>
<p>This is a classic case where <code>git rebase</code> would come in handy, to undo the local changes before <code>push</code>ing a clean history.</p>
<p>Intuitively, I want to edit the history here:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/plan_edit_commits_17_18-1.png" alt="These are the commits I want to edit" width="600" height="400" loading="lazy">
<em>These are the commits I want to edit</em></p>
<p>I can <code>rebase</code> the history from "Commit 17" to "Commit 19", on top of "Commit 15". To do that:</p>
<pre><code class="lang-bash">git rebase --interactive --onto &lt;SHA_OF_COMMIT_15&gt; &lt;SHA_OF_COMMIT_15&gt;
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/rebase_onto_4-1.png" alt="Using rebase --onto on a single branch" width="600" height="400" loading="lazy">
<em>Using <code>rebase --onto</code> on a single branch</em></p>
<p>This results in the following screen:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/interactive_rebase_4-1.png" alt="Interactive rebase" width="600" height="400" loading="lazy">
<em>Interactive rebase</em></p>
<p>So what would I do? I want to put "Commit 19" before "Commit 18", so it comes right after "Commit 17". I can go further and <code>squash</code> them together, like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/interactive_rebase_5-1.png" alt="Interactive rebase - changing the order of commit and squashing" width="600" height="400" loading="lazy">
<em>Interactive rebase - changing the order of commit and squashing</em></p>
<p>Now when I get prompted for a commit message, I can provide the message "Commit 17+19":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/interactive_rebase_6-1.png" alt="Providing a commit message" width="600" height="400" loading="lazy">
<em>Providing a commit message</em></p>
<p>And now, see our beautiful history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/rebase_onto_5-1.png" alt="The resulting history" width="600" height="400" loading="lazy">
<em>The resulting history</em></p>
<p>The syntax used above, <code>git rebase --interactive --onto &lt;COMMIT X&gt; &lt;COMMIT X&gt;</code> would be the most commonly used syntax by those who use <code>rebase</code> regularly. The state of mind these developers usually have is to create atomic commits while working, all the time, without being scared to change them later. Then, before <code>push</code>ing their changes, they would <code>rebase</code> the entire set of changes since the last <code>push</code>, and rearrange it so the history becomes coherent.</p>
<h3 id="heading-git-reflog"><code>git reflog</code></h3>
<p>Time to consider a more startling case.</p>
<p>Go back to "Commit 2.4":</p>
<pre><code class="lang-bash">git reset --hard &lt;SHA_OF_COMMIT_2_4&gt;
</code></pre>
<p>Get some work done, write some code, and add it to <code>love.txt</code>. Stage this change, and commit it:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> lots of work &gt;&gt; love.txt
git add love.txt
git commit -m <span class="hljs-string">"Commit 3.2"</span>
</code></pre>
<p>(I'm using "Commit 3.2" to indicate that this is not the same commit as "Commit 3" we used when explaining <code>git revert</code>.)</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/reflog_commit_3-1.png" alt="Another commit" width="600" height="400" loading="lazy">
<em>Another commit - "Commit 3.2"</em></p>
<p>I did the same on my machine, and I used the <code>Up</code> arrow key on my keyboard to scroll back to previous commands, and then I hit <code>Enter</code>, and… Wow.</p>
<p>Whoops.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/reflog_commit_3_reset.png" alt="Did I just git reset --hard?" width="600" height="400" loading="lazy">
<em>Did I just <code>git reset --hard</code>?</em></p>
<p>Did I just use <code>git reset --hard</code>? 😨</p>
<p>What actually happened? As you learned in the <a class="post-section-overview" href="#heading-chapter-9-git-reset">previous chapter</a>, Git moved the pointer to <code>HEAD~1</code>, so the last commit, with all of my precious work, is not reachable from the current history. Git also removed all the changes from the staging area, and then matched the working dir to the state of the staging area.</p>
<p>That is, everything matches this state where my work is… gone.</p>
<p>Freak out time. Freaking out.</p>
<p>But, really, is there a reason to freak out? Not really… We're relaxed people. What do we do? Well, intuitively, is the commit really, really gone?</p>
<p>No. Why not? It still exists inside the internal database of Git.</p>
<p>If I only knew where that is, I would know the <code>SHA-1</code> value that identifies this commit, and we could restore it. I could even undo the undoing, and <code>reset</code> back to this commit.</p>
<p>Actually, the only thing I really need here is the <code>SHA-1</code> of the "deleted" commit.</p>
<p>Now the question is, how do I find it? Would <code>git log</code> be useful?</p>
<p>Well, not really. <code>git log</code> would go to <code>HEAD</code>, which points to <code>main</code>, which points to the parent commit of the commit we are looking for. Then, <code>git log</code> would trace back through the parent chain, which does not include the commit with my precious work.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/reflog_git_log.png" alt="git log doesn't help in this case" width="600" height="400" loading="lazy">
<em><code>git log</code> doesn't help in this case</em></p>
<p>Thankfully, the very smart people who created Git also created a backup plan for us, and that is called the <code>reflog</code>.</p>
<p>While you work with Git, whenever you change <code>HEAD</code>, which you can do by using <code>git reset</code>, but also other commands like <code>git switch</code> or <code>git checkout</code>, Git adds an entry to the <code>reflog</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reflog.png" alt="git reflog shows us where HEAD was" width="600" height="400" loading="lazy">
<em><code>git reflog</code> shows us where <code>HEAD</code> was</em></p>
<p>We found our commit! It's the one starting with <code>0fb929e</code>.</p>
<p>We can also refer to it by its "nickname" - <code>HEAD@{1}</code>. Similar to the way Git uses <code>HEAD~1</code> to get to the first parent of <code>HEAD</code>, and <code>HEAD~2</code> to refer to the grandparent of <code>HEAD</code> (the first parent of its first parent) and so on, Git uses <code>HEAD@{1}</code> to refer to the first <em>reflog</em> parent of <code>HEAD</code>, that is, where <code>HEAD</code> pointed to in the previous step.</p>
<p>We can also ask <code>git rev-parse</code> to show us its value:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/reflog_revparse.png" alt="Using git rev-parse HEAD@{1}" width="600" height="400" loading="lazy">
<em>Using <code>git rev-parse HEAD@{1}</code></em></p>
<p>Note: In case you are using Windows, you may need to wrap it with quotation marks - like so:</p>
<pre><code class="lang-bash">git rev-parse <span class="hljs-string">"HEAD@{1}"</span>
</code></pre>
<p>Another way to view the <code>reflog</code> is by using <code>git log -g</code>, which asks <code>git log</code> to actually consider the <code>reflog</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_g.png" alt="The output of git log -g" width="600" height="400" loading="lazy">
<em>The output of <code>git log -g</code></em></p>
<p>You can see in the output of <code>git log -g</code> that the <code>reflog</code>'s entry <code>HEAD@{0}</code>, just like <code>HEAD</code>, points to <code>main</code>, which points to "Commit 2.4". But the parent of that entry in the <code>reflog</code> points to "Commit 3.2".</p>
<p>So to get back to "Commit 3.2", you can just use <code>git reset --hard HEAD@{1}</code> (or the <code>SHA-1</code> value of "Commit 3.2"):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reflog_reset.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git reset --hard HEAD@{1}</code></em></p>
<p>And now, if you <code>git log</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_2.png" alt="Our history is back!!!" width="600" height="400" loading="lazy">
<em>Our history is back!!!</em></p>
<p>We saved the day!</p>
<p>What would happen if I used this command again? And ran <code>git reset --hard HEAD@{1}</code>?</p>
<p>Git would set <code>HEAD</code> to where <code>HEAD</code> was pointing before the last <code>reset</code>, meaning to "Commit 2.4". We can keep going all day:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_reset_again.png" alt="git reset --hard again" width="600" height="400" loading="lazy">
<em><code>git reset --hard</code> again</em></p>
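<p>The screenshots above come from my machine, so here is a sketch of the whole rescue that you can run yourself. It assumes a scratch repository, and the variable names <code>lost</code> and <code>found</code> are only there to make the comparison explicit:</p>

```bash
set -eu
# Scratch repository (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "I love Git" > love.txt
git add love.txt
git commit -q -m "Commit 2.4"

echo "lots of work" >> love.txt
git add love.txt
git commit -q -m "Commit 3.2"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1          # the accidental reset

git reflog                           # every move of HEAD, newest first
found=$(git rev-parse "HEAD@{1}")    # where HEAD was before the reset
git reset -q --hard "$found"         # undo the undoing

git log --oneline
```

<p>Quoting <code>"HEAD@{1}"</code> is harmless on every shell, so you may as well do it all the time.</p>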
<h3 id="heading-recap-additional-tools-for-undoing-changes">Recap - Additional Tools for Undoing Changes</h3>
<p>In the previous chapter, you learned how to use <code>git reset</code> to undo changes.</p>
<p>In this chapter, you extended your toolbox for undoing changes in Git with a few new commands:</p>
<ul>
<li><code>git commit --amend</code> - which "overrides" the last commit with the state of the index. Mostly useful when you just committed something and want to modify that last commit.</li>
<li><code>git revert</code> - which reverts a past commit by adding a new commit to the history with the reversed changes. Useful especially when the "faulty" commit has already been pushed to the remote.</li>
<li><code>git rebase</code> - which you already know from <a class="post-section-overview" href="#heading-chapter-8-understanding-git-rebase">chapter 8</a>, and is useful for rewriting the history of multiple commits, especially before pushing them.</li>
<li><code>git reflog</code> (and <code>git log -g</code>) - which tracks all changes to <code>HEAD</code>, so you might find the SHA-1 value of a commit you need to get back to.</li>
</ul>
<p>The most important tool, even more important than the tools I just listed, is to whiteboard the current situation vs the desired one. Trust me on this, it will make every situation seem less daunting and the solution clearer.</p>
<p>There are additional tools that allow you to reverse changes in Git (I will provide links in the <a class="post-section-overview" href="#heading-additional-references-by-part">appendix</a>), but the collection of tools covered here should prepare you to tackle any challenge with confidence.</p>
<h2 id="heading-chapter-11-exercises">Chapter 11 - Exercises</h2>
<p>This chapter includes a few exercises to deepen your understanding of the tools you learned in Part 3. The full version of this book also includes detailed solutions for each.</p>
<p>The exercises are found on this repository:</p>
<p><a target="_blank" href="https://github.com/Omerr/undo-exercises.git">https://github.com/Omerr/undo-exercises.git</a></p>
<p>Each exercise exists on a branch with the name <code>exercise_XX</code>, so Exercise 1 is found on branch <code>exercise_01</code>, Exercise 2 is found on branch <code>exercise_02</code> and so on.</p>
<p>Note: As explained in previous chapters, if you work with commits that can be found on a remote server (which you are in this case, as you are using my repository "undo-exercises"), you should probably use <code>git revert</code> instead of <code>git reset</code>. Similar to <code>git rebase</code>, the command <code>git reset</code> also rewrites history - and thus you should refrain from using it on commits that others may have relied on. </p>
<p>For the purposes of these exercises, you can assume no one else has cloned or pulled code from the remote repository. Just remember - in real life, you should probably use <code>git revert</code> instead of commands that rewrite history in such cases.</p>
<h3 id="heading-exercise-1">Exercise 1</h3>
<p>On branch <code>exercise_01</code>, consider the file <code>hello.txt</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/ex_01_1.png" alt="The file hello.txt" width="600" height="400" loading="lazy">
<em>The file <code>hello.txt</code></em></p>
<p>This file includes a typo (in the last character). Find the commit that introduced this typo.</p>
<h4 id="heading-exercise-1a">Exercise (1a)</h4>
<p>Remove this commit from the reachable history using <code>git reset</code> (with the right arguments), fix the typo, and commit again. Consider your history.</p>
<p>Revert to the previous state.</p>
<h4 id="heading-exercise-1b">Exercise (1b)</h4>
<p>Remove the faulty commit using <code>git commit --amend</code>, and get to the same state of the history as at the end of exercise (1a).</p>
<p>Revert to the previous state.</p>
<h4 id="heading-exercise-1c">Exercise (1c)</h4>
<p><code>revert</code> the faulty commit using <code>git revert</code> and fix the typo. Consider your history.</p>
<p>Revert to the previous state.</p>
<h4 id="heading-exercise-1d">Exercise (1d)</h4>
<p>Using <code>git rebase</code>, get to the same state as at the end of exercise (1a).</p>
<h3 id="heading-exercise-2">Exercise 2</h3>
<p>Switch to <code>exercise_02</code>, and consider the contents of <code>exercise_02.txt</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/ex_02_1.png" alt="The contents of exercise_02.txt" width="600" height="400" loading="lazy">
<em>The contents of <code>exercise_02.txt</code></em></p>
<p>A simple file, with one character on each line.</p>
<p>Consider the history (using <code>git lol</code>):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/ex_02_2.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git lol</code></em></p>
<p>Oh my. Each character was introduced in a separate commit. That doesn't make any sense!</p>
<p>Use the tools you've acquired to create a history where the creation of <code>exercise_02.txt</code> is all done in a single commit.</p>
<h3 id="heading-exercise-3">Exercise 3</h3>
<p>Consider the history on branch <code>exercise_03</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/ex_03_1.png" alt="The history on exercise_03" width="600" height="400" loading="lazy">
<em>The history on <code>exercise_03</code></em></p>
<p>This seems like a mess. You will notice that:</p>
<ul>
<li>The order is skewed. We need "Commit 1" to be the earliest commit on this branch, and have "Initial Commit" as its parent, followed by "Commit 2" and so on.</li>
<li>We shouldn't have "Commit 2a" and "Commit 2b", or "Commit 4a" and "Commit 4b" - these two pairs need to be combined into a single commit each - "Commit 2" and "Commit 4".</li>
<li>There is a typo in the commit message of "Commit 1", it should not have 3 <code>m</code>s.</li>
</ul>
<p>Fix these issues, but rely on the changes of each original commit. The resulting history should look like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/ex_03_2.png" alt="The desired history" width="600" height="400" loading="lazy">
<em>The desired history</em></p>
<h3 id="heading-exercise-4">Exercise 4</h3>
<p>This exercise actually consists of three branches: <code>exercise_04</code>, <code>exercise_04_a</code>, and <code>exercise_04_b</code>.</p>
<p>To see the history of these branches without others, use the following syntax:</p>
<pre><code class="lang-bash">git lol --branches=<span class="hljs-string">"exercise_04*"</span>
</code></pre>
<p>The result is:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/ex_04_1.png" alt="The output of git lol --branches=&quot;exercise_04*&quot;" width="600" height="400" loading="lazy">
<em>The output of <code>git lol --branches="exercise_04*"</code></em></p>
<p>Your goal is to make <code>exercise_04_b</code> independent of <code>exercise_04_a</code>. That is, get to this history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/ex_04_2.png" alt="The desired history" width="600" height="400" loading="lazy">
<em>The desired history</em></p>
<p><strong>Good luck!</strong></p>
<h1 id="heading-part-4-amazing-and-useful-git-tools">Part 4 - Amazing and Useful Git Tools</h1>
<p>Git has lots of commands, and these commands have so many options and arguments. I could try to cover them all (though they do change over time), but I don't see a point in that. You should probably know a subset of these commands really well, those that you use regularly. Then, you can always search for a specific command to perform a task at hand.</p>
<p>This part relies on the basics you acquired in the previous parts, and covers specific commands and options that you may find useful. Given your understanding of how Git works, having these small tools can make you a real pro in Gitting things done.</p>
<h2 id="heading-chapter-12-git-log">Chapter 12 - Git Log</h2>
<p>You used <code>git log</code> many times across different chapters, and you had probably used it many times before reading this book.</p>
<p>Most developers use <code>git log</code>, but few use it effectively. In this chapter you will learn useful tweaks for making the most of <code>git log</code>. Once you feel comfortable with the different switches of this command, it will be a game changer in your day-to-day work with Git.</p>
<p>Thinking about it, <code>git log</code> encompasses the essence of every version control system - that is, to record changes in versions. You record versions so that you can consider the history of your project - perhaps to revert or apply specific changes, or to switch to a different point in time and test things there. Perhaps you would like to know who contributed a certain piece of code or when they did that.</p>
<p>While <code>git</code> does preserve this information by using commit objects, which also point to their parent commits, and references to commit objects (such as branches or <code>HEAD</code>), this storing of versions is not enough. Without being able to find the relevant commit you would like to consider, or gather the relevant information about it, having this data stored is pretty useless.</p>
<p>You can think of your commit objects as different books that pile up in a huge stack, or in a library, filling long shelves. The information you might need is in these books, but if you don't have an index - a way to know in which book the information you seek lies, or where this book is located within the library - you wouldn't be able to make much use of it. <code>git log</code> is this indexing of your library - it's a way to find the relevant commits and the information about them.</p>
<p>The useful arguments for <code>git log</code> that you will learn in this chapter either format how commits are displayed in the log, or filter specific commits.</p>
<p><code>git lol</code>, an alias which I have used throughout the book, uses some of these switches, as I will demonstrate. Feel free to tweak this alias (or create another from scratch) after reading this chapter.</p>
<p>As in other chapters, the goal is not to provide a complete reference, therefore I will not provide <em>all</em> different switches of <code>git log</code>. I will focus on the switches I believe you will find useful.</p>
<h3 id="heading-filtering-commits">Filtering Commits</h3>
<p>Consider the default output of <code>git log</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_1.png" alt="The output of git log without additional switches" width="600" height="400" loading="lazy">
<em>The output of <code>git log</code> without additional switches</em></p>
<p>The log starts from <code>HEAD</code>, and follows the parent chain.</p>
<h4 id="heading-commits-not-reachable-from">Commits (Not) Reachable From...</h4>
<p>When you write <code>git log &lt;revision&gt;</code>, <code>git log</code> will include all entries reachable from <code>&lt;revision&gt;</code>. By "reachable", I refer to reachable by following the parent chain. So running <code>git log</code> without any arguments is equivalent to running <code>git log HEAD</code>.</p>
<p>You can specify multiple revisions for <code>git log</code> - if you write <code>git log branch_1 branch_2</code>, you ask <code>git log</code> to include every commit that is reachable from <code>branch_1</code> or <code>branch_2</code> (or both).</p>
<p><code>git log</code> will <strong>exclude</strong> any commits that are reachable from revisions preceded by a <code>^</code>.</p>
<p>For example, the following command:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> branch_1 ^branch_2
</code></pre>
<p>asks <code>git log</code> to include every commit that is reachable from <code>branch_1</code>, but not those reachable from <code>branch_2</code>.</p>
<p>Consider the history when I use <code>git log feature_branch_1</code> on this repo:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_2-1.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git log feature_branch_1</code></em></p>
<p>The history includes all commits reachable from <code>feature_branch_1</code>. Since this branch "branched off" <code>main</code> (that is, "Commit 12", which <code>main</code> points to, is reachable by following the parent chain), the log also includes the commits reachable from <code>main</code>.</p>
<p>What would happen if I ran this command?</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> feature_branch_1 ^main
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_3.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git log feature_branch_1 ^main</code></em></p>
<p>Indeed, <code>git log</code> outputs only "Commit 13" and "Commit 16", which are reachable from <code>feature_branch_1</code> but not from <code>main</code>.</p>
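<p>As a side note, Git also accepts a two-dot shorthand for the same filter: <code>main..feature_branch_1</code> means "reachable from <code>feature_branch_1</code>, but not from <code>main</code>". The following is a minimal sketch in a throwaway repository. The commit messages, the <code>demo</code> identity, and the <code>demo_commit</code> helper are made up for illustration and are not part of the repository shown in the screenshots:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
git init -q .
# Name the first branch main, regardless of the local default
git symbolic-ref HEAD refs/heads/main
# Hypothetical helper: create an empty commit with a throwaway identity
demo_commit() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
demo_commit "Commit 1"
demo_commit "Commit 2"
git checkout -q -b feature_branch_1
demo_commit "Commit 3"
demo_commit "Commit 4"

# Both spellings select the commits that feature_branch_1 can reach and main cannot
caret_form=$(git log --format=%s feature_branch_1 ^main)
dots_form=$(git log --format=%s main..feature_branch_1)
echo "$caret_form"
echo "$dots_form"
</code></pre>
<p>Both commands should list only "Commit 4" and "Commit 3", the two commits created after branching off <code>main</code>.</p>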
<h4 id="heading-git-log-all"><code>git log --all</code></h4>
<p>Use <code>--all</code> to include commits that are reachable from any named reference (that is, any ref under <code>refs/</code>) or from <code>HEAD</code>.</p>
<h4 id="heading-by-author">By Author</h4>
<p>If you know you are looking for a commit that a specific person has authored, you can filter these commits by using that user's name or email, like so:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --author=<span class="hljs-string">"Name"</span>
</code></pre>
<p>You can use regular expressions to look for author names that match a specific pattern, for example:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --author=<span class="hljs-string">"John\|Jane"</span>
</code></pre>
<p>will filter commits authored by either John or Jane.</p>
<h4 id="heading-by-date">By Date</h4>
<p>When you know that the change you are looking for has been committed within a specific timeframe, you can use <code>--before</code> or <code>--after</code> to filter commits from that timeframe.</p>
<p>For example, to get all commits introduced after April 12th, 2023 (inclusive), use:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --after=<span class="hljs-string">"2023-04-12"</span>
</code></pre>
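<p>The two switches can also be combined to bracket a window. Here is a minimal sketch, assuming a throwaway repository whose dates are pinned by hand through the <code>GIT_AUTHOR_DATE</code> and <code>GIT_COMMITTER_DATE</code> environment variables (the dates, messages, and <code>demo</code> identity are made up for illustration):</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
git init -q .
for day in 05 12 20; do
  # Pin both dates so the example does not depend on today's date
  GIT_AUTHOR_DATE="2023-04-${day}T12:00:00" GIT_COMMITTER_DATE="2023-04-${day}T12:00:00" \
    git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "April $day"
done

# Keep only commits made after April 10th and before April 15th
in_window=$(git log --format=%s --after="2023-04-10" --before="2023-04-15")
echo "$in_window"
</code></pre>
<p>Of the three commits, only the one dated April 12th falls inside the window.</p>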
<h4 id="heading-by-paths">By Paths</h4>
<p>You can ask <code>git log</code> to only show commits where <em>changes</em> to files in specific paths have been introduced. Notice that this does not mean any commit that points to a tree that includes the files in question, but rather that if we compute the difference between the commit in question and its parent, we would see that at least one of the paths has been modified.</p>
<p>For example, you can use:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --all -- 1.py
</code></pre>
<p>to find all commits that are reachable from any named pointer, or <code>HEAD</code>, and introduce a change to <code>1.py</code>. You can specify multiple paths:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --all -- 1.py 2.py
</code></pre>
<p>The previous command will make <code>git log</code> include reachable commits that introduced a change to <code>1.py</code> or <code>2.py</code> (or both).</p>
<p>You can also use a glob pattern, for example:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> -- *.py
</code></pre>
<p>will include commits reachable from <code>HEAD</code> that include a change to any file in the root directory whose name ends with a <code>.py</code>. To look for any file whose name ends with <code>.py</code>, you can use:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> -- **/*.py
</code></pre>
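<p>One thing to keep in mind: when a pattern is left unquoted, your shell may expand it against the current directory before Git ever sees it. Quoting the pattern hands it to Git as a pathspec. The sketch below assumes a throwaway repository with two made-up files, <code>a.py</code> in the root and <code>sub/b.py</code>. As far as I understand Git's pathspec rules, a plain quoted <code>*</code> also matches across directory separators, while the <code>:(glob)</code> prefix restricts it to a single directory level:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
git init -q .
# Hypothetical helper: commit with a throwaway identity
demo_commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }
touch a.py
git add a.py
demo_commit "Add a.py"
mkdir sub
touch sub/b.py
git add sub/b.py
demo_commit "Add sub/b.py"

# Quoted pattern: Git matches it against full paths, in any directory
any_depth=$(git log --format=%s -- '*.py')
# :(glob) magic: * stops at directory separators, so only the root matches
root_only=$(git log --format=%s -- ':(glob)*.py')
echo "$any_depth"
echo "$root_only"
</code></pre>
<p>The first query should list both commits, and the second only "Add a.py".</p>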
<h4 id="heading-by-commit-message">By Commit Message</h4>
<p>If you know the commit message (or parts of it) of the commit you are searching for, you can use the <code>--grep</code> switch of <code>git log</code>, for example:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --grep=<span class="hljs-string">"Commit 12"</span>
</code></pre>
<p>returns the commits whose message contains "Commit 12".</p>
<h4 id="heading-by-diff-content">By Diff Content</h4>
<p>This one is super useful, and it saved me countless times. By using <code>git log -S</code>, you can search for commits that introduce or remove a particular line of source code. </p>
<p>This comes in handy, for example, when you know you have created something in the repo, but you don't know where it is now. You can't find it anywhere on your filesystem (it's not in <code>HEAD</code>), and you know it must be there - lurking somewhere in this library (bunch of commits) that you have.</p>
<p>Say I remember I wrote a line with the text <code>Git is awesome</code>, but I can't find it now. I could run:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --all -S<span class="hljs-string">"Git is awesome"</span>
</code></pre>
<p>Notice I used <code>--all</code> to avoid restricting myself to commits reachable from <code>HEAD</code>.</p>
<p>You can also search for a regular expression, using <code>-G</code>:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --all -G<span class="hljs-string">"Git .* awesome"</span>
</code></pre>
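<p>The two switches differ in a subtle way: <code>-S</code> reports commits that change how many times the string occurs (so the string was added or removed), while <code>-G</code> reports commits whose diff contains an added or removed line matching the pattern. Here is a minimal sketch, assuming a throwaway repository with a made-up file <code>notes.txt</code> and a hypothetical <code>demo_commit</code> helper:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
git init -q .
# Hypothetical helper: stage notes.txt and commit with a throwaway identity
demo_commit() { git add notes.txt; git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }
echo "Git is awesome" &gt; notes.txt
demo_commit "Add the line"
echo "Git is awesome!" &gt; notes.txt
demo_commit "Edit the line"
echo "Nothing here" &gt; notes.txt
demo_commit "Remove the line"

# -S: only commits where the number of occurrences of the string changes
pickaxe=$(git log --format=%s -S"Git is awesome")
# -G: every commit whose added or removed lines match the regular expression
regex=$(git log --format=%s -G"Git .* awesome")
echo "$pickaxe"
echo "$regex"
</code></pre>
<p>"Edit the line" keeps exactly one occurrence of the string, so <code>-S</code> skips it, while <code>-G</code> lists all three commits.</p>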
<h3 id="heading-formatting-log">Formatting Log</h3>
<p>Consider the default output of <code>git log</code> again:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_1-1.png" alt="The output of git log without additional switches" width="600" height="400" loading="lazy">
<em>The output of <code>git log</code> without additional switches</em></p>
<p>The log starts from <code>HEAD</code>, and follows the parent chain.</p>
<p>Each log entry begins with a line starting with <code>commit</code> and then the SHA-1 of the commit, perhaps followed by additional pointers that point to this commit.<br>It is then followed by the author, date, and commit message.</p>
<h4 id="heading-oneline"><code>--oneline</code></h4>
<p>The main difficulty with the default output of <code>git log</code> is that it is hard to understand a history with more than a few commits, as you simply don't see them all. </p>
<p>In the output of <code>git log</code> shown before, only four commit objects appeared on my screen. Using <code>git log --oneline</code> provides a more concise view, showing the SHA-1 of the commit, next to its message, and named references if relevant:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_5.png" alt="The output of git log --oneline" width="600" height="400" loading="lazy">
<em>The output of <code>git log --oneline</code></em></p>
<p>If you wish to omit the named references, you can add the <code>--no-decorate</code> switch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_6.png" alt="The output of git log --oneline --no-decorate" width="600" height="400" loading="lazy">
<em>The output of <code>git log --oneline --no-decorate</code></em></p>
<p>To explicitly ask for <code>git log</code> to show decorations, you can use <code>git log --decorate</code>.</p>
<h4 id="heading-graph"><code>--graph</code></h4>
<p><code>git log --oneline</code> shows a compact representation. That is great when we have a linear history, perhaps on a single branch. But what happens when we have multiple branches, that may diverge from one another?</p>
<p>Consider the output of the following command on my repository:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --oneline feature_branch_1 feature_branch_2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_7.png" alt="The output of git log --oneline feature_branch_1 feature_branch_2" width="600" height="400" loading="lazy">
<em>The output of <code>git log --oneline feature_branch_1 feature_branch_2</code></em></p>
<p><code>git log</code> outputs any commit reachable from <code>feature_branch_1</code>, <code>feature_branch_2</code>, or both. But what does the history look like? Did <code>feature_branch_2</code> diverge from <code>feature_branch_1</code>? Or did it diverge from <code>main</code>? It is impossible to tell from this view.</p>
<p>This is where <code>--graph</code> comes in handy, drawing an ASCII graph representing the branch structure of the commit history. If we add this option to the previous command:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_8.png" alt="The output of git log --oneline --graph feature_branch_1 feature_branch_2" width="600" height="400" loading="lazy">
<em>The output of <code>git log --oneline --graph feature_branch_1 feature_branch_2</code></em></p>
<p>You can actually <em>see</em> that <code>feature_branch_1</code> branched from <code>main</code> (as "Commit 12", <code>main</code>, is the parent of "Commit 13"), and also that <code>feature_branch_2</code> branched from <code>main</code> (as the parent of "Commit 14" is also "Commit 12").</p>
<p>The <code>*</code> symbol tells us which branch a certain commit is "on", so you can know for sure that "Commit 13" is on <code>feature_branch_1</code>, and not <code>feature_branch_2</code>.</p>
<h4 id="heading-prettyformat"><code>--pretty=format</code></h4>
<p>The above result is already very useful! Yet, it lacks a few things. We don't know the author or the time of the commit. These two details were included in the default output of <code>git log</code>, which was very long. Perhaps we can add them in a more compact way?</p>
<p>By using <code>--pretty=format:</code>, you can display the information of each commit in various ways using <code>printf</code>-style placeholders.</p>
<p>In the following command, the <code>%s</code>, <code>%an</code> and <code>%cd</code> placeholders are replaced by the commit's subject (the first line of its message), the author's name, and the committer date, respectively.</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --oneline --graph feature_branch_1 feature_branch_2 --pretty=format:<span class="hljs-string">"%s (%an) [%cd]"</span>
</code></pre>
<p>The output looks like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_9.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git log --oneline --graph feature_branch_1 feature_branch_2 --pretty=format:"%s (%an) [%cd]"</code></em></p>
<p>That's useful, but not really great to look at. We can then use other formatting tricks, specifically <code>%C(color)</code>, which switches the color to <code>color</code> until reaching a <code>%Creset</code> that resets the color. To make the author's name yellow, you can use:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --oneline --graph feature_branch_1 feature_branch_2 --pretty=format:<span class="hljs-string">"%s %C(yellow)(%an)%Creset [%cd]"</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_10.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git log --oneline --graph feature_branch_1 feature_branch_2 --pretty=format:"%s %C(yellow)(%an)%Creset [%cd]"</code></em></p>
<p>For some colors, like <code>red</code> or <code>green</code>, it is unnecessary to include the parentheses, so <code>%Cred</code> is enough.</p>
<h4 id="heading-how-is-git-lol-structured">How is <code>git lol</code> Structured?</h4>
<p>When I run <code>git lol</code>, it actually executes the following:</p>
<p><code>git log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)&lt;%an&gt;%Creset' --abbrev-commit</code></p>
<p>Let's take this bit by bit.</p>
<p>You already know <code>--graph</code>, which makes the output include an ASCII graph.</p>
<p><code>--abbrev-commit</code> uses a short prefix from the full SHA-1 of the commit (in my configuration, the first seven characters).</p>
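<p>The rest consists of placeholders for various details about the commit, plus coloring: <code>%h</code> is the abbreviated commit hash, <code>%d</code> the ref names (decorations) pointing to the commit, <code>%s</code> the subject, <code>%cr</code> the committer date in relative form, and <code>%an</code> the author's name. Here is the result when adding <code>--all</code>:</p>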
<pre><code class="lang-bash">git lol --all
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_log_11.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git lol --all</code></em></p>
<p>I like this output because I find it clear. It gives me the information I need, with enough coloring so that every detail stands out without hurting my eyes. But if you prefer other information, other colors, a different order, or anything else - go ahead and tweak it to your liking.</p>
<h3 id="heading-setting-an-alias">Setting an alias</h3>
<p>As you know, I set <code>git lol</code> as an alias - that is, when I run <code>git lol</code>, it executes the long command I provided previously.</p>
<p>How can you create an alias in Git?</p>
<p>The easiest way is to use <code>git config</code> with a key of the form <code>alias.&lt;name&gt;</code>, like so:</p>
<pre><code class="lang-bash">git config --global alias.co checkout
</code></pre>
<p>This command sets <code>co</code> to be an alias for the command <code>checkout</code>, so you can use <code>git co main</code> instead of <code>git checkout main</code>.</p>
<p>To define <code>git lol</code> as an alias, you can use:</p>
<pre><code class="lang-bash">git config --global alias.lol <span class="hljs-string">"log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)&lt;%an&gt;%Creset' --abbrev-commit"</span>
</code></pre>
<h2 id="heading-chapter-13-git-bisect">Chapter 13 - Git Bisect</h2>
<p>Oops.</p>
<p>I have a bug.</p>
<p>Yes, that happens sometimes, to all of us. Something in my system is broken, and I can't tell why. I have been debugging for a while, but the solution is not clear.</p>
<p>I can tell that two weeks ago, this didn't happen. Luckily for me, I have been using Git (obviously, I know...), so I can go back in time and test a past version of my code. Indeed, in this version - everything worked fine.</p>
<p>But... I have made many changes in these two weeks. Alas, not just me - my entire team has contributed commits that add, delete, or modify parts of the code base. Where do I begin? Should I go over every change introduced in those two weeks?</p>
<p>Enter - <code>git bisect</code>.</p>
<p>The goal of <code>git bisect</code> is to help you find the commit where a bug was introduced, in an efficient manner.</p>
<h3 id="heading-how-does-git-bisect-work">How Does <code>git bisect</code> Work?</h3>
<p><code>git bisect</code> first asks you to mark one commit as "bad" (where the bug occurs), and another commit as "good" (one without the bug). Then, it checks out a commit halfway between these two commits, and then asks you to identify the commit as either "good" or "bad". This process is repeated until you find the first "bad" commit.</p>
<p>The key here is using binary search - by looking at the halfway point and deciding if it is the new top or bottom of the list of commits, you can find the right commit efficiently. Even if you have 10,000 commits to hunt through, it takes only about 14 steps (log<sub>2</sub> of 10,000 is roughly 13.3) to find the first commit that introduced the bug.</p>
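<p>To get a feel for that number, the short loop below halves a range of candidates until a single commit remains. It is a simplified model of binary search rather than Git's actual selection logic, and it rounds up on every step to give a worst-case count, so it may come out one step higher than an estimate that rounds down:</p>
<pre><code class="lang-bash">commits=10000
steps=0
while [ "$commits" -gt 1 ]; do
  # Keep the larger half each time, as in the unluckiest case
  commits=$(( (commits + 1) / 2 ))
  steps=$(( steps + 1 ))
done
echo "$steps"   # prints 14
</code></pre>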
<h3 id="heading-git-bisect-example"><code>git bisect</code> Example</h3>
<p>For this example, I will use the repository on <a target="_blank" href="https://github.com/Omerr/bisect-exercise.git">https://github.com/Omerr/bisect-exercise.git</a>. To create it, I adapted the open source repository <a target="_blank" href="https://github.com/bast/git-bisect-exercise">https://github.com/bast/git-bisect-exercise</a> (according to its license).</p>
<p>In this repository, we have a single Python file that is used to compute the value of pi (which is approximately <code>3.14</code>). If you run <code>python3 get_pi.py</code> on <code>main</code>, however, you will get a wrong result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/bisect_1.png" alt="A wrong result, we have a bug" width="600" height="400" loading="lazy">
<em>A wrong result, we have a bug</em></p>
<p>This branch consists of more than 500 commits.</p>
<p>Find the first commit on this branch by using:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span> --oneline | tail -n 1
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/bisect_2.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git log --oneline | tail -n 1</code></em></p>
<p>If you check out this commit and run <code>python3 get_pi.py</code> again, the result is correct:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/commit_1_pi.png" alt="From the first commit, the result is valid" width="600" height="400" loading="lazy">
<em>From the first commit, the result is valid</em></p>
<p>So somewhere between <code>HEAD</code> and commit <code>f0ea950</code>, a change was introduced that resulted in this wrong output.</p>
<p>To find it using <code>git bisect</code>, <code>start</code> the bisect process, and mark this commit as "good":</p>
<pre><code class="lang-bash">git bisect start
git bisect good
</code></pre>
<p>By default, <code>git bisect good</code> would take <code>HEAD</code> as the "good" commit. To mark <code>main</code> as "bad", you can use <code>git bisect bad main</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/bisect_3.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git bisect bad main</code></em></p>
<p><code>git bisect</code> checked out commit number <code>251</code>, the "middle point" of <code>main</code> branch. Does the state in this commit produce the right or wrong output?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/bisect_4.png" alt="Trying again..." width="600" height="400" loading="lazy">
<em>Trying again...</em></p>
<p>We still get the wrong output, which means we can discard commits <code>252</code> through <code>500</code> (and additional commits after that), and narrow our search to commits <code>2</code> through <code>251</code>. Mark this as <code>bad</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/bisect_5.png" alt="Mark as bad" width="600" height="400" loading="lazy">
<em>Mark as <code>bad</code></em></p>
<p><code>git bisect</code> checked out the "middle" commit (number <code>126</code>), and running the code again results in the right answer! This means that this commit is "good", and that the first "bad" commit is somewhere between <code>127</code> and <code>251</code>. Mark it as "good":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/bisect_6.png" alt="Mark as good" width="600" height="400" loading="lazy">
<em>Mark as <code>good</code></em></p>
<p>Nice, <code>git bisect</code> takes us to commit <code>188</code>, as this is the "middle" commit between <code>127</code> and <code>251</code>. By running the code again, you can see that the result is wrong, so this is actually a "bad" commit, which means the first faulty commit is somewhere between <code>127</code> and <code>188</code>. As you can see, <code>git bisect</code> narrows down the search space by half on each iteration.</p>
<p>Come on, now it's your turn - keep going from here! Test the result of <code>python3 get_pi.py</code> and use <code>git bisect good</code> or <code>git bisect bad</code> to mark the commit accordingly. What is the faulty commit?</p>
<p>When you are done, use <code>git bisect reset</code> to stop the bisect process.</p>
<h3 id="heading-automatic-git-bisect">Automatic <code>git bisect</code></h3>
<p>In the previous example, you could simply run <code>python3 get_pi.py</code> and check the result. Other times, the process of validating whether a certain commit is "good" or "bad" can be tricky, error prone, or just time consuming. </p>
<p>It is possible to automate the process of <code>git bisect</code> by creating code that is executed on each iteration, returning <code>0</code> when the current commit is "good", and a value between <code>1</code> and <code>127</code> (inclusive), except <code>125</code>, if it should be considered "bad".</p>
<p>The syntax is:</p>
<pre><code class="lang-bash">git bisect run my_script arguments
</code></pre>
<p>As this book is not about programming and doesn't assume you know a specific programming language, I will not show an example of implementing <code>my_script</code>. The <code>README.md</code> file in the repository used in this chapter (<a target="_blank" href="https://github.com/Omerr/bisect-exercise.git">https://github.com/Omerr/bisect-exercise.git</a>) includes an example for a script that you can run with <code>git bisect run</code> to automatically find the faulty commit for the previous example.</p>
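<p>That said, when the check is a single command, you may not need a dedicated script at all. Here is a minimal sketch that does not use the exercise repository: it builds a throwaway repository of eight made-up commits, in which a hypothetical file <code>status.txt</code> starts with "ok" until "Commit 5" breaks it, and uses <code>grep</code> as the test command (<code>grep -q</code> exits with <code>0</code> when it finds a match and with <code>1</code> when it does not, which fits the convention described above):</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
git init -q .
for i in 1 2 3 4 5 6 7 8; do
  # Commits 1-4 are "good"; commits 5-8 contain the "bug"
  if [ "$i" -lt 5 ]; then echo "ok $i" &gt; status.txt; else echo "broken $i" &gt; status.txt; fi
  git add status.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "Commit $i"
done

# Mark Commit 8 (HEAD) as bad and Commit 1 (HEAD~7) as good
git bisect start HEAD HEAD~7
# Let Git test each candidate on its own
git bisect run grep -q '^ok' status.txt
# While bisecting, refs/bisect/bad points to the earliest known bad commit
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset
echo "First bad commit: $first_bad"
</code></pre>
<p>In this made-up history, the process should end by reporting "Commit 5" as the first bad commit.</p>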
<h2 id="heading-chapter-14-other-useful-commands">Chapter 14 - Other Useful Commands</h2>
<p>This chapter highlights a few commands that have already been mentioned in previous chapters. I am putting them here together so that you can come back to them as a reference when needed.</p>
<h3 id="heading-git-cherry-pick"><code>git cherry-pick</code></h3>
<p>Introduced in <a class="post-section-overview" href="#heading-chapter-8-understanding-git-rebase">chapter 8</a>, this command takes a given commit, computes the <strong>patch</strong> this commit introduces by computing the difference between the parent commit and the commit itself, and then "replays" this difference. It is like "copy-pasting" a commit, that is, the diff this commit introduced.</p>
<p>In <a class="post-section-overview" href="#heading-chapter-8-understanding-git-rebase">chapter 8</a> we considered the difference introduced by "Commit 5" (using <code>git diff main &lt;SHA_OF_COMMIT_5&gt;</code>):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main_commit_5-1.png" alt="Running git diff to observe the patch introduced by &quot;Commit 5&quot;" width="600" height="400" loading="lazy">
<em>Running <code>git diff</code> to observe the patch introduced by "Commit 5"</em></p>
<p>You can see that in this commit, John started working on a song called "Lucy in the Sky with Diamonds":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_diff_main_commit_5_output-1.png" alt="The output of git diff - the patch introduced by &quot;Commit 5&quot;" width="600" height="400" loading="lazy">
<em>The output of <code>git diff</code> - the patch introduced by "Commit 5"</em></p>
<p>As a reminder, you can also use the command <code>git show</code> to get the same output:</p>
<pre><code class="lang-bash">git show &lt;SHA_OF_COMMIT_5&gt;
</code></pre>
<p>Now, if you <code>cherry-pick</code> this commit, you will introduce <em>this change</em> specifically, on the active branch. You can switch to <code>main</code> branch:</p>
<pre><code class="lang-bash">git checkout main  # or: git switch main
</code></pre>
<p>And create another branch:</p>
<pre><code class="lang-bash">git checkout -b my_branch  # or: git switch -c my_branch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/create_my_branch-1.png" alt="Creating my_branch that branches from main" width="600" height="400" loading="lazy">
<em>Creating <code>my_branch</code> that branches from <code>main</code></em></p>
<p>Next, <code>cherry-pick</code> "Commit 5":</p>
<pre><code class="lang-bash">git cherry-pick &lt;SHA_OF_COMMIT_5&gt;
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/cherry_pick_commit_5-1.png" alt="Using cherry-pick to apply the changes introduced in &quot;Commit 5&quot; onto my_branch" width="600" height="400" loading="lazy">
<em>Using <code>cherry-pick</code> to apply the changes introduced in "Commit 5" onto <code>my_branch</code></em></p>
<p>Consider the log (output of <code>git lol</code>):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_lol_commit_5-1.png" alt="The output of git lol" width="600" height="400" loading="lazy">
<em>The output of <code>git lol</code></em></p>
<p>It seems like you <em>copy-pasted</em> "Commit 5". Remember that even though it has the same commit message, and introduces the same changes, and even points to the same tree object as the original "Commit 5" in this case - it is still a different commit object, as it was created with a different timestamp.</p>
<p>Looking at the changes, using <code>git show HEAD</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/git_show_HEAD-3.png" alt="The output of git show HEAD" width="600" height="400" loading="lazy">
<em>The output of <code>git show HEAD</code></em></p>
<p>They are the same as "Commit 5"'s.</p>
<h3 id="heading-git-revert-1"><code>git revert</code></h3>
<p><code>git revert</code> is essentially the reverse of <code>git cherry-pick</code>, introduced in <a class="post-section-overview" href="#heading-chapter-10-additional-tools-for-undoing-changes">chapter 10</a>. This command takes the commit you're providing it with and computes the diff from its parent commit, just like <code>git cherry-pick</code>, but this time, it computes the <em>reverse</em> changes. That is, if in the specified commit you added a line, the reverse would delete the line, and vice versa.</p>
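<p>As a minimal sketch of that idea, assume a throwaway repository with a made-up file <code>song.txt</code> and a hypothetical <code>demo_git</code> wrapper that supplies a placeholder identity. The second commit adds a line, and reverting it creates a third commit whose patch removes that same line, leaving the file as it was after the first commit:</p>
<pre><code class="lang-bash">cd "$(mktemp -d)"
git init -q .
# Hypothetical wrapper: run git with a throwaway identity
demo_git() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
echo "first line" &gt; song.txt
git add song.txt
demo_git commit -q -m "Commit 1"
echo "second line" &gt;&gt; song.txt
git add song.txt
demo_git commit -q -m "Commit 2"

# Create a new commit that applies the reverse of the patch from Commit 2
demo_git revert --no-edit HEAD
history=$(git log --format=%s)
content=$(cat song.txt)
echo "$history"
echo "$content"
</code></pre>
<p>Notice that the history grows rather than shrinks: "Commit 2" is still there, followed by a new commit that undoes it.</p>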
<h3 id="heading-git-add-p"><code>git add -p</code></h3>
<p>Staging changes is an integral part of introducing changes to Git. Sometimes, you wish to stage all changes together (with <code>git add .</code>), or perhaps stage all changes of a specific file (using <code>git add &lt;file_path&gt;</code>). Yet there are times where it would be convenient to stage only certain parts of modified files.</p>
<p>In <a target="_blank" href="https://www.freecodecamp.org/news/p/f7b355ea-3f22-4613-8218-e95c67779d9f/chapter-6-diffs-and-patches">chapter 6</a>, we introduced <code>git add -p</code>. This command allows you to stage certain parts of files, by splitting them into hunks (<code>p</code> stands for <code>patch</code>). For example, say you have this file, <code>my_file.py</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/my_file_py_1.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>my_file.py</code></em></p>
<p>You then modify this file - by changing text within <code>function_1</code>, and also adding a new function, <code>function_5</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/my_file_py_2.png" alt="my_file.py after the changes" width="600" height="400" loading="lazy">
<em><code>my_file.py</code> after the changes</em></p>
<p>If you used <code>git add my_file.py</code> at this point, you would stage both of these changes together. In case you want to separate them into different commits, you could use <code>git add -p</code>, which splits these two changes and asks you about each one as a standalone hunk:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/add_p_1.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git add -p</code></em></p>
<p>By typing <code>?</code>, you can see what the different options stand for:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/add_p_2.png" alt="Using a ? to get a description of the different options" width="600" height="400" loading="lazy">
<em>Using a <code>?</code> to get a description of the different options</em></p>
<p>In this case, say we only want to stage the change introducing <code>function_5</code>. We do not want to stage the change of <code>function_1</code>, so we select <code>n</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/12/add_p_3.png" alt="Not staging the change to function_1" width="600" height="400" loading="lazy">
<em>Not staging the change to <code>function_1</code></em></p>
<p>Next, we are prompted for the second change - the one introducing <code>function_5</code>. We do want to stage this hunk, so we type <code>y</code>.</p>
<h1 id="heading-summary">Summary</h1>
<p>Well, this was FUN!</p>
<p>Can you believe how much you have learned?</p>
<p>In <strong>Part 1</strong> you learned about blobs, trees, and commits.</p>
<p>You then learned about <strong>branches</strong>, seeing that they are nothing but a named reference to a commit.</p>
<p>You learned the process of recording changes in Git, and that it involves the <strong>working directory</strong>, the <strong>staging area (index)</strong>, and the <strong>repository</strong>.</p>
<p>Then - you created a new repository from scratch, by using <code>echo</code> and low-level commands such as <code>git hash-object</code>. You created a blob, a tree, and a commit object pointing to that tree.</p>
<p>In <strong>Part 2</strong> you learned about branching and integrating changes in Git.</p>
<p>You learned what a <strong>diff</strong> is, and the difference between a diff and a <strong>patch</strong>. You also learned how the output of <code>git diff</code> is constructed.</p>
<p>Then, you got an extensive overview of merging with Git, specifically understanding the three-way merge algorithm. You understood when <strong>merge conflicts</strong> occur, when Git can resolve them automatically, and how to resolve them manually when needed.</p>
<p>You saw that <code>git rebase</code> is powerful - but also that it is quite simple once you understand what it does. You understood the differences between merging and rebasing, and when you should use each.</p>
<p>In <strong>Part 3</strong> you learned how to <strong>undo changes</strong> in Git - especially when things go wrong. You learned how to use a bunch of tools, like <code>git reset</code>, <code>git commit --amend</code>, <code>git revert</code>, <code>git reflog</code> (and <code>git log -g</code>).</p>
<p>The most important tool, even more important than the tools I just listed, is to whiteboard the current situation vs the desired one. Trust me on this, it will make every situation seem less daunting and the solution clearer.</p>
<p>In <strong>Part 4</strong> you acquired additional powerful tools, like different switches of <code>git log</code>, <code>git bisect</code>, <code>git cherry-pick</code>, <code>git revert</code> and <code>git add -p</code>.</p>
<p>Wow, you should be proud of yourself!</p>
<h3 id="heading-a-message-from-me-to-you">A Message From Me to You</h3>
<p>Indeed, this was fun, but all things must pass. You finished reading this book, but this doesn't mean your learning journey ends here.</p>
<p>What you have acquired, more than any specific tool, is intuition and understanding of how Git operates, and how to think about various operations in Git. Keep researching, reading, and using Git. I am sure you will be able to teach me something new, and by all means - please do.</p>
<p>If you liked this book, please share it with more people.</p>
<p>If you want to read more of my Git articles and handbooks, here they are:</p>
<ol>
<li><a target="_blank" href="https://www.freecodecamp.org/news/git-rebase-handbook/">The Git Rebase Handbook</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/the-definitive-guide-to-git-merge/">The Git Merge Handbook</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/git-diff-and-patch/">The Git Diff and Patch Handbook</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">Git Internals - Objects, Branches, and How to Create a Repo</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/save-the-day-with-git-reset/">Git Reset Command Explained</a></li>
</ol>
<h3 id="heading-acknowledgements">Acknowledgements</h3>
<p>Many people helped make this book the best it can be. Among them, I was lucky to have many beta readers who provided me with feedback so that I could improve the book. Specifically, I would like to thank Jason S. Shapiro, Anna Łapińska, C. Bruce Hilbert, and Jonathon McKitrick for their thorough reviews.</p>
<p>Abbey Rennemeyer has been a wonderful editor. After she had reviewed my posts for freeCodeCamp for over three years, it was clear that I wanted to ask her to be the editor of this book as well. She helped me improve the book in many ways, and I am grateful for her help.</p>
<p>Quincy Larson founded the amazing community at freeCodeCamp and motivated me through emails and face-to-face discussions. I thank him for starting this incredible community, and for his friendship.</p>
<p>Estefania Cassingena Navone designed the cover of this book. I am grateful for her professional work and her patience with my perfectionism and requests.</p>
<p>Daphne Gray-Grant's website, <a target="_blank" href="https://www.publicationcoach.com/">"Publication Coach"</a>, has provided me with inspiring as well as technical advice that has greatly helped me with my writing process.</p>
<h3 id="heading-if-you-wish-to-support-this-book">If You Wish to Support This Book</h3>
<p>If you would like to support this book, you are welcome to buy the <a target="_blank" href="https://www.amazon.com/dp/B0CQXTJ5V5">Paperback version</a>, an <a target="_blank" href="https://www.buymeacoffee.com/omerr/e/197232">E-Book version</a>, or <a target="_blank" href="https://www.buymeacoffee.com/omerr">buy me a coffee</a>. Thank you!</p>
<h3 id="heading-contact-me">Contact Me</h3>
<p>This book has been created to help you and people like you learn, understand Git, and apply their knowledge in real life. </p>
<p>Right from the beginning, I asked for feedback and was lucky to receive it from great people (mentioned in the <a class="post-section-overview" href="#heading-acknowledgements">Acknowledgements</a>) to make sure the book achieves these goals. If you liked something about this book, felt that something was missing or needed improvement - I would love to hear from you. Please reach out at <a target="_blank" href="mailto:gitting.things@gmail.com">gitting.things@gmail.com</a>.</p>
<p>Thank you for learning and allowing me to be a part of your journey.</p>
<p>– Omer Rosenbaum</p>
<h1 id="heading-appendixes">Appendixes</h1>
<h2 id="heading-additional-references-by-part">Additional References - By Part</h2>
<p>(Note - this is a short list. You can find a longer list of references on the <a target="_blank" href="https://www.buymeacoffee.com/omerr/e/197232">E-Book</a> or <a target="_blank" href="https://www.amazon.com/dp/B0CQXTJ5V5">printed</a> version.)</p>
<h3 id="heading-part-1">Part 1</h3>
<ul>
<li>Git Internals YouTube playlist - by Brief:<br><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BNUby5H58y6s2TQVLadV8v7">https://www.youtube.com/playlist?list=PL9lx0DXCC4BNUby5H58y6s2TQVLadV8v7</a></li>
<li>Tim Berglund's lecture  - "Git From the Bits Up":<br><a target="_blank" href="https://www.youtube.com/watch?v=MYP56QJpDr4">https://www.youtube.com/watch?v=MYP56QJpDr4</a></li>
<li>as promised, docs: Git for the confused:<br><a target="_blank" href="https://www.gelato.unsw.edu.au/archives/git/0512/13748.html">https://www.gelato.unsw.edu.au/archives/git/0512/13748.html</a></li>
</ul>
<h3 id="heading-part-2">Part 2</h3>
<h4 id="heading-diffs-and-patches">Diffs and Patches</h4>
<p>Git diff algorithms:</p>
<ul>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/Diff">https://en.wikipedia.org/wiki/Diff</a></li>
</ul>
<p>The default diff algorithm in Git is Myers:</p>
<ul>
<li><a target="_blank" href="https://www.nathaniel.ai/myers-diff/">https://www.nathaniel.ai/myers-diff/</a></li>
<li><a target="_blank" href="https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/">https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/</a></li>
<li><a target="_blank" href="https://blog.robertelder.org/diff-algorithm/">https://blog.robertelder.org/diff-algorithm/</a></li>
</ul>
<h4 id="heading-git-merge">Git Merge</h4>
<ul>
<li><a target="_blank" href="https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging">https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging</a></li>
<li><a target="_blank" href="https://blog.plasticscm.com/2010/11/live-to-merge-merge-to-live.html">https://blog.plasticscm.com/2010/11/live-to-merge-merge-to-live.html</a></li>
</ul>
<h4 id="heading-git-rebase">Git Rebase</h4>
<ul>
<li><a target="_blank" href="https://jwiegley.github.io/git-from-the-bottom-up/1-Repository/7-branching-and-the-power-of-rebase.html">https://jwiegley.github.io/git-from-the-bottom-up/1-Repository/7-branching-and-the-power-of-rebase.html</a></li>
<li><a target="_blank" href="https://git-scm.com/book/en/v2/Git-Branching-Rebasing">https://git-scm.com/book/en/v2/Git-Branching-Rebasing</a></li>
</ul>
<h4 id="heading-beatles-related-resources-1">Beatles-Related Resources</h4>
<ul>
<li><a target="_blank" href="https://www.the-paulmccartney-project.com/song/ive-got-a-feeling/">https://www.the-paulmccartney-project.com/song/ive-got-a-feeling/</a></li>
<li><a target="_blank" href="https://www.cheatsheet.com/entertainment/did-john-lennon-or-paul-mccartney-write-the-classic-a-day-in-the-life.html/">https://www.cheatsheet.com/entertainment/did-john-lennon-or-paul-mccartney-write-the-classic-a-day-in-the-life.html/</a></li>
<li><a target="_blank" href="http://lifeofthebeatles.blogspot.com/2009/06/ive-got-feeling-lyrics.html">http://lifeofthebeatles.blogspot.com/2009/06/ive-got-feeling-lyrics.html</a></li>
</ul>
<h3 id="heading-part-3">Part 3</h3>
<ul>
<li><a target="_blank" href="https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified">https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified</a></li>
<li><a target="_blank" href="https://www.edureka.co/blog/common-git-mistakes/">https://www.edureka.co/blog/common-git-mistakes/</a></li>
</ul>
<h1 id="heading-about-the-author">About the Author</h1>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the <a target="_blank" href="https://youtube.com/@BriefVid">Brief YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Git Rebase Handbook – A Definitive Guide to Rebasing ]]>
                </title>
                <description>
                    <![CDATA[ One of the most powerful tools a developer can have in their toolbox is git rebase. Yet it is notorious for being complex and misunderstood.  The truth is, if you understand what it actually does, git rebase is a very elegant, and straightforward too... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/git-rebase-handbook/</link>
                <guid isPermaLink="false">66c17c2632867815f7100b62</guid>
                
                    <category>
                        <![CDATA[ Git ]]>
                    </category>
                
                    <category>
                        <![CDATA[ version control ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Mon, 03 Jul 2023 13:56:34 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/07/The-Git-Rebase-Handbook-Book-Cover--1-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>One of the most powerful tools a developer can have in their toolbox is <code>git rebase</code>. Yet it is notorious for being complex and misunderstood. </p>
<p>The truth is, if you understand what it <em>actually</em> does, <code>git rebase</code> is a very elegant and straightforward tool to achieve so many different things in Git.</p>
<p>In previous posts, you understood <a target="_blank" href="https://www.freecodecamp.org/news/git-diff-and-patch/">what Git diffs are</a>, <a target="_blank" href="https://www.freecodecamp.org/news/the-definitive-guide-to-git-merge/">what a merge is</a>, and <a target="_blank" href="https://www.freecodecamp.org/news/the-definitive-guide-to-git-merge/">how Git resolves merge conflicts</a>. In this post, you will understand what Git rebase is, why it's different from merge, and how to rebase with confidence 💪🏻</p>
<h2 id="heading-notes-before-we-start">Notes before we start</h2>
<ol>
<li>I also created a video covering the contents of this post. If you wish to watch alongside reading, you can find it <a target="_blank" href="https://youtu.be/3VFsitGUB3s">here</a>.</li>
<li>If you want to play around with the repository I used and try out the commands for yourself, you can get the repo <a target="_blank" href="https://github.com/Omerr/rebase_playground">here</a>.</li>
<li>I am working on a book about Git! Are you interested in reading the initial versions and providing feedback? Send me an email: <a target="_blank" href="mailto:gitting.things@gmail.com">gitting.things@gmail.com</a></li>
</ol>
<p>OK, are you ready?</p>
<h1 id="heading-short-recap-what-is-git-merge">Short Recap - What is Git Merge? 🤔</h1>
<p>Under the hood, <code>git rebase</code> and <code>git merge</code> are very, very different things. Then why do people compare them all the time?</p>
<p>The reason is their usage. When working with Git, we usually work in different branches and introduce changes to those branches. </p>
<p>In <a target="_blank" href="https://www.freecodecamp.org/news/the-definitive-guide-to-git-merge/#howgits3waymergealgorithmworks">a previous tutorial</a>, I gave an example where John and Paul (of the Beatles) were co-authoring a new song. They started from the <code>main</code> branch, and then each diverged, modified the lyrics and committed their changes. </p>
<p>Then, the two wanted to integrate their changes, which is something that happens very frequently when working with Git.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-197.png" alt="Image" width="600" height="400" loading="lazy">
<em>A diverging history - <code>paul_branch</code> and <code>john_branch</code> diverged from <code>main</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>There are two main ways to integrate changes introduced in different branches in Git, or in other words, different commits and commit histories. These are merge and rebase.</p>
<p><a target="_blank" href="https://www.freecodecamp.org/news/the-definitive-guide-to-git-merge/">In a previous tutorial</a>, we got to know <code>git merge</code> pretty well. We saw that when performing a merge, we create a <strong>merge commit</strong> – where the contents of this commit are a combination of the two branches, and it also has two parents, one in each branch.</p>
<p>So, say you are on the branch <code>john_branch</code> (assuming the history depicted in the drawing above), and you run <code>git merge paul_branch</code>. You will get to this state – where on <code>john_branch</code>, there is a new commit with two parents. The first one will be the commit on <code>john_branch</code> that <code>HEAD</code> was pointing to before performing the merge, in this case "Commit 6". The second will be the commit pointed to by <code>paul_branch</code>, "Commit 9".</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-196.png" alt="Image" width="600" height="400" loading="lazy">
<em>The result of running <code>git merge paul_branch</code>: a new Merge Commit with two parents (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Look again at the history graph: you created a <strong>diverged</strong> history. You can actually see where it branched and where it merged again.</p>
<p>So when using <code>git merge</code>, you do not rewrite history – but rather, you add a commit to the existing history. And specifically, a commit that creates a diverged history.</p>
<h1 id="heading-how-is-git-rebase-different-than-git-merge">How is <code>git rebase</code> Different than <code>git merge</code>? 🤔</h1>
<p>When using <code>git rebase</code>, something different happens. 🥁</p>
<p>Let's start with the big picture: if you are on <code>paul_branch</code>, and use <code>git rebase john_branch</code>, Git goes to the common ancestor of John's branch and Paul's branch. Then it takes the patches introduced in the commits on Paul's branch, and applies those changes to John's branch. </p>
<p>So here, you use <code>rebase</code> to take the changes that were committed on one branch – Paul's branch – and replay them on a different branch, <code>john_branch</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-198.png" alt="Image" width="600" height="400" loading="lazy">
<em>The result of running <code>git rebase john_branch</code>: the commits on <code>paul_branch</code> were "replayed" on top of <code>john_branch</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Wait, what does that mean? 🤔</p>
<p>We will now take this bit by bit to make sure you fully understand what's happening under the hood 😎</p>
<h1 id="heading-cherry-pick-as-a-basis-for-rebase"><code>cherry-pick</code> as a Basis for Rebase</h1>
<p>It is useful to think of rebase as performing <code>git cherry-pick</code> – a command that takes a commit, computes the <em>patch</em> this commit introduces (the difference between the parent commit and the commit itself), and then "replays" this difference.</p>
<p>Let's do this manually.</p>
<p>If we look at the difference introduced by "Commit 5" by performing <code>git diff main &lt;SHA_OF_COMMIT_5&gt;</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-199.png" alt="Image" width="600" height="400" loading="lazy">
<em>Running <code>git diff</code> to observe the patch introduced by "Commit 5" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>(If you want to play around with the repository I used and try out the commands for yourself, you can get the repo <a target="_blank" href="https://github.com/Omerr/rebase_playground">here</a>).</p>
<p>You can see that in this commit, John started working on a song called "Lucy in the Sky with Diamonds":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-200.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git diff</code> - the patch introduced by "Commit 5" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>As a reminder, you can also use the command <code>git show</code> to get the same output:</p>
<pre><code>git show &lt;SHA_OF_COMMIT_5&gt;
</code></pre><p>Now, if you <code>cherry-pick</code> this commit, you will introduce this change specifically, on the active branch. Switch to <code>main</code> first:</p>
<p><code>git checkout main</code> (or <code>git switch main</code>)</p>
<p>And create another branch, just to be clear:</p>
<p><code>git checkout -b my_branch</code> (or <code>git switch -c my_branch</code>)</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-201.png" alt="Image" width="600" height="400" loading="lazy">
<em>Creating <code>my_branch</code> that branches from <code>main</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>And <code>cherry-pick</code> this commit:</p>
<pre><code>git cherry-pick &lt;SHA_OF_COMMIT_5&gt;
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-202.png" alt="Image" width="600" height="400" loading="lazy">
<em>Using <code>cherry-pick</code> to apply the changes introduced in "Commit 5" onto <code>main</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Consider the log (output of <code>git lol</code>):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-205.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git lol</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>(<code>git lol</code> is an alias I added to Git to visibly see the history in a graphical manner. You can find it <a target="_blank" href="https://gist.github.com/Omerr/8134a61b56ca82dd90e546e7ef04eb77">here</a>).</p>
<p>It seems like you <em>copy-pasted</em> "Commit 5". Remember that even though it has the same commit message, and introduces the same changes, and even points to the same tree object as the original "Commit 5" in this case – it is still a different commit object, as it was created with a different timestamp.</p>
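<p>If you want to check this claim outside the playground repository, here is a minimal sketch in a throwaway repo. The file names, the identity settings and the fixed dates are assumptions made only for this illustration. The dates are pinned so that the copy is guaranteed to get a later committer timestamp than the original.</p>

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway directory, not your real repo
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" on any Git version
git config user.name "Example Author"    # hypothetical identity for the demo
git config user.email "author@example.com"
git config commit.gpgsign false

# Pin the dates of the original commits (raw format: unix timestamp + time zone).
export GIT_AUTHOR_DATE="1672574400 +0000"
export GIT_COMMITTER_DATE="1672574400 +0000"

echo "intro" > intro.md
git add intro.md
git commit -q -m "Commit 4"

git checkout -q -b john_branch
echo "first draft" > lucy.md
git add lucy.md
git commit -q -m "Commit 5"
original=$(git rev-parse HEAD)

# Replay "Commit 5" on a new branch that starts from main, one day later.
git checkout -q main
git checkout -q -b my_branch
GIT_COMMITTER_DATE="1672660800 +0000" git cherry-pick "$original" > /dev/null
copy=$(git rev-parse HEAD)

echo "original commit: $original"
echo "copied commit:   $copy"
echo "original tree:   $(git rev-parse "${original}^{tree}")"
echo "copied tree:     $(git rev-parse "${copy}^{tree}")"
```

<p>The two commit IDs differ, while the two tree IDs are identical: same snapshot, different commit object.</p>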
<p>Looking at the changes, using <code>git show HEAD</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-204.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git show HEAD</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>They are the same as "Commit 5"'s.</p>
<p>And of course, if you look at the file (say, by using <code>nano lucy_in_the_sky_with_diamonds.md</code>), it will be in the same state as it was after the original "Commit 5".</p>
<p>Cool! 😎</p>
<p>OK, you can now remove the new branch so it doesn't appear on your history every time:</p>
<pre><code>git checkout main
git branch -D my_branch
</code></pre><h2 id="heading-beyond-cherry-pick-how-to-use-git-rebase">Beyond <code>cherry-pick</code> – How to Use <code>git rebase</code></h2>
<p>You can look at <code>git rebase</code> as a way to perform multiple <code>cherry-pick</code>s one after the other – that is, to "replay" multiple commits. This is not the only thing you can do with <code>rebase</code>, but it's a good starting point for our explanation.</p>
<p>It's time to play with <code>git rebase</code>! 👏🏻👏🏻</p>
<p>Before, you merged <code>paul_branch</code> into <code>john_branch</code>. What would happen if you <em>rebased</em> <code>paul_branch</code> on top of  <code>john_branch</code>? You would get a very different history.</p>
<p>In essence, it would seem as if we took the changes introduced in the commits on <code>paul_branch</code>, and replayed them on <code>john_branch</code>. The result would be a <strong>linear</strong> history.</p>
<p>To understand the process, I will provide the high level view, and then dive deeper into each step. The process of rebasing one branch on top of another branch is as follows:</p>
<ol>
<li>Find the common ancestor.</li>
<li>Identify the commits to be "replayed".</li>
<li>For every commit <code>X</code>, compute <code>diff(parent(X), X)</code>, and store it as a <code>patch(X)</code>.</li>
<li>Move <code>HEAD</code> to the new base.</li>
<li>Apply the generated patches in order on the target branch. Each time, create a new commit object with the new state.</li>
</ol>
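<p>The steps above can be sketched by hand with commands you already know. This is a conceptual illustration, not how <code>git rebase</code> is implemented internally. The throwaway repository, the file names and the <code>commit_file</code> helper are assumptions made for the example.</p>

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

# Hypothetical helper: write a file and commit it with the given message.
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file intro.md "intro" "Commit 4"
git checkout -q -b john_branch
commit_file john.md "John's verse" "Commit 5"
commit_file john.md "John's verse, revised" "Commit 6"
git checkout -q -b paul_branch main
commit_file paul.md "Paul's verse" "Commit 7"
commit_file chorus.md "Paul's chorus" "Commit 8"

# Step 1: find the common ancestor.
ancestor=$(git merge-base john_branch paul_branch)

# Step 2: identify the commits to be replayed, oldest first.
to_replay=$(git rev-list --reverse "$ancestor"..paul_branch)

# Step 4: move HEAD to the new base (detached, so no branch moves yet).
git checkout -q --detach john_branch

# Steps 3 and 5: cherry-pick computes each commit's patch and applies it
# as a new commit on top of the current HEAD.
for commit in $to_replay; do
  git cherry-pick "$commit" > /dev/null
done

# Finally, make paul_branch point to the replayed commits.
git checkout -q -B paul_branch
git log --oneline --graph
```

<p>The resulting graph is a single straight line, with the commits of <code>paul_branch</code> sitting on top of <code>john_branch</code>.</p>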
<p>The process of making new commits with the same changesets as existing ones is also called <strong>"replaying"</strong> those commits, a term we have already used.</p>
<h1 id="heading-time-to-get-hands-on-with-rebase"><strong>Time to Get Hands-On with Rebase🙌🏻</strong></h1>
<p>Start from Paul's branch:</p>
<pre><code>git checkout paul_branch
</code></pre><p>This is the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-206.png" alt="Image" width="600" height="400" loading="lazy">
<em>Commit history before performing <code>git rebase</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>And now, to the exciting part:</p>
<pre><code>git rebase john_branch
</code></pre><p>And observe the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-207.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after rebasing (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>(<code>gg</code> is an alias for an external tool I introduced <a target="_blank" href="https://youtu.be/3VFsitGUB3s">in the video</a>).</p>
<p>So whereas with <code>git merge</code> you added to the history, with <code>git rebase</code> you <strong>rewrite history</strong>. You create <strong>new</strong> commit objects. In addition, the result is a linear history graph – rather than a diverging graph.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-209.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after rebasing (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>In essence, we "copied" the commits that were on <code>paul_branch</code> and introduced after "Commit 4", and "pasted" them on top of <code>john_branch</code>.</p>
<p>The command is called "rebase", because it changes the base commit of the branch it's run from. That is, in your case, before running <code>git rebase</code>, the base of <code>paul_branch</code> was "Commit 4" – as this is where the branch was "born" (from <code>main</code>). With <code>rebase</code>, you asked Git to give it another base – that is, pretend as if it had been born from "Commit 6".</p>
<p>To do that, Git took what used to be "Commit 7", and "replayed" the changes introduced in this commit onto "Commit 6", and then created a new commit object. This object differs from the original "Commit 7" in three aspects:</p>
<ol>
<li>It has a different timestamp.</li>
<li>It has a different parent commit – "Commit 6" rather than "Commit 4".</li>
<li>The <a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">tree object</a> it is pointing to is different - as the changes were introduced to the tree pointed to by "Commit 6", and not the tree pointed to by "Commit 4".</li>
</ol>
<p>Notice the last commit here, "Commit 9'". The snapshot it represents (that is, the <a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">tree</a> that it points to) is exactly the same tree you would get by merging the two branches. The state of the files in your Git repository would be <strong>the same</strong> as if you used <code>git merge</code>. It's only the history that is different, and the commit objects of course.</p>
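<p>You can verify this on a small scale. The sketch below builds a similar history in a throwaway repository, merges on one branch, rebases on another, and compares the resulting trees. The file names, the <code>commit_file</code> helper and the <code>merged</code> branch are assumptions for the example, and the branches touch different files, so there are no conflicts to resolve.</p>

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

# Hypothetical helper: write a file and commit it with the given message.
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file intro.md "intro" "Commit 4"
git checkout -q -b john_branch
commit_file john.md "John's verse" "Commit 5"
commit_file john.md "John's verse, revised" "Commit 6"
git checkout -q -b paul_branch main
commit_file paul.md "Paul's verse" "Commit 7"
commit_file paul.md "Paul's verse, revised" "Commit 8"
commit_file chorus.md "Paul's chorus" "Commit 9"

# Option 1: merge paul_branch into a copy of john_branch.
git checkout -q -b merged john_branch
git merge -q --no-edit paul_branch
merge_tree=$(git rev-parse "HEAD^{tree}")

# Option 2: rebase paul_branch on top of john_branch.
git checkout -q paul_branch
git rebase -q john_branch
rebase_tree=$(git rev-parse "HEAD^{tree}")

echo "tree after merge:  $merge_tree"
echo "tree after rebase: $rebase_tree"
```

<p>Both lines print the same tree ID, even though the commit graphs leading to that snapshot are different.</p>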
<p>Now, you can simply use:</p>
<pre><code>git checkout main
git merge paul_branch
</code></pre><p>Hm.... What would happen if you ran this last command? 🤔 Consider the commit history again, after checking out <code>main</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-210.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after rebasing and checking out <code>main</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>What would it mean to merge <code>main</code> and <code>paul_branch</code>?</p>
<p>Indeed, Git can simply perform a fast-forward merge, as the history is completely linear (if you need a reminder about fast-forward merges, check out <a target="_blank" href="https://www.freecodecamp.org/news/the-definitive-guide-to-git-merge/#timetogethandson">this post</a>). As a result, <code>main</code> and <code>paul_branch</code> now point to the same commit:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-211.png" alt="Image" width="600" height="400" loading="lazy">
<em>The result of a fast-forward merge (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
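<p>If you would like Git to refuse anything other than a fast-forward, you can pass <code>--ff-only</code> to <code>git merge</code>. Here is a small sketch in a throwaway repository. The file names and the <code>commit_file</code> helper are assumptions for the example.</p>

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

# Hypothetical helper: write a file and commit it with the given message.
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file intro.md "intro" "Commit 4"
git checkout -q -b john_branch
commit_file john.md "John's verse" "Commit 5"
git checkout -q -b paul_branch main
commit_file paul.md "Paul's verse" "Commit 7"

# Rebase first, so that main becomes an ancestor of a linear history.
git rebase -q john_branch

# --ff-only only moves the main pointer forward. It never creates a merge commit.
git checkout -q main
git merge -q --ff-only paul_branch

echo "main:        $(git rev-parse main)"
echo "paul_branch: $(git rev-parse paul_branch)"
echo "merge commits reachable from main: $(git rev-list --merges --count main)"
```

<p>Both branch names resolve to the same commit, and the count of merge commits is 0.</p>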
<h1 id="heading-advanced-rebasing-in-git">Advanced Rebasing in Git💪🏻</h1>
<p>Now that you understand the basics of rebase, it is time to consider more advanced cases, where additional switches and arguments to the <code>rebase</code> command will come in handy.</p>
<p>In the previous example, when you only said <code>rebase</code> (without additional switches), Git replayed all the commits from the common ancestor to the tip of the current branch.</p>
<p>But rebase is a superpower: an almighty command capable of… well, rewriting history. And it can come in handy if you want to modify history to make it your own.</p>
<p>Undo the last merge by making <code>main</code> point to "Commit 4" again:</p>
<pre><code>git reset --hard &lt;ORIGINAL_COMMIT <span class="hljs-number">4</span>&gt;
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-238.png" alt="Image" width="600" height="400" loading="lazy">
<em>"Undoing" the last merge operation (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>And undo the rebasing by using:</p>
<pre><code>git checkout paul_branch
git reset --hard &lt;ORIGINAL_COMMIT <span class="hljs-number">9</span>&gt;
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-239.png" alt="Image" width="600" height="400" loading="lazy">
<em>"Undoing" the rebase operation (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Notice that you got to exactly the same history you used to have:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-240.png" alt="Image" width="600" height="400" loading="lazy">
<em>Visualizing the history after "undoing" the rebase operation (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Again, to be clear, "Commit 9" doesn't just disappear when it's not reachable from the current <code>HEAD</code>. Rather, it's still stored in the object database. And as you used <code>git reset</code> now to change <code>HEAD</code> to point to this commit, you were able to retrieve it, and also its parent commits since they are also stored in the database. Pretty cool, huh? 😎</p>
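<p>Here is a minimal sketch of that round trip in a throwaway repository: remember where the branch pointed, rebase, and then reset back. The file names, the <code>commit_file</code> helper and the <code>before</code> and <code>after</code> variables are assumptions for the example. In your own repository you would use the real SHA-1 of the original "Commit 9".</p>

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

# Hypothetical helper: write a file and commit it with the given message.
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file intro.md "intro" "Commit 4"
git checkout -q -b john_branch
commit_file john.md "John's verse" "Commit 6"
git checkout -q -b paul_branch main
commit_file paul.md "Paul's verse" "Commit 9"

# Remember the original tip of paul_branch before rewriting history.
before=$(git rev-parse paul_branch)

git rebase -q john_branch
after=$(git rev-parse paul_branch)

# The original commit is no longer reachable from paul_branch,
# but it is still stored in the object database.
git cat-file -t "$before"    # prints: commit

# Point paul_branch (and HEAD) back at the original commit.
git reset -q --hard "$before"
git log --oneline
```

<p>After the reset, the log of <code>paul_branch</code> shows the original commits again, and John's commit is gone from this branch's history.</p>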
<p>OK, quickly view the changes that Paul introduced:</p>
<pre><code>git show HEAD
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-241.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git show HEAD</code> shows the patch introduced by "Commit 9" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Keep going backwards in the commit graph:</p>
<pre><code>git show HEAD~
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-242.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git show HEAD~</code> (same as <code>git show HEAD~1</code>) shows the patch introduced by "Commit 8" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>And one commit further:</p>
<pre><code>git show HEAD~<span class="hljs-number">2</span>
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-243.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git show HEAD~2</code> shows the patch introduced by "Commit 7" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>So, these changes are nice, but perhaps Paul doesn't want this kind of history. Rather, he wants it to seem as if he introduced the changes in "Commit 7" and "Commit 8" as a single commit.</p>
<p>For that, you can use an <strong>interactive</strong> rebase. To do that, we add the <code>-i</code> (or <code>--interactive</code>) switch to the <code>rebase</code> command:</p>
<pre><code>git rebase -i &lt;SHA_OF_COMMIT_4&gt;
</code></pre><p>Or, since <code>main</code> is pointing to "Commit 4", we can simply run:</p>
<pre><code>git rebase -i main
</code></pre><p>By running this command, you tell Git to use a new base, "Commit 4". So you are asking Git to go back to all commits that were introduced after "Commit 4" and that are reachable from the current <code>HEAD</code>, and replay those commits.</p>
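<p>To preview which commits a rebase onto <code>main</code> would replay, you can list the commits that are reachable from <code>HEAD</code> but not from <code>main</code>. The sketch below uses a throwaway repository. The file names and the <code>commit_file</code> helper are assumptions for the example.</p>

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

# Hypothetical helper: write a file and commit it with the given message.
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file intro.md "intro" "Commit 4"
git checkout -q -b paul_branch
commit_file verse1.md "Paul's first verse" "Commit 7"
commit_file verse2.md "Paul's second verse" "Commit 8"
commit_file chorus.md "Paul's chorus" "Commit 9"

# main..HEAD means: commits reachable from HEAD, excluding those reachable from main.
# --reverse lists them oldest first, which is the order in which they are replayed.
replayed=$(git log --reverse --format=%s main..HEAD)
echo "$replayed"
# prints:
# Commit 7
# Commit 8
# Commit 9
```

<p>This is the same set of commits, in the same order, that the interactive rebase lists for you to edit.</p>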
<p>For every commit that is replayed, Git asks us what we'd like to do with it:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-250.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git rebase -i main</code> prompts you to select what to do with each commit (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>In this context it's useful to think of a commit as a patch. That is, "Commit 7" as in "the patch that "Commit 7" introduced on top of its parent".</p>
<p>One option is to use <code>pick</code>. This is the default behavior, which tells Git to replay the changes introduced in this commit. In this case, if you just leave it as is – and <code>pick</code> all commits – you will get the same history, and Git won't even create new commit objects.</p>
<p>Another option is <code>squash</code>. A <em>squashed</em> commit will have its contents "folded" into the contents of the commit preceding it. So in our case, Paul would like to squash "Commit 8" into "Commit 7":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-251.png" alt="Image" width="600" height="400" loading="lazy">
<em>Squashing "Commit 8" into "Commit 7" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>As you can see, <code>git rebase -i</code> provides additional options, but we won't go into all of them in this post. If you allow the rebase to run, you will get prompted to select a commit message for the newly created commit (that is, the one that introduced the changes of both "Commit 7" and "Commit 8"):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-252.png" alt="Image" width="600" height="400" loading="lazy">
<em>Providing the commit message: <code>Commits 7+8</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>And look at the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-253.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after the interactive rebase (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Exactly as we wanted! We have on <code>paul_branch</code> "Commit 9" (of course, it's a different object than the original "Commit 9"). This points to "Commits 7+8", which is a single commit introducing the changes of both the original "Commit 7" and the original "Commit 8". This commit's parent is "Commit 4", where <code>main</code> is pointing to. You have <code>john_branch</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-254.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after the interactive rebase - visualized (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Oh wow, isn't that cool? 😎</p>
<p><code>git rebase</code> grants you unlimited control over the shape of any branch. You can use it to reorder commits, or to remove incorrect changes, or modify a change in retrospect. Alternatively, you could perhaps move the base of your branch onto another commit, any commit that you wish.</p>
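<p>If you'd like to reproduce the squash without typing into an editor, here is a minimal sketch. It assumes a throwaway repository, a hypothetical <code>commit</code> helper, and that every commit simply appends a line to <code>code.py</code>. The <code>GIT_SEQUENCE_EDITOR</code> variable lets a command edit the todo list for you, and <code>GIT_EDITOR=true</code> accepts the default combined commit message.</p>

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit code.py "Commit 4"
git checkout -q -b paul_branch
commit code.py "Commit 7"
commit code.py "Commit 8"
commit code.py "Commit 9"

# The todo list holds one "pick" line per commit, oldest first:
# line 1 = Commit 7, line 2 = Commit 8, line 3 = Commit 9.
# Turning line 2 into "squash" folds Commit 8 into Commit 7.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i main

git log --oneline
```

<p>After it runs, <code>paul_branch</code> holds three commits instead of four, and the commit right below "Commit 9" carries the changes of both "Commit 7" and "Commit 8".</p>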
<h2 id="heading-how-to-use-the-onto-switch-of-git-rebase">How to Use the <code>--onto</code> Switch of <code>git rebase</code></h2>
<p>Let's consider one more example. Get to <code>main</code> again:</p>
<pre><code>git checkout main
</code></pre><p>And delete the pointers to <code>paul_branch</code> and <code>john_branch</code> so you don't see them in the commit graph anymore:</p>
<pre><code>git branch -D paul_branch
git branch -D john_branch
</code></pre><p>And now branch from <code>main</code> to a new branch:</p>
<pre><code>git checkout -b new_branch
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-255.png" alt="Image" width="600" height="400" loading="lazy">
<em>Creating <code>new_branch</code> that diverges from <code>main</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-256.png" alt="Image" width="600" height="400" loading="lazy">
<em>A clean history with <code>new_branch</code> that diverges from <code>main</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Now, add a few changes here and commit them:</p>
<pre><code>nano code.py
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-257.png" alt="Image" width="600" height="400" loading="lazy">
<em>Adding the function <code>new_branch</code> to <code>code.py</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<pre><code>git add code.py
git commit -m <span class="hljs-string">"Commit 10"</span>
</code></pre><p>Get back to <code>main</code>:</p>
<pre><code>git checkout main
</code></pre><p>And introduce another change:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-258.png" alt="Image" width="600" height="400" loading="lazy">
<em>Added a docstring at the beginning of the file (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Time to stage and commit these changes:</p>
<pre><code>git add code.py
git commit -m <span class="hljs-string">"Commit 11"</span>
</code></pre><p>And yet another change:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-259.png" alt="Image" width="600" height="400" loading="lazy">
<em>Added <code>@Author</code> to the docstring (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Commit this change as well:</p>
<pre><code>git add code.py
git commit -m <span class="hljs-string">"Commit 12"</span>
</code></pre><p>Oh wait, now I realize that I wanted you to make the changes introduced in "Commit 11" as a part of the <code>new_branch</code>. Ugh. What can you do? 🤔</p>
<p>Consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-260.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 12" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>What I want is, instead of having "Commit 11" reside only on the <code>main</code> branch, to have it on both the <code>main</code> branch and <code>new_branch</code>. Visually, I want to move "Commit 10" in the graph so that it sits on top of "Commit 11":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-261.png" alt="Image" width="600" height="400" loading="lazy">
<em>Visually, I want you to "push" "Commit 10" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Can you see where I am going? 😇</p>
<p>Well, as we understand, rebase allows us to basically <em>replay</em> the changes introduced in <code>new_branch</code>, those introduced in "Commit 10", as if they had been originally conducted on "Commit 11", rather than "Commit 4".</p>
<p>To do that, you can use other arguments of <code>git rebase</code>. You'd tell Git that you want to take all the history introduced between the common ancestor of <code>main</code> and <code>new_branch</code>, which is "Commit 4", and have the new base for that history be "Commit 11". To do that, use:</p>
<pre><code>git rebase --onto &lt;SHA_OF_COMMIT_11&gt; main new_branch
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-262.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history before and after the rebase, "Commit 10" has been "pushed" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>And look at our beautiful history! 😍</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-263.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history before and after the rebase, "Commit 10" has been "pushed" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
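<p>Here is a minimal sketch of the same move that you can run end to end. It assumes a throwaway repository and a hypothetical <code>commit</code> helper, and, unlike the walkthrough above, it puts each change in its own file so that no conflicts can get in the way.</p>

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.py "Commit 4"
git checkout -q -b new_branch
commit feature.py "Commit 10"
git checkout -q main
commit docs.py "Commit 11"
commit docs.py "Commit 12"

# "Commit 11" is the parent of the tip of main.
commit_11=$(git rev-parse main~1)

# Replay what is on new_branch but not on main (just "Commit 10")
# on top of "Commit 11".
git rebase -q --onto "$commit_11" main new_branch

git log --oneline new_branch
```

<p>Notice that <code>main</code> itself is untouched and still points to "Commit 12". Only <code>new_branch</code> got a new base.</p>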
<p>Let's consider another case.</p>
<p>Say I started working on a branch, and by mistake I started working from <code>feature_branch_1</code>, rather than from <code>main</code>.</p>
<p>So to emulate this, create <code>feature_branch_1</code>:</p>
<pre><code>git checkout main
git checkout -b feature_branch_1
</code></pre><p>And erase <code>new_branch</code> so you don't see it in the graph anymore:</p>
<pre><code>git branch -D new_branch
</code></pre><p>Create a simple Python file called <code>1.py</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-264.png" alt="Image" width="600" height="400" loading="lazy">
<em>A new file, <code>1.py</code>, with <code>print('Hello world!')</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Stage and commit this file:</p>
<pre><code>git add <span class="hljs-number">1.</span>py
git commit -m  <span class="hljs-string">"Commit 13"</span>
</code></pre><p>Now branch out (by mistake) from <code>feature_branch_1</code>:</p>
<pre><code>git checkout -b feature_branch_2
</code></pre><p>And create another file, <code>2.py</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-265.png" alt="Image" width="600" height="400" loading="lazy">
<em>Creating <code>2.py</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Stage and commit this file as well:</p>
<pre><code>git add <span class="hljs-number">2.</span>py
git commit -m  <span class="hljs-string">"Commit 14"</span>
</code></pre><p>And introduce some more code to <code>2.py</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-266.png" alt="Image" width="600" height="400" loading="lazy">
<em>Modifying <code>2.py</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Stage and commit these changes too:</p>
<pre><code>git add <span class="hljs-number">2.</span>py
git commit -m  <span class="hljs-string">"Commit 15"</span>
</code></pre><p>So far you should have this history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-267.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 15" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Get back to <code>feature_branch_1</code> and edit <code>1.py</code>:</p>
<pre><code>git checkout feature_branch_1
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-268.png" alt="Image" width="600" height="400" loading="lazy">
<em>Modifying <code>1.py</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Now stage and commit:</p>
<pre><code>git add <span class="hljs-number">1.</span>py
git commit -m  <span class="hljs-string">"Commit 16"</span>
</code></pre><p>Your history should look like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-270.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after introducing "Commit 16" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Say you now realize you've made a mistake. You actually wanted <code>feature_branch_2</code> to be born from the <code>main</code> branch, rather than from <code>feature_branch_1</code>.</p>
<p>How can you achieve that? 🤔</p>
<p>Try to think about it given the history graph and what you've learned about the <code>--onto</code> flag for the <code>rebase</code> command.</p>
<p>Well, you want to "replace" the parent of your first commit on <code>feature_branch_2</code>, which is "Commit 14". Its parent should be the tip of the <code>main</code> branch, in this case "Commit 12", rather than the beginning of <code>feature_branch_1</code>, in this case "Commit 13". So again, you will be creating a <em>new base</em>, this time for the first commit on <code>feature_branch_2</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-271.png" alt="Image" width="600" height="400" loading="lazy">
<em>You want to move around "Commit 14" and "Commit 15" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>How would you do that?</p>
<p>First, switch to <code>feature_branch_2</code>:</p>
<pre><code>git checkout feature_branch_2
</code></pre><p>And now you can use:</p>
<pre><code>git rebase --onto main &lt;SHA_OF_COMMIT_13&gt;
</code></pre><p>As a result, you have <code>feature_branch_2</code> based on <code>main</code> rather than <code>feature_branch_1</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-272.png" alt="Image" width="600" height="400" loading="lazy">
<em>The commit history after performing rebase (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
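<p>To try this scenario yourself, here is a minimal sketch. It assumes a throwaway repository and a hypothetical <code>commit</code> helper, and it starts from a single "Commit 12" on <code>main</code> as a stand-in for the earlier history.</p>

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit code.py "Commit 12"            # stand-in for the history on main
git checkout -q -b feature_branch_1
commit 1.py "Commit 13"
git checkout -q -b feature_branch_2   # branched from the wrong place
commit 2.py "Commit 14"
commit 2.py "Commit 15"
git checkout -q feature_branch_1
commit 1.py "Commit 16"

# "Commit 13" is the old parent of the first commit on feature_branch_2.
commit_13=$(git rev-parse feature_branch_1~1)

git checkout -q feature_branch_2
# Replay everything after "Commit 13" (that is, 14 and 15) onto main.
git rebase -q --onto main "$commit_13"

git log --oneline
```

<p>After the rebase, <code>1.py</code> no longer exists on <code>feature_branch_2</code>, since "Commit 13" is not part of its history anymore, while <code>feature_branch_1</code> stays exactly as it was.</p>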
<p>The syntax of the command is:</p>
<pre><code>git rebase --onto &lt;new_parent&gt; <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">old_parent</span>&gt;</span></span>
</code></pre><h2 id="heading-how-to-rebase-on-a-single-branch">How to rebase on a single branch</h2>
<p>You can also use <code>git rebase</code> while looking at a history of a single branch.</p>
<p>Let's see if you can help me here.</p>
<p>Say I worked from <code>feature_branch_2</code>, and specifically edited the file <code>code.py</code>. I started by changing all strings to be wrapped by double quotes rather than single quotes:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-273.png" alt="Image" width="600" height="400" loading="lazy">
<em>Changing <code>'</code> into <code>"</code> in <code>code.py</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Then, I staged and committed:</p>
<pre><code>git add code.py
git commit -m <span class="hljs-string">"Commit 17"</span>
</code></pre><p>I then decided to add a new function at the beginning of the file:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-274.png" alt="Image" width="600" height="400" loading="lazy">
<em>Adding the function <code>another_feature</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Again, I staged and committed:</p>
<pre><code>git add code.py
git commit -m <span class="hljs-string">"Commit 18"</span>
</code></pre><p>And now I realized I actually forgot to change the single quotes to double quotes wrapping the <code>__main__</code> (as you might have noticed), so I did that too:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-275.png" alt="Image" width="600" height="400" loading="lazy">
<em>Changing <code>'__main__'</code> into <code>"__main__"</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Of course, I staged and committed this change:</p>
<pre><code>git add code.py
git commit -m <span class="hljs-string">"Commit 19"</span>
</code></pre><p>Now, consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-276.png" alt="Image" width="600" height="400" loading="lazy">
<em>The commit history after introducing "Commit 19" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>It isn't really nice, is it? I mean, I have two commits that are related to one another, "Commit 17" and "Commit 19" (turning <code>'</code>s into <code>"</code>s), but they are split by the unrelated "Commit 18" (where I added a new function). What can we do? 🤔 Can you help me?</p>
<p>Intuitively, I want to edit the history here:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-277.png" alt="Image" width="600" height="400" loading="lazy">
<em>These are the commits I want to edit (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p> So, what would you do?</p>
<p>You are right! 👏🏻</p>
<p>I can rebase the history from "Commit 17" to "Commit 19", on top of "Commit 15". To do that:</p>
<pre><code>git rebase --interactive --onto &lt;SHA_OF_COMMIT_15&gt; <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">SHA_OF_COMMIT_15</span>&gt;</span></span>
</code></pre><p>Notice I specified "Commit 15" as the beginning of the range of commits, excluding this commit. And I didn't need to explicitly specify <code>HEAD</code> as the last parameter.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-279.png" alt="Image" width="600" height="400" loading="lazy">
<em>Using <code>rebase --onto</code> on a single branch (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>After following your advice and running the <code>rebase</code> command (thanks! 😇) I get the following screen:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-280.png" alt="Image" width="600" height="400" loading="lazy">
<em>Interactive rebase (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>So what would I do? I want to put "Commit 19" <em>before</em> "Commit 18", so it comes right after "Commit 17". I can go further and squash them together, like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-281.png" alt="Image" width="600" height="400" loading="lazy">
<em>Interactive rebase - changing the order of commit and squashing (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Now when I get prompted for a commit message, I can provide the message "Commit 17+19":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-282.png" alt="Image" width="600" height="400" loading="lazy">
<em>Providing a commit message (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>And now, see our beautiful history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-283.png" alt="Image" width="600" height="400" loading="lazy">
<em>The resulting history (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Thanks again! 🙌🏻</p>
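<p>The reorder-and-squash can also be scripted. This is a minimal sketch, assuming a throwaway repository and a hypothetical <code>commit</code> helper. To keep it free of conflicts, the quote changes go to a hypothetical <code>quotes.py</code> and the new function to <code>feature.py</code>, rather than all changes landing in <code>code.py</code>. The <code>sed</code> program stands in for the manual editing of the todo list.</p>

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit quotes.py "Commit 15"
commit quotes.py "Commit 17"    # first half of the quotes change
commit feature.py "Commit 18"   # unrelated new function
commit quotes.py "Commit 19"    # second half of the quotes change

commit_15=$(git rev-parse HEAD~3)

# Todo list: line 1 = Commit 17, line 2 = Commit 18, line 3 = Commit 19.
# The sed program holds line 2 back, turns line 3 into "squash",
# and then emits the held line after it, giving:
#   pick Commit 17 / squash Commit 19 / pick Commit 18
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2{h;d;}' -e '3{s/^pick/squash/;G;}'" GIT_EDITOR=true \
  git rebase -q -i --onto "$commit_15" "$commit_15"

git log --oneline
```

<p>Here <code>GIT_EDITOR=true</code> accepts the default combined message instead of typing "Commit 17+19".</p>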
<h1 id="heading-more-rebase-use-cases-more-practice">More Rebase Use Cases + More Practice</h1>
<p>By now I hope you feel comfortable with the syntax of rebase. The best way to actually understand it is to consider various cases and figure out how to solve them yourself. </p>
<p>With the upcoming use cases, I strongly suggest you stop reading after I've introduced each use case, and then try to solve it on your own.</p>
<h2 id="heading-how-to-exclude-commits">How to Exclude Commits</h2>
<p>Say you have this history on another repo:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-284.png" alt="Image" width="600" height="400" loading="lazy">
<em>Another commit history (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Before playing around with it, store a tag to "Commit F" so you can get back to it later:</p>
<pre><code>git tag original_commit_f
</code></pre><p>Now, you actually don't want the changes in "Commit C" and "Commit D" to be included. You could use an interactive rebase like before and remove their changes. Or, you can again use <code>git rebase --onto</code>. How would you use <code>--onto</code> in order to "remove" these two commits?</p>
<p>You can rebase <code>HEAD</code> on top of "Commit B", where the old parent was actually "Commit D", and now it should be "Commit B". Consider the history again:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-284.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history again (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Rebasing so that "Commit B" is the base of "Commit E", means "moving" both "Commit E" and "Commit F", and giving them another <em>base</em> – "Commit B". Can you come up with the command yourself?</p>
<pre><code>git rebase --onto &lt;SHA_OF_COMMIT_B&gt; <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">SHA_OF_COMMIT_D</span>&gt;</span> HEAD</span>
</code></pre><p>Notice that using the syntax above would not move <code>main</code> to point to the new commit, so the result is a "detached" <code>HEAD</code>. If you use <code>gg</code> or another tool that displays the history reachable from branches it might confuse you:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-285.png" alt="Image" width="600" height="400" loading="lazy">
<em>Rebasing with <code>--onto</code> results in a detached <code>HEAD</code> (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>But if you simply use <code>git log</code> (or my alias <code>git lol</code>), you will see the desired history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-286.png" alt="Image" width="600" height="400" loading="lazy">
<em>The resulting history (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>I don't know about you, but these kinds of things make me really happy. 😊😇</p>
<p>By the way, you could omit <code>HEAD</code> from the previous command as this is the default value for the third parameter. So just using:</p>
<pre><code>git rebase --onto &lt;SHA_OF_COMMIT_B&gt; <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">SHA_OF_COMMIT_D</span>&gt;</span></span>
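</code></pre><p>Would replay the same commits. One difference: when you run it while on a branch, such as <code>main</code>, Git rebases that branch and moves its pointer, rather than leaving you with a detached <code>HEAD</code>. The last parameter actually tells Git where the end of the current sequence of commits to rebase is. So the syntax of <code>git rebase --onto</code> with three arguments is:</p>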
<pre><code>git rebase --onto &lt;new_parent&gt; &lt;old_parent&gt; &lt;until&gt;
</code></pre><h2 id="heading-how-to-move-commits-across-branches">How to move commits across branches</h2>
<p>So let's say we get to the same history as before:</p>
<pre><code>git checkout original_commit_f
</code></pre><p>And now I want only "Commit E" to be on a branch based on "Commit B". That is, I want to have a new branch, branching from "Commit B", with only "Commit E".</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-287.png" alt="Image" width="600" height="400" loading="lazy">
<em>The current history, considering "Commit E" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>So, what does this mean in terms of rebase? Consider the image above. What commit (or commits) should I rebase, and which commit would be the new base?</p>
<p>I know I can count on you here 😉</p>
<p>What I want is to take "Commit E", and this commit only, and change its base to be "Commit B". In other words, to <em>replay</em> the changes introduced in "Commit E" onto "Commit B".</p>
<p>Can you apply that logic to the syntax of <code>git rebase</code>?</p>
<p>Here it is (this time I'm writing <code>&lt;COMMIT_B&gt;</code> instead of <code>&lt;SHA_OF_COMMIT_B&gt;</code>, for brevity):</p>
<pre><code>git rebase --onto &lt;COMMIT_B&gt; &lt;COMMIT_D&gt; &lt;COMMIT_E&gt;
</code></pre><p>Now the history looks like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-288.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after rebase (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Awesome!</p>
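<p>Both of the last two exercises can be sketched in one script. It assumes a throwaway repository with commits "Commit A" through "Commit F" on <code>main</code>, a hypothetical <code>commit</code> helper, and one file per commit so that nothing conflicts.</p>

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

for name in A B C D E F; do
  commit "$name.txt" "Commit $name"
done
git tag original_commit_f

commit_b=$(git rev-parse original_commit_f~4)
commit_d=$(git rev-parse original_commit_f~2)
commit_e=$(git rev-parse original_commit_f~1)

# 1) Exclude C and D: replay everything after D (E and F) onto B.
#    No third argument, so the current branch (main) is rebased.
git rebase -q --onto "$commit_b" "$commit_d"
without_c_d=$(git log --format=%s main | tr '\n' ',')
echo "main: $without_c_d"

# 2) Go back to the original history and move only E onto B.
#    The third argument is a commit, so the result is a detached HEAD.
git checkout -q original_commit_f
git rebase -q --onto "$commit_b" "$commit_d" "$commit_e"
only_e=$(git log --format=%s HEAD | tr '\n' ',')
echo "HEAD: $only_e"
```

<p>If you want to keep the result of the second step, create a branch at that point, since nothing else points to the new "Commit E" yet.</p>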
<h1 id="heading-a-note-about-conflicts">A Note About Conflicts</h1>
<p>Note that when performing a rebase, you may run into conflicts just as when merging. You may have conflicts because when rebasing, you are trying to apply patches on a different base, perhaps where the patches do not apply.</p>
<p>For example, consider the previous repository again, and specifically, consider the change introduced in "Commit 12", pointed to by <code>main</code>:</p>
<pre><code>git show main
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-289.png" alt="Image" width="600" height="400" loading="lazy">
<em>The patch introduced in "Commit 12" (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>I already covered the format of <code>git diff</code> in detail in <a target="_blank" href="https://www.freecodecamp.org/news/git-diff-and-patch/">a previous post</a>, but as a quick reminder, this commit instructs Git to add a line after the two lines of context:</p>
<pre><code>This is a sample file
</code></pre><p>And before these three lines of context:</p>
<pre><code>def new_feature():
  print(<span class="hljs-string">'new feature'</span>)
</code></pre><p>Say you are trying to rebase "Commit 12" onto another commit. If, for some reason, these context lines don't exist as they do in the patch on the commit you are rebasing <em>onto</em>, then you will have a conflict. To learn more about conflicts and how to resolve them, see <a target="_blank" href="https://www.freecodecamp.org/news/the-definitive-guide-to-git-merge/">this guide</a>.</p>
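<p>To see a rebase conflict in a safe place, here is a minimal sketch. It assumes a throwaway repository where a hypothetical <code>feature</code> branch and <code>main</code> both rewrite the same line of <code>code.py</code>. When the rebase stops, the script lists the conflicted files and then backs out with <code>git rebase --abort</code>.</p>

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo"
git config user.email "demo@example.com"

echo "print('hello')" > code.py
git add code.py
git commit -q -m "Base commit"

# Both branches rewrite the same line in different ways.
git checkout -q -b feature
echo "print('hello from feature')" > code.py
git commit -q -am "Feature change"

git checkout -q main
echo "print('hello from main')" > code.py
git commit -q -am "Main change"

git checkout -q feature
before=$(git rev-parse HEAD)
conflicted=""

if git rebase -q main; then
  echo "Rebased cleanly"
else
  # The patch of "Feature change" does not apply on top of "Main change".
  conflicted=$(git diff --name-only --diff-filter=U | sort -u)
  echo "Conflict in: $conflicted"
  git rebase --abort   # return the branch to where it was
fi

after=$(git rev-parse HEAD)
```

<p>Aborting is only one option. You could instead fix the file, stage it with <code>git add</code>, and run <code>git rebase --continue</code>.</p>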
<h1 id="heading-zooming-out-for-the-big-picture">Zooming Out for the Big Picture</h1>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-290.png" alt="Image" width="600" height="400" loading="lazy">
<em>Comparing rebase and merge (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>In the beginning of this guide, I started by mentioning the similarity between <code>git merge</code> and <code>git rebase</code>: both are used to integrate changes introduced in different histories. </p>
<p>But, as you now know, they are very different in how they operate. While merging results in a diverged history, rebasing results in a linear history. Conflicts are possible in both cases. And there is one more column described in the table above that requires some close attention.</p>
<p>Now that you know what "Git rebase" is, and how to use interactive rebase or <code>rebase --onto</code>, as I hope you agree, <code>git rebase</code> is a super powerful tool. Yet, it has one huge drawback when compared with merging.</p>
<p>Git rebase changes the history.</p>
<p>This means that you should <strong>not</strong> rebase commits that exist outside your local copy of the repository, and that other people may have based their commits on.</p>
<p>In other words, if the only commits in question are those you created locally – go ahead, use rebase, go wild.</p>
<p>But if the commits have been pushed, this can lead to a huge problem – as someone else may rely on these commits, that you later overwrite, and then you and they will have different versions of the repository. </p>
<p>This is unlike <code>merge</code> which, as we have seen, does not modify history.</p>
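<p>You can watch the history change for yourself. The sketch below assumes a throwaway repository, a hypothetical <code>commit</code> helper, and a hypothetical <code>feature</code> branch. It records the commit that <code>feature</code> points to, rebases, and then compares.</p>

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.py "Commit A"
git checkout -q -b feature
commit feature.py "Commit B"
git checkout -q main
commit main.py "Commit C"

old=$(git rev-parse feature)
git rebase -q main feature
new=$(git rev-parse feature)

echo "before: $old"
echo "after:  $new"

# Same message and same patch, but a different commit object.
# The old one is no longer part of the branch's history.
if git merge-base --is-ancestor "$old" feature; then
  still_in_history="yes"
else
  still_in_history="no"
fi
echo "old commit still in feature's history: $still_in_history"
```

<p>Anyone who based their work on the old commit now holds a commit that your branch no longer contains, which is exactly the problem described here.</p>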
<p>For example, consider the last case where we rebased and resulted in this history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/06/image-288.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after rebase (Source: <a target="_blank" href="https://youtu.be/3VFsitGUB3s">Brief</a>)</em></p>
<p>Now, assume that I have already pushed this branch to the remote. And after I had pushed the branch, another developer pulled it and branched out from "Commit C". The other developer didn't know that meanwhile, I was locally rebasing my branch, and would later push it again.</p>
<p>This results in an inconsistency: the other developer works from a commit that is no longer available on my copy of the repository.</p>
<p>I will not elaborate on what exactly this causes in this guide, as my main message is that you should definitely avoid such cases. If you're interested in what would actually happen, I'll leave a link to a useful resource below. For now, let's summarize what we have covered.</p>
<h1 id="heading-recap">Recap</h1>
<p>In this tutorial, you learned about <code>git rebase</code>, a super-powerful tool to rewrite history in Git. You considered a few use cases where <code>git rebase</code> can be helpful, and how to use it with one, two, or three parameters, with and without the <code>--onto</code> switch.</p>
<p>I hope I was able to convince you that <code>git rebase</code> is powerful – but also that it is quite simple once you get the gist. It is a tool to "copy-paste" commits (or, more accurately, patches). And it's a useful tool to have under your belt.</p>
<h1 id="heading-additional-references">Additional References</h1>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BNUby5H58y6s2TQVLadV8v7">Git Internals YouTube playlist — by Brief</a> (my YouTube channel).</li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">Omer's previous post about Git internals.</a></li>
<li><a target="_blank" href="https://medium.com/@Omer_Rosenbaum/git-undo-how-to-rewrite-git-history-with-confidence-d4452e2969c2">Omer's tutorial about Git UNDO - rewriting history with Git</a>.</li>
<li><a target="_blank" href="https://git-scm.com/book/en/v2/Git-Branching-Rebasing">Git docs on rebasing</a></li>
<li><a target="_blank" href="https://jwiegley.github.io/git-from-the-bottom-up/1-Repository/7-branching-and-the-power-of-rebase.html">Branching and the power of rebase</a></li>
<li><a target="_blank" href="https://jwiegley.github.io/git-from-the-bottom-up/1-Repository/8-interactive-rebasing.html">Interactive rebasing</a></li>
<li><a target="_blank" href="https://womanonrails.com/git-rebase-onto">Git rebase --onto</a></li>
</ul>
<h1 id="heading-about-the-author"><strong>About the Author</strong></h1>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the <a target="_blank" href="https://youtube.com/@BriefVid">Brief YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Git Merge Handbook – Definitive Guide to Merging in Git ]]>
                </title>
                <description>
                    <![CDATA[ By reading this post, you are going to really understand git merge, one of the most common operations you'll perform in your Git repositories. Notes before we start I also created two videos covering the contents of this post. If you wish to watch a... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-definitive-guide-to-git-merge/</link>
                <guid isPermaLink="false">66c17c4958ee0865d2671b62</guid>
                
                    <category>
                        <![CDATA[ Git ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                    <category>
                        <![CDATA[ version control ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Thu, 27 Apr 2023 17:07:19 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/07/The-Git-Merge-Handbook-Book-Cover.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By reading this post, you are going to <em>really</em> understand <code>git merge</code>, one of the most common operations you'll perform in your Git repositories.</p>
<h2 id="heading-notes-before-we-start">Notes before we start</h2>
<ol>
<li>I also created two videos covering the contents of this post. If you wish to watch alongside reading, you can find them here (<a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Part 1</a>, <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Part 2</a>).</li>
<li>I am working on a book about Git! Are you interested in reading the initial versions and providing feedback? Send me an email: gitting.things@gmail.com</li>
</ol>
<p>OK, are you ready?</p>
<h1 id="heading-table-of-contents">Table of Contents</h1>
<ul>
<li><a class="post-section-overview" href="#heading-what-is-a-merge-in-git">What is a Merge in Git?</a></li>
<li><a class="post-section-overview" href="#heading-time-to-get-hands-on">Time to Get Hands-on 🙌🏻</a></li>
<li><a class="post-section-overview" href="#heading-time-for-a-more-advanced-case">Time For a More Advanced Case</a></li>
<li><a class="post-section-overview" href="#heading-quick-recap-on-a-three-way-merge">Quick recap on a three-way merge</a></li>
<li><a class="post-section-overview" href="#heading-moving-on">Moving on 👣</a></li>
<li><a class="post-section-overview" href="#heading-more-advanced-git-merge-cases">More Advanced Git Merge Cases</a></li>
<li><a class="post-section-overview" href="#heading-how-gits-3-way-merge-algorithm-works">How Git's 3-way Merge Algorithm Works</a></li>
<li><a class="post-section-overview" href="#heading-how-to-resolve-merge-conflicts">How to Resolve Merge Conflicts</a></li>
<li><a class="post-section-overview" href="#heading-how-to-use-vs-code-to-resolve-conflicts">How to Use VS Code to Resolve Conflicts</a></li>
<li><a class="post-section-overview" href="#heading-one-more-powerful-tool">One More Powerful Tool 🪛</a></li>
<li><a class="post-section-overview" href="#heading-recap">Recap</a></li>
</ul>
<h1 id="heading-what-is-a-merge-in-git">What is a Merge in Git?</h1>
<p>Merging is the process of combining the recent changes from several branches into a single new commit that becomes the tip of the branch you merge into.</p>
<p>In a way, merging is the complement of branching in version control: a branch allows you to work simultaneously with others on a particular set of files, whereas a merge allows you to later combine separate work on branches that diverged from a common ancestor commit.</p>
<p>OK, let's take this bit by bit.</p>
<p>Remember that in Git, <a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">a branch is just a name pointing to a single commit</a>. When we think about commits as being "on" a specific branch, they are actually reachable through the parent chain from the commit that the branch is pointing to. </p>
<p>That is, if you consider this commit graph:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-140.png" alt="Image" width="600" height="400" loading="lazy">
_Commit graph with two pointers (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>You see the branch <code>feature_1</code>, which points to a commit with the SHA-1 value of <code>ba0d2</code>. Of course, as in other posts, I only write the first 5 digits of the SHA-1 value. </p>
<p>Notice that commit <code>54a9d</code> is also on this branch, as it is the parent commit of <code>ba0d2</code>. So if you start from the pointer of <code>feature_1</code>, you get to <code>ba0d2</code>, which then points to <code>54a9d</code>.</p>
<p>When you merge with Git, you merge <strong>commits</strong>. Almost always, we merge two commits by referring to them with the branch names that point to them. Thus we say we "merge branches" – though under the hood, we actually merge commits.</p>
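<p>If you'd like to see the "a branch is just a pointer" idea for yourself, here is a minimal sketch. It assumes a throwaway repository created in a temporary directory; the commit messages and the user name and email are placeholders I made up for the illustration.</p>

```shell
# Throwaway repository; all names and messages here are illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the first branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Commit 1"
git commit -q --allow-empty -m "Commit 2"
git checkout -q -b feature_1               # new pointer, same commit as main
git commit -q --allow-empty -m "Commit 3"  # only feature_1 moves forward

tip=$(git rev-parse feature_1)             # the single commit the branch points to
parent=$(git rev-parse feature_1^)         # follow the parent pointer once
main_tip=$(git rev-parse main)
reachable=$(git rev-list --count feature_1)

echo "feature_1 points to $tip"
echo "its parent is $parent, and main points to $main_tip"
echo "commits reachable from feature_1: $reachable"
```

<p>The parent of the commit that <code>feature_1</code> points to is the commit <code>main</code> points to, so every commit "on" <code>main</code> is also reachable from <code>feature_1</code>.</p>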
<h1 id="heading-time-to-get-hands-on">Time to Get Hands-on 🙌🏻</h1>
<p>OK, so let's say I have this simple repository here, with a branch called <code>main</code>, and a few commits with the commit messages of "Commit 1", "Commit 2" and "Commit 3":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-141.png" alt="Image" width="600" height="400" loading="lazy">
_A simple repository with three commits (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Next, create a feature branch by typing <code>git branch new_feature</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-142.png" alt="Image" width="600" height="400" loading="lazy">
_Creating a new branch with <code>git branch</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>And switch <code>HEAD</code> to point to this new branch, by using <code>git checkout new_feature</code>. You can look at the outcome by using <code>git log</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-143.png" alt="Image" width="600" height="400" loading="lazy">
_The output of <code>git log</code> after using <code>git checkout new_feature</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>As a reminder, you could also write <code>git checkout -b new_feature</code>, which would both create a new branch and change <code>HEAD</code> to point to this new branch. </p>
<p>If you need a reminder about branches and how they're implemented under the hood, please check out <a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">a previous post on the subject</a>. Yes, check out. Pun intended 😇</p>
<p>Now, on the <code>new_feature</code> branch, implement a new feature. In this example I will edit an existing file that looks like this before the edit:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-144.png" alt="Image" width="600" height="400" loading="lazy">
_<code>code.py</code> before editing it (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>And I will now edit it to include a new function:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-145.png" alt="Image" width="600" height="400" loading="lazy">
_Implementing <code>new_feature</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>And thankfully, this is not a programming tutorial, so this function is legit 😇<br>Next, stage and commit this change:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-146.png" alt="Image" width="600" height="400" loading="lazy">
_Committing the changes to "Commit 4" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Looking at the history, you have the branch <code>new_feature</code>, now pointing to "Commit 4", which points to its parent, "Commit 3". The branch <code>main</code> is also pointing to "Commit 3".</p>
<p>Time to merge the new feature! That is, merge these two branches, <code>main</code> and <code>new_feature</code>. Or, in Git's lingo, merge <code>new_feature</code> <em>into</em> <code>main</code>. This means merging "Commit 4" and "Commit 3". This is pretty trivial, as after all, "Commit 3" is an ancestor of "Commit 4".</p>
<p>Check out the main branch (with <code>git checkout main</code>), and perform the merge by using <code>git merge new_feature</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-197.png" alt="Image" width="600" height="400" loading="lazy">
_Merging <code>new_feature</code> into <code>main</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Since <code>new_feature</code> never really <em>diverged</em> from <code>main</code>, Git could just perform a fast-forward merge. So what happened here? Consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/05/image--7-.png" alt="Image" width="600" height="400" loading="lazy">
_The result of a fast-forward merge (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Even though you used <code>git merge</code>, there was no actual merging here. Actually, Git did something very simple – it reset the <code>main</code> branch to point to the same commit as the branch <code>new_feature</code>.</p>
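<p>You can check this claim with a small experiment. The sketch below assumes a fresh throwaway repository with default merge settings (the user name and email are placeholders), and simply counts commits before and after the merge.</p>

```shell
# Throwaway repository; all names and messages here are illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Commit 3"
git checkout -q -b new_feature
git commit -q --allow-empty -m "Commit 4"

git checkout -q main
total_before=$(git rev-list --all --count)
git merge -q new_feature                   # nothing diverged, so Git fast-forwards
total_after=$(git rev-list --all --count)

main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse new_feature)
echo "commits before: $total_before, after: $total_after"
echo "main:        $main_tip"
echo "new_feature: $feature_tip"
```

<p>No commit was created, and both branch names now resolve to the same SHA-1 value.</p>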
<p>In case you don't want that to happen, but rather you want Git to really perform a merge, you could either change Git's configuration, or run the <code>merge</code> command with the <code>--no-ff</code> flag.</p>
<p>First, undo the fast-forward by moving <code>main</code> back one commit:</p>
<pre><code class="lang-git">git reset --hard HEAD~1
</code></pre>
<p>If this way of using reset is not clear to you, feel free to check out <a target="_blank" href="https://medium.com/@Omer_Rosenbaum/git-undo-how-to-rewrite-git-history-with-confidence-d4452e2969c2">a post where I covered <code>git reset</code> in depth</a>. It is not crucial for this introduction of <code>merge</code>, though. For now, it's important to understand that it basically undoes the merge operation.</p>
<p>Just to clarify, now if you checked out <code>new_feature</code> again:</p>
<pre><code class="lang-git">git checkout new_feature
</code></pre>
<p>The history would look just like before the merge:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/05/image--8-.png" alt="Image" width="600" height="400" loading="lazy">
_The history after using <code>git reset --hard HEAD~1</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Next, perform the merge with the <code>--no-ff</code> flag (short for "no fast-forward"):</p>
<pre><code class="lang-git">git checkout main
git merge new_feature --no-ff
</code></pre>
<p>Now, if we look at the history using <code>git lol</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-200.png" alt="Image" width="600" height="400" loading="lazy">
_History after merging with the <code>--no-ff</code> flag (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>(<code>git lol</code> is an alias I added to Git to view the history as a graph. You can find it <a target="_blank" href="https://gist.github.com/Omerr/8134a61b56ca82dd90e546e7ef04eb77">here</a>).</p>
<p>Considering this history, you can see Git created a new commit, a merge commit.</p>
<p>If you consider this commit a bit closer:</p>
<pre><code>git log -n1
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-201.png" alt="Image" width="600" height="400" loading="lazy">
_The merge commit has two parents (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>You will see that this commit actually has two parents – "Commit 4", which was the commit that <code>new_feature</code> pointed to when you ran <code>git merge</code>, and "Commit 3", which was the commit that <code>main</code> pointed to. So a merge commit has two parents: the two commits it merged.</p>
<p>The merge commit shows us the concept of merge quite well. Git takes two commits, usually referenced by two different branches, and merges them together. </p>
<p>After the merge, as you started the process from <code>main</code>, you are still on <code>main</code>, and the history from <code>new_feature</code> has been merged into this branch. Since you started with <code>main</code>, then "Commit 3", which <code>main</code> pointed to, is the first parent of the merge commit, whereas "Commit 4", which you merged <em>into</em> <code>main</code>, is the second parent of the merge commit.</p>
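<p>Here is a minimal sketch of how you might inspect the parent order yourself. It assumes a throwaway repository (placeholder user name and email), and uses the <code>^1</code> and <code>^2</code> suffixes to ask for the first and second parent of the merge commit.</p>

```shell
# Throwaway repository; all names and messages here are illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Commit 3"
git checkout -q -b new_feature
git commit -q --allow-empty -m "Commit 4"

git checkout -q main
main_before=$(git rev-parse main)          # where main points before merging
git merge -q --no-ff -m "Merge branch 'new_feature'" new_feature

first_parent=$(git rev-parse HEAD^1)       # the commit you merged into
second_parent=$(git rev-parse HEAD^2)      # the commit you merged in
echo "first parent:  $(git log -1 --format=%s "$first_parent")"
echo "second parent: $(git log -1 --format=%s "$second_parent")"
```

<p>The first parent is the commit <code>main</code> pointed to when you started, and the second parent is the commit <code>new_feature</code> points to.</p>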
<p>Notice that you started on <code>main</code> when it pointed to "Commit 3", and Git went quite a long way for you. It changed the working tree, the index, and also <code>HEAD</code> and created a new commit object. At least when you use <code>git merge</code> without the <code>--no-commit</code> flag and when it's not a fast-forward merge, Git does all of that.</p>
<p>This was a super simple case, where the branches you merged didn't diverge at all.</p>
<p>By the way, you can use <code>git merge</code> to merge more than two commits – actually, any number of commits. This is rarely done and I don't see a good reason to elaborate on it here.</p>
<p>Another way to think of <code>git merge</code> is as joining two or more <em>development histories</em> together. That is, when you merge, you incorporate changes from the named commits, since the time their histories diverged <em>from</em> the current branch, <em>into</em> the current branch. I used the term <code>branch</code> here, but I am stressing this again – we are actually merging commits.</p>
<h1 id="heading-time-for-a-more-advanced-case">Time For a More Advanced Case 💪🏻</h1>
<p>Time to consider a more advanced case, which is probably the most common case where we use <code>git merge</code> explicitly – where you need to merge branches that <em>did</em> diverge from one another.</p>
<p>Assume we have two people working on this repo now, John and Paul.</p>
<p>John created a branch:</p>
<pre><code>git checkout -b john_branch
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-348.png" alt="Image" width="600" height="400" loading="lazy">
_A new branch, <code>john_branch</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>And John has written a new song in a new file, <code>lucy_in_the_sky_with_diamonds.md</code>. Well, I believe John Lennon didn't really write in Markdown format, or use Git for that matter, but let's pretend he did for this explanation.</p>
<pre><code>git add lucy_in_the_sky_with_diamonds.md
git commit -m <span class="hljs-string">"Commit 5"</span>
</code></pre><p>While John was working on this song, Paul was also writing, on another branch. Paul had started from <code>main</code>:</p>
<pre><code>git checkout main
</code></pre><p>And created his own branch:</p>
<pre><code>git checkout -b paul_branch
</code></pre><p>And Paul wrote his song into a file:</p>
<pre><code>nano penny_lane.md
</code></pre><p>And committed it:</p>
<pre><code>git add penny_lane.md
git commit -m <span class="hljs-string">"Commit 6"</span>
</code></pre><p>So now our history looks like this – where we have two different branches, branching out from <code>main</code>, with different histories.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-203.png" alt="Image" width="600" height="400" loading="lazy">
_The output of <code>git lol</code> shows the history after John and Paul committed (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>John is happy with his branch (that is, his song), so he decides to merge it into the <code>main</code> branch:</p>
<pre><code>git checkout main
git merge john_branch
</code></pre><p>Actually, this is a fast-forward merge, as we have learned before. You can validate that by looking at the history (using <code>git lol</code>, for example):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-349.png" alt="Image" width="600" height="400" loading="lazy">
_Merging <code>john_branch</code> into <code>main</code> results in a fast-forward merge (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>At this point, Paul also wants to merge his branch into <code>main</code>, but now a fast-forward merge is no longer possible – there are two <em>different</em> histories here: the history of <code>main</code> and that of <code>paul_branch</code>. It's not that <code>paul_branch</code> only adds commits on top of the <code>main</code> branch or vice versa.</p>
<p>Now things get interesting. 😎😎</p>
<p>First, let Git do the hard work for you. After that, we will understand what's actually happening under the hood.</p>
<pre><code>git merge paul_branch
</code></pre><p>Consider the history now:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-206.png" alt="Image" width="600" height="400" loading="lazy">
_When you merge <code>paul_branch</code>, you get a new merge commit (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>What you have is a new commit, with two parents – "Commit 5" and "Commit 6".</p>
<p>In the working dir, you can see that both John's song and Paul's song are there:</p>
<pre><code>ls
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-242.png" alt="Image" width="600" height="400" loading="lazy">
_The working dir after the merge (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Nice, Git really did merge the changes for us. But how does that happen?</p>
<p>Undo this last commit:</p>
<pre><code>git reset --hard HEAD~
</code></pre><h2 id="heading-how-to-perform-a-three-way-merge-in-git">How to perform a three-way merge in Git</h2>
<p>It's time to understand what's really happening under the hood. 😎</p>
<p>What Git has done here is called a 3-way merge. In outlining the process of a 3-way merge, I will use the term "branch" for simplicity, but you should remember you could also merge two (or more) commits that are not referenced by a branch.</p>
<p>The 3-way merge process includes these stages:</p>
<p>First, Git locates the common ancestor of the two branches. That is, the common commit from which the merging branches most recently diverged. Technically, this is actually the first commit that is reachable from both branches. This commit is then called the <strong>merge base</strong>.</p>
<p>Second, Git calculates two diffs – one diff from the merge base to the first branch, and another diff from the merge base to the second branch. Git generates patches based on those diffs.</p>
<p>Third, Git applies both patches to the merge base using a 3-way merge algorithm. The result is the state of the new, merge commit.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-357.png" alt="Image" width="600" height="400" loading="lazy">
_The three steps of the 3-way merge algorithm: (1) locate the common ancestor; (2) calculate diffs from the merge base to the first branch, and from the merge base to the second branch; (3) apply both patches together (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>So, back to our example.</p>
<p>In the first step, Git looks from both branches – <code>main</code> and <code>paul_branch</code> – and traverses the history to find the first commit that is reachable from both. In this case, this would be...which commit?</p>
<p>Correct, "Commit 4".</p>
<p>If you are not sure, you can always ask Git directly:</p>
<pre><code>git merge-base main paul_branch
</code></pre><p>By the way, this is the most common and simple case, where we have a single obvious choice for the merge base. In more complicated cases, there may be multiple possibilities for a merge base, but this is a topic for another post.</p>
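<p>To try <code>git merge-base</code> without touching your own repository, here is a small sketch. It assumes a throwaway repository that mimics the John and Paul history with empty commits; the user name and email are placeholders.</p>

```shell
# Throwaway repository; all names and messages here are illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Commit 4"
expected_base=$(git rev-parse HEAD)        # remember where both branches start

git checkout -q -b john_branch
git commit -q --allow-empty -m "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
git commit -q --allow-empty -m "Commit 6"

git checkout -q main
git merge -q john_branch                   # fast-forward: main is now "Commit 5"

found_base=$(git merge-base main paul_branch)
echo "merge base: $(git log -1 --format=%s "$found_base")"
```

<p>Even though <code>main</code> has moved forward to "Commit 5", the merge base of <code>main</code> and <code>paul_branch</code> is still "Commit 4".</p>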
<p>In the second step, Git calculates the diffs. So it first calculates the diff between "Commit 4" and "Commit 5":</p>
<pre><code>git diff <span class="hljs-number">4</span>f90a62 <span class="hljs-number">4683</span>aef
</code></pre><p>(The SHA-1 values will be different on your machine)</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-247.png" alt="Image" width="600" height="400" loading="lazy">
_The diff between "Commit 4" and "Commit 5" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>If you don't feel comfortable with the output of <code>git diff</code>, please read <a target="_blank" href="https://www.freecodecamp.org/news/git-diff-and-patch/">the previous post</a> where I described it in detail.</p>
<p>You can store that diff to a file:</p>
<pre><code>git diff <span class="hljs-number">4</span>f90a62 <span class="hljs-number">4683</span>aef &gt; john_branch_diff.patch
</code></pre><p>Next, Git calculates the diff between "Commit 4" and "Commit 6":</p>
<pre><code>git diff <span class="hljs-number">4</span>f90a62 c5e4951
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-249.png" alt="Image" width="600" height="400" loading="lazy">
_The diff between "Commit 4" and "Commit 6" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Write this one to a file as well:</p>
<pre><code>git diff <span class="hljs-number">4</span>f90a62 c5e4951 &gt; paul_branch_diff.patch
</code></pre><p>Now Git applies those patches on the merge base. </p>
<p>First, try that out directly – just apply the patches (I will walk you through it in a moment). This is <em>not</em> what Git really does under the hood, but it will help you gain a better understanding of why Git needs to do something different.</p>
<p>Check out the merge base first, that is, "Commit 4":</p>
<pre><code>git checkout <span class="hljs-number">4</span>f90a62
</code></pre><p>And apply John's patch first:</p>
<pre><code>git apply --index john_branch_diff.patch
</code></pre><p>Notice that for now there is no merge commit. <code>git apply</code> updates the working dir as well as the index, as we used the <code>--index</code> switch.</p>
<p>You can observe the status using <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-250.png" alt="Image" width="600" height="400" loading="lazy">
_Applying John's patch on "Commit 4" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>So now John's new song is incorporated into the index. Apply the other patch:</p>
<pre><code>git apply --index paul_branch_diff.patch
</code></pre>
<p>As a result, the index contains changes from both branches.</p>
<p>Now it's time to commit your merge. Since the porcelain command <code>git commit</code> always generates a commit with a <em>single</em> parent, you would need the underlying plumbing command – <code>git commit-tree</code>. </p>
<p>If you need a reminder about porcelain vs plumbing commands, check out <a target="_blank" href="https://medium.com/swimm/getting-hardcore-creating-a-repo-from-scratch-cc747edbb11c">the post where I explained these terms, and created an entire repo from scratch</a>.</p>
<p>Remember that <a target="_blank" href="https://medium.com/swimm/a-visualized-intro-to-git-internals-objects-and-branches-68df85864037">every Git commit object points to a single tree</a>. So you need to record the contents of the index in a tree:</p>
<pre><code>git write-tree
</code></pre><p>Now you get the SHA-1 value of the created tree, and you can create a commit object using <code>git commit-tree</code>:</p>
<pre><code>git commit-tree &lt;TREE_SHA&gt; -p &lt;COMMIT_5&gt; -p &lt;COMMIT_6&gt; -m <span class="hljs-string">"Merge commit!"</span>
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-251.png" alt="Image" width="600" height="400" loading="lazy">
_Creating a merge commit (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Great, so you have created a commit object 💪🏻</p>
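<p>Recall that <code>git merge</code> also changes <code>HEAD</code> to point to the new merge commit object. So you can simply do the same, using the SHA-1 value that <code>git commit-tree</code> printed (<code>db315a</code> in my case):</p>
<pre><code>git reset --hard db315a
</code></pre>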
<p>If you look at the history now:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-252.png" alt="Image" width="600" height="400" loading="lazy">
_The history after creating a merge commit and resetting <code>HEAD</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>You can see that you've reached the same result as the merge done by Git, with the exception of the timestamp and thus the SHA-1 value, of course.</p>
<p>So you got to merge both the <strong>contents</strong> of the two commits – that is, the state of the files, and also the <strong>history</strong> of those commits – by creating a merge commit that points to both histories.</p>
<p>In this simple case, you could actually just apply the patches using <code>git apply</code>, and everything worked quite well.</p>
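<p>The whole manual process above can be replayed end to end. The following is only a sketch under a few assumptions: a throwaway repository, placeholder file contents, and patches stored in a separate temporary directory so they don't clutter the working dir. At the end it compares the tree of the hand-made merge commit with the tree Git produces on its own.</p>

```shell
# Throwaway repository; file contents and user details are placeholders.
repo=$(mktemp -d) && patches=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "print('hello')" > code.py
git add code.py && git commit -q -m "Commit 4"
git checkout -q -b john_branch
echo "John's song" > lucy_in_the_sky_with_diamonds.md
git add . && git commit -q -m "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
echo "Paul's song" > penny_lane.md
git add . && git commit -q -m "Commit 6"
git checkout -q main
git merge -q john_branch                   # fast-forward: main is now "Commit 5"

# Steps 1 and 2: find the merge base and save one diff per side.
base=$(git merge-base main paul_branch)
ours=$(git rev-parse main)
theirs=$(git rev-parse paul_branch)
git diff "$base" "$ours" > "$patches/john.patch"
git diff "$base" "$theirs" > "$patches/paul.patch"

# Step 3, the naive way: apply both patches on top of the merge base.
git checkout -q "$base"
git apply --index "$patches/john.patch"
git apply --index "$patches/paul.patch"
tree=$(git write-tree)
manual=$(git commit-tree "$tree" -p "$ours" -p "$theirs" -m "Merge commit!")

# Now let Git do the merge, and compare the resulting trees.
git reset -q --hard "$manual"
git checkout -q main
git merge -q -m "Merge paul_branch" paul_branch
echo "manual tree: $(git rev-parse "$manual^{tree}")"
echo "git's tree:  $(git rev-parse "HEAD^{tree}")"
```

<p>Both commits point to the same tree, since the files end up with identical contents. The commit SHA-1 values still differ, because the messages and timestamps are not the same.</p>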
<h2 id="heading-quick-recap-on-a-three-way-merge">Quick recap on a three-way merge</h2>
<p>So to quickly recap, on a three-way merge, Git:</p>
<ul>
<li>First, locates the merge base – the common ancestor of the two branches. That is, the first commit that is reachable from both branches. </li>
<li>Second, calculates two diffs – one diff from the merge base to the first branch, and another diff from the merge base to the second branch. </li>
<li>Third, applies both patches to the merge base, using a 3-way merge algorithm. I haven't explained the 3-way merge yet, but I will elaborate on that later. The result is the state of the new, merge commit.</li>
</ul>
<p>You can also understand why it's called a "3-way merge": Git merges three different states – that of the first branch, that of the second branch, and their common ancestor. In our previous example, <code>main</code>, <code>paul_branch</code>, and "Commit 4".</p>
<p>This is unlike, say, the fast-forward examples we saw before. The fast-forward examples are actually a case of a <strong>two</strong>-way merge, as Git only compares two states – for example, where <code>main</code> pointed to, and where <code>john_branch</code> pointed to.</p>
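<p>One way to tell the two cases apart before merging is to ask whether the current branch is an ancestor of the branch you are about to merge. The sketch below assumes a throwaway repository, and <code>merge_kind</code> is a hypothetical helper I wrote for this illustration, not a Git command. It also ignores the "already up to date" case, where the other branch is an ancestor of the current one.</p>

```shell
# Throwaway repository; all names and messages here are illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Commit 4"
git checkout -q -b john_branch
git commit -q --allow-empty -m "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
git commit -q --allow-empty -m "Commit 6"
git checkout -q main

# Hypothetical helper: which kind of merge would "merge $1 into main" need?
merge_kind() {
    if git merge-base --is-ancestor main "$1"; then
        echo "fast-forward"                # main only has to move forward
    else
        echo "three-way"                   # the histories diverged
    fi
}

john_kind=$(merge_kind john_branch)
git merge -q john_branch                   # moves main to "Commit 5"
paul_kind=$(merge_kind paul_branch)
echo "john_branch: $john_kind"
echo "paul_branch: $paul_kind"
```

<p>Merging <code>john_branch</code> is a fast-forward, but once <code>main</code> has moved to "Commit 5", it is no longer an ancestor of <code>paul_branch</code>, so that merge needs three states.</p>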
<h1 id="heading-moving-on">Moving on 👣</h1>
<p>Still, this was a simple case of a 3-way merge. John and Paul created different songs, so each of them touched a different file. It was pretty straightforward to execute the merge.</p>
<p>What about more interesting cases?</p>
<p>Let's assume that now John and Paul are co-authoring a new song.</p>
<p>So, John checked out the <code>main</code> branch and started writing the song:</p>
<pre><code>git checkout main
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-253.png" alt="Image" width="600" height="400" loading="lazy">
_John's new song (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>He staged and committed it ("Commit 7"):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-254.png" alt="Image" width="600" height="400" loading="lazy">
_John's new song is committed (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now, Paul branches:</p>
<pre><code>git checkout -b paul_branch_2
</code></pre><p>And edits the song, adding another verse:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-255.png" alt="Image" width="600" height="400" loading="lazy">
_Paul added a new verse (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Of course, in the original song, we don't have the title "Paul's Verse", but I'll add it here for simplicity.</p>
<p>Paul stages and commits the changes:</p>
<pre><code>git add a_day_in_the_life.md
git commit -m <span class="hljs-string">"Commit 8"</span>
</code></pre><p>John also branches out from <code>main</code> and adds a few last lines:</p>
<pre><code>git checkout main
git checkout -b john_branch_2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-256.png" alt="Image" width="600" height="400" loading="lazy">
_Paul committed, and now it's John's turn again (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-257.png" alt="Image" width="600" height="400" loading="lazy">
_John added a few lines (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>And he stages and commits his changes too ("Commit 9"):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-258.png" alt="Image" width="600" height="400" loading="lazy">
_John committed his changes (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>This is the resulting history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-350.png" alt="Image" width="600" height="400" loading="lazy">
_The history after John's last commit (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>So, both Paul and John modified the same file on different branches. Will Git be successful in merging them? 🤔</p>
<p>Say now we don't go through <code>main</code>, but John will try to merge Paul's new branch into his branch:</p>
<pre><code>git merge paul_branch_2
</code></pre>
<p>Wait!! 🤚🏻 Don't run this command! Why would you let Git do all the hard work? You are trying to understand the process here.</p>
<p>So, first, Git needs to find the merge base. Can you see which commit that would be?</p>
<p>Correct, it would be the last commit on <code>main</code> branch, where the two diverged.</p>
<p>You can verify that by using:</p>
<pre><code>git merge-base john_branch_2 paul_branch_2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-260.png" alt="Image" width="600" height="400" loading="lazy">
_Finding the merge base (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Great, now Git should compute the diffs and generate the patches. You can observe the diffs directly:</p>
<pre><code>git diff main paul_branch_2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-261.png" alt="Image" width="600" height="400" loading="lazy">
_The output of <code>git diff main paul_branch_2</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Will applying this patch succeed? Well, no problem, Git has all the context lines in place.</p>
<p>Ask Git to apply this patch:</p>
<pre><code>git diff main paul_branch_2 &gt; paul_branch_2.patch
git apply --index paul_branch_2.patch
</code></pre>
<p>And this worked, no problem at all.</p>
<p>Now, compute the diff between John's new branch and the merge base. Notice that you haven't committed the applied changes, so <code>john_branch_2</code> still points at the same commit as before, "Commit 9":</p>
<pre><code>git diff main john_branch_2
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-262.png" alt="Image" width="600" height="400" loading="lazy">
_The output of <code>git diff main john_branch_2</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Will applying this diff work?</p>
<p>Well, indeed, yes. Notice that even though the line numbers have changed on the current version of the file, thanks to the context lines Git is able to locate where it needs to add these lines…</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-263.png" alt="Image" width="600" height="400" loading="lazy">
_Git can rely on the context lines (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Save this patch and apply it then:</p>
<pre><code>git diff main john_branch_2 &gt; john_branch_2.patch
git apply --index john_branch_2.patch
</code></pre>
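<p>If you want to watch the context lines do their job, here is a minimal sketch. It assumes a throwaway repository and a made-up ten-line file called <code>song.md</code>: Paul inserts a line near the top, John appends a line at the end, and both patches are then applied on top of the merge base, one after the other.</p>

```shell
# Throwaway repository; the file contents and user details are placeholders.
repo=$(mktemp -d) && patches=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

for i in 1 2 3 4 5 6 7 8 9 10; do echo "line $i"; done > song.md
git add song.md && git commit -q -m "Commit 7"

git checkout -q -b paul_branch_2           # Paul inserts a verse after line 2
{ head -n 2 song.md; echo "Paul's verse"; tail -n +3 song.md; } > "$patches/song.md"
mv "$patches/song.md" song.md
git commit -q -am "Commit 8"

git checkout -q main
git checkout -q -b john_branch_2           # John appends a line at the end
echo "John's last line" >> song.md
git commit -q -am "Commit 9"

git checkout -q main                       # back at the merge base
git diff main paul_branch_2 > "$patches/paul_branch_2.patch"
git diff main john_branch_2 > "$patches/john_branch_2.patch"
git apply --index "$patches/paul_branch_2.patch"
# John's hunk was computed against the old line numbers, yet its
# context lines ("line 8" to "line 10") are still found one line lower.
git apply --index "$patches/john_branch_2.patch"
cat song.md
```

<p>The resulting file contains both Paul's verse and John's last line, even though John's patch was created before Paul's lines shifted everything down.</p>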
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-264.png" alt="Image" width="600" height="400" loading="lazy">
_Apply Paul's patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Observe the result file:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-265.png" alt="Image" width="600" height="400" loading="lazy">
_The result after applying Paul's patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Cool, exactly what we wanted 👏🏻</p>
<p>You can now create the tree and relevant commit:</p>
<pre><code>git write-tree
</code></pre>
<pre><code>
Don<span class="hljs-string">'t forget to specify both parents:</span>
</code></pre><p>git commit-tree  -p paul_branch_2 -p john_branch_2 -m "Merging new changes"</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-266.png" alt="Image" width="600" height="400" loading="lazy">
_Creating a merge commit (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>See how I used the branch names here? After all, they are just pointers to the commits we want.</p>
<p>Cool, look at the log from the new commit:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-270.png" alt="Image" width="600" height="400" loading="lazy">
_The history after creating the merge commit (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ZS4stBVdDII&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Exactly what we wanted.</p>
<p>You can also let Git perform the job for you. You can simply check out <code>john_branch_2</code>, which you haven't moved – so it still points to the same commit as it did before the merge. So all you need to do is run:</p>
<pre><code>git merge paul_branch_2
</code></pre>
<p>Observe the resulting history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-271.png" alt="Image" width="600" height="400" loading="lazy">
_The history after letting Git perform the merge (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Just as before, you have a merge commit pointing to "Commit 8" and "Commit 9" as its parents. "Commit 9" is the first parent since you merged into it.</p>
<p>But this was still quite simple… John and Paul worked on the same file, but on very different parts. You could also directly apply Paul's changes to John's branch. If you go back to John's branch before the merge:</p>
<pre><code>git reset --hard HEAD~
</code></pre><p>And now apply Paul's changes:</p>
<pre><code>git apply --index paul_branch_2.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-272.png" alt="Image" width="600" height="400" loading="lazy">
_Applying Paul's changes directly to John's branch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>You will get the same result.</p>
<p>But what happens when the two branches include changes on the same files, in the same locations? 🤔</p>
<h1 id="heading-more-advanced-git-merge-cases">More Advanced Git Merge Cases</h1>
<p>What would happen if John and Paul were to coordinate a new song, and work on it together?</p>
<p>In this case, John creates the first version of this song in the <code>main</code> branch:</p>
<pre><code>git checkout main
nano everyone.md
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-273.png" alt="Image" width="600" height="400" loading="lazy">
_The contents of <code>everyone.md</code> prior to the first commit (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>By the way, this text is indeed taken from the version that John Lennon recorded for a demo in 1968. But this isn't an article about the Beatles, so if you're curious about the process the Beatles underwent while writing this song, you can follow the links in the appendix below.</p>
<pre><code>git add everyone.md
git commit -m "Commit 10"
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-275.png" alt="Image" width="600" height="400" loading="lazy">
_Introducing "Commit 10" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now John and Paul split. Paul creates a new verse in the beginning:</p>
<pre><code>git checkout -b paul_branch_3
nano everyone.md
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-276.png" alt="Image" width="600" height="400" loading="lazy">
_Paul added a new verse in the beginning (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Also, while talking, John and Paul decided to change the word "feet" to "foot", so Paul adds this change as well.</p>
<p>And Paul adds and commits his changes to the repo:</p>
<pre><code>git add everyone.md
git commit -m "Commit 11"
</code></pre><p>You can observe Paul's changes by comparing this branch's state to the state of branch <code>main</code>:</p>
<pre><code>git diff main
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-277.png" alt="Image" width="600" height="400" loading="lazy">
_The output of <code>git diff main</code> from Paul's branch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Store this diff in a patch file:</p>
<pre><code>git diff main &gt; paul_3.patch
</code></pre><p>Now back to <code>main</code>...</p>
<pre><code>git checkout main
</code></pre><p>John decides to make another change, in his own new branch:</p>
<pre><code>git checkout -b john_branch_3
</code></pre>
<p>And he replaces the line "Everyone had the boot in" with the line "Everyone had a wet dream". In addition, John changes the word "feet" to "foot", following his talk with Paul.</p>
<p>Observe the diff:</p>
<pre><code>git diff main
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-278.png" alt="Image" width="600" height="400" loading="lazy">
_The output of <code>git diff main</code> from John's branch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Store this output as well:</p>
<pre><code>git diff main &gt; john_3.patch
</code></pre><p>Now, stage and commit:</p>
<pre><code>git add everyone.md
git commit -m "Commit 12"
</code></pre>
<p>This is our current history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-351.png" alt="Image" width="600" height="400" loading="lazy">
_The history after introducing "Commit 12" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Paul told John he added a new verse, so John would like to merge Paul's changes.</p>
<p>Can John simply apply Paul's patch?</p>
<p>Consider the patch again:</p>
<pre><code>git diff main paul_branch_3
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-277.png" alt="Image" width="600" height="400" loading="lazy">
_The output of <code>git diff main paul_branch_3</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>As you can see, this diff relies on the line "Everyone had the boot in", but this line no longer exists on John's branch. As a result, you could expect applying the patch to fail. Go on, give it a try:</p>
<pre><code>git apply paul_3.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-280.png" alt="Image" width="600" height="400" loading="lazy">
_Applying the patch failed (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Indeed, you can see that it failed.</p>
<p>But should it really fail? 🤔</p>
<p>As explained earlier, <code>git merge</code> uses a 3-way merge algorithm, and this can come in handy here. What would be the first step of this algorithm?</p>
<p>Well, first, Git would find the merge base – that is, the common ancestor of Paul's branch and John's branch. Consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-351.png" alt="Image" width="600" height="400" loading="lazy">
_The history after introducing "Commit 12" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>So the common ancestor of "Commit 11" and "Commit 12" is "Commit 10". We can verify this by running the command:</p>
<pre><code>git merge-base john_branch_3 paul_branch_3
</code></pre>
<p>Now we can take the patches we generated from the diffs on both branches, and apply them to <code>main</code>. Would that work?</p>
<p>First, try to apply John's patch, and then Paul's patch.</p>
<p>Consider the diff:</p>
<pre><code>git diff main john_branch_3
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-278.png" alt="Image" width="600" height="400" loading="lazy">
_The output of <code>git diff main john_branch_3</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>We can store it in a file:</p>
<pre><code>git diff main john_branch_3 &gt; john_3.patch
</code></pre><p>And I want to apply this patch on <code>main</code>, so:</p>
<pre><code>git checkout main
git apply john_3.patch
</code></pre><p>Let's consider the result:</p>
<pre><code>nano everyone.md
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-282.png" alt="Image" width="600" height="400" loading="lazy">
_The contents of <code>everyone.md</code> after applying John's patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>The line changed as expected. Nice 😎</p>
<p>Now, can Git apply Paul's patch? To remind you, this is the patch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-283.png" alt="Image" width="600" height="400" loading="lazy">
_The contents of Paul's patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Well, Git <strong>cannot</strong> apply this patch, because this patch assumes that the line "Everyone had the boot in" exists. Trying to apply it fails:</p>
<pre><code>git apply -v paul_3.patch
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-284.png" alt="Image" width="600" height="400" loading="lazy">
_Applying Paul's patch failed._</p>
<p>What you tried to do now, applying Paul's patch on the <code>main</code> branch after applying John's patch, is the same as being on <code>john_branch_3</code> and attempting to apply the patch, that is:</p>
<pre><code>git checkout john_branch_3
git apply paul_3.patch
</code></pre><p>What would happen if we tried the other way around?</p>
<p>First, clean up the state:</p>
<pre><code>git reset --hard
</code></pre><p>And start from Paul's branch:</p>
<pre><code>git checkout paul_branch_3
</code></pre><p>Can we apply John's patch? As a reminder, this is the status of <code>everyone.md</code> on this branch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-285.png" alt="Image" width="600" height="400" loading="lazy">
_The contents of <code>everyone.md</code> on <code>paul_branch_3</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>And this is John's patch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-286.png" alt="Image" width="600" height="400" loading="lazy">
_The contents of John's patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Would applying John's patch work? 🤔
Try to answer yourself before reading on.</p>
<p>You can try:</p>
<pre><code>git apply john_3.patch
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-287.png" alt="Image" width="600" height="400" loading="lazy">
_Git fails to apply John's patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Well, no! Again, if you are not sure what happened, you can always ask <code>git apply</code> to be a bit more verbose:</p>
<pre><code>git apply john_3.patch -v
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-288.png" alt="Image" width="600" height="400" loading="lazy">
_You can get more information by using the <code>-v</code> flag (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Git is looking for "Everyone put their feet down", but Paul has already changed this line so that it contains the word "foot" instead of "feet". As a result, applying this patch fails.</p>
<p>Notice that changing the number of context lines here (that is, using <code>git apply</code> with the <code>-C</code> flag, as discussed in <a target="_blank" href="https://www.freecodecamp.org/news/git-diff-and-patch/">a previous post</a>) is irrelevant – Git is unable to locate the actual line that the patch is trying to erase.</p>
<p>But actually, Git <em>can</em> make this work, if you just add a flag to <code>apply</code>, telling it to perform a 3-way merge under the hood:</p>
<pre><code>git apply <span class="hljs-number">-3</span> john_3.patch
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-289.png" alt="Image" width="600" height="400" loading="lazy">
_Applying with <code>-3</code> flag succeeds (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>And let's consider the result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-290.png" alt="Image" width="600" height="400" loading="lazy">
_The contents of <code>everyone.md</code> after the merge (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Exactly what we wanted! You have Paul's verse (marked in the image above), and both of John's changes!</p>
<p>So, how was Git able to accomplish that?</p>
<p>Well, as I mentioned, Git really did <strong>a 3-way merge</strong>, and this example is a good opportunity to dive into what this actually means.</p>
<h1 id="heading-how-gits-3-way-merge-algorithm-works">How Git's 3-way Merge Algorithm Works</h1>
<p>Get back to the state before applying this patch:</p>
<pre><code>git reset --hard
</code></pre><p>You now have three versions: the merge base, which is "Commit 10", Paul's branch, and John's branch. In general terms, we can say these are the <code>merge base</code>, <code>commit A</code> and <code>commit B</code>. Notice that the <code>merge base</code> is by definition an ancestor of both <code>commit A</code> and <code>commit B</code>.</p>
<p>To perform the merge, Git looks at the diff between the three different versions of the file in question on these three revisions. In your case, it's the file <code>everyone.md</code>, and the revisions are "Commit 10", Paul's branch – that is, "Commit 11", and John's branch, that is, "Commit 12".</p>
<p>Git makes the merging decision based on the status of each line in each of these versions.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-291.png" alt="Image" width="600" height="400" loading="lazy">
_The three versions considered for the 3-way merge (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>In case <em>not</em> all three versions match, that is a conflict. Git can resolve many of these conflicts automatically, as we will now see.</p>
<p>Let's consider specific lines.</p>
<p>The first lines here exist only on Paul's branch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-292.png" alt="Image" width="600" height="400" loading="lazy">
_Lines that appear on Paul's branch only (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>This means that the state of John's branch is equal to the state of the merge base. So the 3-way merge goes with Paul's version.</p>
<p>In general, if the state of the merge base is the same as <code>A</code>, the algorithm goes with <code>B</code>. The reason is that since the merge base is the ancestor of both <code>A</code> and <code>B</code>, Git assumes that this line hasn't changed in <code>A</code>, and it <em>has</em> changed in <code>B</code>, which is the most recent version for that line, and should thus be taken into account.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-353.png" alt="Image" width="600" height="400" loading="lazy">
_If the state of the merge base is the same as <code>A</code>, and this state is different from <code>B</code>, the algorithm goes with <code>B</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Next, you can see lines where all three versions agree – they exist on the merge base, <code>A</code> and <code>B</code>, with equal data.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-294.png" alt="Image" width="600" height="400" loading="lazy">
_Lines where all three versions agree (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>So the algorithm has a trivial choice – just take that version.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-355.png" alt="Image" width="600" height="400" loading="lazy">
_In case all three versions agree, the algorithm goes with that single version (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>In a previous example, we saw that if the merge base and <code>A</code> agree, and <code>B</code>'s version is different, the algorithm picks <code>B</code>. This works in the other direction too – for example, here you have a line whose version on John's branch differs from the one on the merge base and Paul's branch.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-296.png" alt="Image" width="600" height="400" loading="lazy">
_A line where Paul's version matches the merge base's version, and John has a different version (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Hence, John's version is chosen.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-354.png" alt="Image" width="600" height="400" loading="lazy">
_If the state of the merge base is the same as <code>B</code>, and this state is different from <code>A</code>, the algorithm goes with <code>A</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now consider another case, where both <code>A</code> and <code>B</code> agree on a line, but the value they agree upon is different from the <code>merge base</code> – both John and Paul agreed to change the line "Everyone put their feet down" to "Everyone put their foot down":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-297.png" alt="Image" width="600" height="400" loading="lazy">
_A line where Paul's version matches John's version, yet the merge base has a different version (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>In this case, the algorithm picks the version on both <code>A</code> and <code>B</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-352.png" alt="Image" width="600" height="400" loading="lazy">
_In case <code>A</code> and <code>B</code> agree on a version which is different from the merge base's version, the algorithm picks the version on both <code>A</code> and <code>B</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Notice this is not a democratic vote. In the previous case, the algorithm picked the minority version, as it represented the newest version of this line. In this case, it <em>happens to</em> pick the majority – but only because <code>A</code> and <code>B</code> are the revisions that agree on the new version.</p>
<p>The same would happen if we used <code>git merge</code>:</p>
<pre><code>git merge john_branch_3
</code></pre><p>Without specifying any flags, <code>git merge</code> will default to using a 3-way merge.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-302.png" alt="Image" width="600" height="400" loading="lazy">
_By default, <code>git merge</code> uses a 3-way merge algorithm (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>The status of <code>everyone.md</code> after running the command above would be the same as the result you achieved by applying the patches with <code>git apply -3</code>.</p>
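<p>As an aside, you can watch the same 3-way logic work on standalone files, outside this repository, with <code>git merge-file</code>. The following is a minimal sketch under stated assumptions: the temporary directory, the file names <code>base.txt</code>, <code>a.txt</code> and <code>b.txt</code>, and their contents are all made up for illustration, and the two changes are deliberately placed far apart so that they merge cleanly.</p>
<pre><code># Throwaway directory holding three hypothetical versions of one file.
dir=$(mktemp -d)

# tee both writes each version to disk and shows it on screen.
echo "--- merge base"
printf '%s\n' "line 1" "line 2" "line 3" "line 4" "line 5" "line 6" "line 7" | tee "$dir/base.txt"

echo "--- A: changes only the first line"
sed '1s/.*/line 1 (changed in A)/' "$dir/base.txt" | tee "$dir/a.txt"

echo "--- B: changes only the last line"
sed '7s/.*/line 7 (changed in B)/' "$dir/base.txt" | tee "$dir/b.txt"

# Argument order: current file, base file, other file.
# -p prints the merged result instead of overwriting a.txt.
echo "--- merged"
merged=$(git merge-file -p "$dir/a.txt" "$dir/base.txt" "$dir/b.txt")
printf '%s\n' "$merged"
</code></pre>
<p>The merged output should keep A's first line and B's last line, since in each case the other side still matches the merge base.</p>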
<p>If you consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-303.png" alt="Image" width="600" height="400" loading="lazy">
_Git's history after performing the merge (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>You will see that the merge commit indeed has two parents: the first is "Commit 11", that is, where <code>paul_branch_3</code> pointed to before the merge. The second is "Commit 12", where <code>john_branch_3</code> pointed to, and still points to now.</p>
<p>What will happen if you now merge from <code>main</code>? That is, switch to the main branch, which is pointing to "Commit 10":</p>
<pre><code>git checkout main
</code></pre><p>And then merge Paul's branch?</p>
<pre><code>git merge paul_branch_3
</code></pre><p>Indeed, a fast forward, as before running this command, <code>main</code> was an ancestor of <code>paul_branch_3</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-304.png" alt="Image" width="600" height="400" loading="lazy">
_A fast-forward merge (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
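<p>If you want to check up front whether a merge can be a fast-forward, <code>git merge-base --is-ancestor</code> answers exactly that question. Here is a minimal sketch, assuming a throwaway repository with two empty commits: the branch names mirror the ones in this post, while the <code>demo</code> identity and the <code>ff_possible</code> variable are placeholders for this example only.</p>
<pre><code># Build a tiny throwaway repository: main at "Commit 10",
# paul_branch_3 one commit ahead of it.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "Commit 10"
git checkout -q -b paul_branch_3
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "Commit 11"
git checkout -q main

# Exit status 0 means main is an ancestor of paul_branch_3,
# so merging paul_branch_3 into main only has to move the pointer.
if git merge-base --is-ancestor main paul_branch_3; then
  ff_possible=yes
else
  ff_possible=no
fi
echo "fast-forward possible: $ff_possible"
</code></pre>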
<p>So, this is a 3-way merge. In general, if all versions agree on a line, then this line is used. If <code>A</code> and the <code>merge base</code> match, and <code>B</code> has another version, <code>B</code> is taken. In the opposite case, where the <code>merge base</code> and <code>B</code> match, the <code>A</code> version is selected. If <code>A</code> and <code>B</code> match, this version is taken, whether the merge base agrees or not.</p>
<p>This description leaves one open question though: What happens in cases where all three versions disagree?</p>
<p>Well, that's a conflict that Git does not resolve automatically. In these cases, Git calls for a human's help.</p>
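<p>The per-line rules above can be sketched as a small shell function. This is a simplified illustration, not Git's actual implementation: real merges compare changed regions of a file rather than isolated lines, and the name <code>merge_line</code> and the <code>CONFLICT</code> marker are made up for this example.</p>
<pre><code># merge_line BASE A B
# Prints the version a line-by-line 3-way merge would pick,
# or prints CONFLICT and returns 1 when all three versions differ.
merge_line() {
  base=$1
  a=$2
  b=$3
  if [ "$a" = "$b" ]; then
    printf '%s\n' "$a"        # A and B agree, whatever the base says
  elif [ "$base" = "$a" ]; then
    printf '%s\n' "$b"        # only B changed the line
  elif [ "$base" = "$b" ]; then
    printf '%s\n' "$a"        # only A changed the line
  else
    printf '%s\n' "CONFLICT"  # all three versions disagree
    return 1
  fi
}

merge_line "feet" "feet" "foot"    # only B changed
merge_line "feet" "foot" "feet"    # only A changed
merge_line "feet" "foot" "foot"    # both made the same change
merge_line "feet" "foot" "toes" || echo "needs a human"
</code></pre>
<p>The last call is the case where all three versions disagree, which is where Git stops and asks for a human's help.</p>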
<h2 id="heading-how-to-resolve-merge-conflicts">How to Resolve Merge Conflicts</h2>
<p>If you have followed along so far, you should understand the basics of <code>git merge</code>, and how Git can automatically resolve some conflicts. You also understand which cases are resolved automatically.</p>
<p>Next, let's consider a more advanced case.</p>
<p>Say Paul and John keep working on this song.</p>
<p>Paul creates a new branch:</p>
<pre><code>git checkout -b paul_branch_4
</code></pre><p>And he decides to add some "Yeah"s to the song, so he changes this verse as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-305.png" alt="Image" width="600" height="400" loading="lazy">
_Paul's additions (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>So Paul stages and commits these changes:</p>
<pre><code>git add everyone.md
git commit -m "Commit 13"
</code></pre><p>Paul also creates another song, <code>let_it_be.md</code>, and adds it to the repo:</p>
<pre><code>git add let_it_be.md
git commit -m "Commit 14"
</code></pre>
<p>This is the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-356.png" alt="Image" width="600" height="400" loading="lazy">
_The history after Paul introduced "Commit 14" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Back to <code>main</code>:</p>
<pre><code>git checkout main
</code></pre><p>John also branches out:</p>
<pre><code>git checkout -b john_branch_4
</code></pre>
<p>And John also works on the song "Everyone had a hard year", later to be called "I've got a feeling" (again, this is not an article about the Beatles, so I won't elaborate on it here. See the appendix if you are curious).</p>
<p>John decides to change all occurrences of "Everyone" to "Everybody":</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-307.png" alt="Image" width="600" height="400" loading="lazy">
_John changes all occurrences of "Everyone" to "Everybody" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>He stages and commits this song to the repo:</p>
<pre><code>git add everyone.md
git commit -m "Commit 15"
</code></pre><p>Nice. Now John also creates another song, <code>across_the_universe.md</code>. He adds it to the repo as well:</p>
<pre><code>git add across_the_universe.md
git commit -m "Commit 16"
</code></pre>
<p>Observe the history again:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-308.png" alt="Image" width="600" height="400" loading="lazy">
_The history after John introduced "Commit 16" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>You can see that the history diverges from <code>main</code>, to two different branches – <code>paul_branch_4</code>, and <code>john_branch_4</code>.</p>
<p>At this point, John would like to merge the changes introduced by Paul.</p>
<p>What is going to happen here?</p>
<p>Remember the changes introduced by Paul:</p>
<pre><code>git diff main paul_branch_4
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-309.png" alt="Image" width="600" height="400" loading="lazy">
_The output of <code>git diff main paul_branch_4</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>What do you think? Will merge work? 🤔</p>
<p>Try it out:</p>
<pre><code>git merge paul_branch_4
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-311.png" alt="Image" width="600" height="400" loading="lazy">
<em>A merge conflict (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>We have a conflict! 🥁</p>
<p>It seems that Git cannot merge these branches on its own. You can get an overview of the merge state using <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-310.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git status</code> right after the <code>merge</code> operation (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>The changes that Git had no problem resolving are staged for commit. And there is a separate section for "unmerged paths" – these are files with conflicts that Git could not resolve on its own.</p>
<p>It's time to understand why and when these conflicts happen, how to resolve them, and also how Git handles them under the hood. Alright then! I hope you are at least as excited as I am. 😇</p>
<p>Let's recall what we know about 3-way merges:</p>
<p>First, Git will look for the merge base – the common ancestor of <code>john_branch_4</code> and <code>paul_branch_4</code>. Which commit would that be?</p>
<p>Correct, it would be the tip of the <code>main</code> branch, the commit in which we merged <code>john_branch_3</code> into <code>paul_branch_3</code>.</p>
<p>Again, if you are not sure, you can verify that by running:</p>
<pre><code>git merge-base john_branch_4 paul_branch_4</code></pre>
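<p>If you don't have the Beatles repository at hand, here is a minimal sketch of the same check in a throwaway repository. The branch names <code>john_demo</code> and <code>paul_demo</code>, the file content and the temporary directory are assumptions made up for this illustration, not part of the original example.</p>
<pre><code>#!/bin/sh
# Scratch repository; every name below is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

# One commit on main, which both branches will start from.
echo "Everyone had a hard year" > everyone.md
git add everyone.md
git commit -q -m "base"
fork_point=$(git rev-parse HEAD)

# Each branch adds its own commit on top of the shared one.
git checkout -q -b john_demo main
echo "John's line" >> everyone.md
git commit -q -am "john"

git checkout -q -b paul_demo main
echo "Paul's line" >> everyone.md
git commit -q -am "paul"

# The merge base is the commit both branches grew from.
merge_base=$(git merge-base john_demo paul_demo)
echo "fork point: $fork_point"
echo "merge base: $merge_base"
</code></pre>
<p>Both lines should print the same SHA, since the only commit the two branches share is the one they started from.</p>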
<p>And at the current state, <code>git status</code> knows which files are staged and which aren't.</p>
<p>Consider the process for each file, which is the same as the 3-way merge algorithm we considered per line, but on a file level:</p>
<p><code>across_the_universe.md</code> exists on John's branch, but doesn't exist on the merge base or on Paul's branch. So Git chooses to include this file. Since you are already on John's branch and this file is included in the tip of this branch, it is not mentioned by <code>git status</code>.</p>
<p><code>let_it_be.md</code> exists on Paul's branch, but doesn't exist on the merge base or on John's branch. So <code>git merge</code> "chooses" to include it.</p>
<p>What about <code>everyone.md</code>? Well, here we have three different states of this file: its state on the merge base, its state on John's branch, and its state on Paul's branch. While performing a <code>merge</code>, Git stores all of these versions in the <strong>index</strong>.</p>
<p>Let's observe that by looking directly at the index with the command <code>git ls-files</code>:</p>
<pre><code>git ls-files -s --abbrev</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-312.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git ls-files -s --abbrev</code> after the merge operation (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>You can see that <code>everyone.md</code> has three different entries. Git assigns each version a number that represents the "stage" of the file, and this is a distinct property of an index entry, alongside the file's name and the mode bits (I covered the index in <a target="_blank" href="https://medium.com/swimm/a-visualized-intro-to-git-internals-objects-and-branches-68df85864037">a previous post</a>).</p>
<p>When there is no merge conflict regarding a file, its "stage" is <code>0</code>. This is indeed the state for <code>across_the_universe.md</code>, and for <code>let_it_be.md</code>.</p>
<p>For a file in a conflicted state, we have:</p>
<ul>
<li>Stage <code>1</code> – which is the merge base.</li>
<li>Stage <code>2</code> – which is "your" version. That is, the version of the file on the branch you are merging <em>into</em>. In our example, this would be <code>john_branch_4</code>.</li>
<li>Stage <code>3</code> – which is "their" version, taken from the commit that <code>MERGE_HEAD</code> points to. That is, the version on the branch you are merging (into the current branch). In our example, that is <code>paul_branch_4</code>.</li>
</ul>
<p>To observe the file's contents in a specific stage, you can use a command I introduced in <a target="_blank" href="https://medium.com/swimm/getting-hardcore-creating-a-repo-from-scratch-cc747edbb11c">a previous post</a>, <code>git cat-file</code>, and provide the blob's SHA:</p>
<pre><code>git cat-file -p &lt;BLOB_SHA&gt;</code></pre>
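<p>To see the stages for yourself without the original repository, the following sketch builds a tiny conflict in a throwaway repository and reads the stage <code>2</code> blob straight from the index. The branch names <code>john_demo</code> and <code>paul_demo</code> and the one-line file are assumptions for illustration only.</p>
<pre><code>#!/bin/sh
# Scratch repository; every name below is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

echo "Everyone had a hard year" > everyone.md
git add everyone.md
git commit -q -m "base"

# Paul and John change the same line in different ways.
git checkout -q -b paul_demo main
echo "Everyone had a hard year, yeah" > everyone.md
git commit -q -am "paul"

git checkout -q -b john_demo main
echo "Everybody had a hard year" > everyone.md
git commit -q -am "john"

# The merge stops with a conflict, so a non-zero exit status is expected.
git merge paul_demo > /dev/null 2> /dev/null || true

# A conflicted path has one index entry per stage (1, 2 and 3).
git ls-files -s --abbrev
stage_count=$(git ls-files -s everyone.md | wc -l | tr -d ' ')

# Field 2 of each entry is the blob SHA, field 3 is the stage number.
ours_sha=$(git ls-files -s everyone.md | awk '$3 == 2 { print $2 }')
ours_content=$(git cat-file -p "$ours_sha")
echo "stage 2 holds: $ours_content"
</code></pre>
<p>Stage <code>2</code> holds the version from the branch that was checked out when the merge started, which in this sketch is <code>john_demo</code>.</p>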
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-313.png" alt="Image" width="600" height="400" loading="lazy">
<em>Using <code>git cat-file</code> to present the content of the file on John's branch, right from its state in the index (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>And indeed, this is the content we expected – from John's branch, where the lines start with "Everybody" rather than "Everyone".</p>
<p>A nice trick that lets you see the content quickly, without providing the blob's SHA-1 value, is to use <code>git show</code>, like so:</p>
<pre><code>git show :&lt;STAGE&gt;:everyone.md</code></pre>
<p>For example, to get the content of the same version as with <code>git cat-file -p &lt;BLOB_SHA_FOR_STAGE_2&gt;</code>, you can write <code>git show :2:everyone.md</code>.</p>
<p>Git records the three states of the three commits into the index in this way at the start of the merge. It then follows the three-way merge algorithm to quickly resolve the simple cases:</p>
<p>In case all three stages match, then the selection is trivial.</p>
<p>If one side made a change while the other did nothing – that is, stage 1 matches stage 2, then we choose stage 3 – or vice versa. That's exactly what happened with <code>let_it_be.md</code> and <code>across_the_universe.md</code>.</p>
<p>In case of a deletion on the incoming branch, for example, and given there were no changes on the current branch, then we would see that stage 1 matches stage 2, but there is no stage 3. In this case, <code>git merge</code> removes the file for the merged version.</p>
<p>What's really cool here is that for matching, Git doesn't need the actual files. Rather, it can rely on the SHA-1 values of the corresponding blobs. This way, Git can easily detect the state a file is in.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-352.png" alt="Image" width="600" height="400" loading="lazy">
<em>Git performs the same 3-way merge algorithm on a file level (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Cool, so for <code>everyone.md</code> you have this special case – where stage 1, stage 2 and stage 3 are all different from one another. That is, they have different blob SHAs. It's time to go deeper and understand the merge conflict. 😊</p>
<p>One way to do that would be to simply use <code>git diff</code>. In <a target="_blank" href="https://www.freecodecamp.org/news/git-diff-and-patch/">a previous post</a>, we examined <code>git diff</code> in detail, and saw that it shows the differences between various combinations of the working tree, index or commits.</p>
<p>But <code>git diff</code> also has a special mode for helping with merge conflicts:</p>
<pre><code>git diff</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-314.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git diff</code> during a conflict (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>This output may be confusing at first, but once you get used to it, it's pretty clear. Let's start by understanding it, and then see how you can resolve conflicts with other, more visual tools.</p>
<p>The two sides of the conflicted section are separated by a line of "equal" marks (<code>=======</code>), and each side is marked with the corresponding branch. In this context, "ours" is the current branch. In this example, that would be <code>john_branch_4</code>, the branch that <code>HEAD</code> was pointing to when we initiated the <code>git merge</code> command. "Theirs" is the <code>MERGE_HEAD</code>, the branch that we are merging in – in this case, <code>paul_branch_4</code>.</p>
<p>So <code>git diff</code> without any special flags shows changes between the working tree and the index, which in this case are the conflicts yet to be resolved. The output doesn't include staged changes, which is very convenient for resolving the conflict.</p>
<p>Time to resolve this manually. Fun!</p>
<p>So, why is this a conflict?</p>
<p>For Git, Paul and John made different changes to the same line, for a few lines. John changed it to one thing, and Paul changed it to another thing. Git cannot decide which one is correct.</p>
<p>This is not the case for the last lines, like the line that used to be "Everyone had a hard year" on the merge base. Paul hasn't changed this line, or the lines surrounding it, so its version on <code>paul_branch_4</code>, or "theirs" in our case, agrees with the merge base. Yet John's version, "ours", is different. Thus <code>git merge</code> can easily decide to take this version.</p>
<p>But what about the conflicted lines?</p>
<p>In this case, I know what I want, and that is actually a combination of these lines. I want the lines to start with <code>Everybody</code>, following John's change, but also to include Paul's "yeah"s. So go ahead and create the desired version by editing <code>everyone.md</code>:</p>
<pre><code>nano everyone.md</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-315.png" alt="Image" width="600" height="400" loading="lazy">
<em>Editing the file manually to achieve the desired state (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>To compare the result file to what you had in the branch prior to the merge, you can run:</p>
<pre><code>git diff --ours</code></pre>
<p>Similarly, if you wish to see how the result of the merge differs from the branch you merged into your branch, you can run:</p>
<pre><code>git diff --theirs</code></pre>
<p>You can also see how the result differs from the merge base, which reflects the changes coming from both sides, using:</p>
<pre><code>git diff --base</code></pre>
<p>Now you can stage the fixed version:</p>
<pre><code>git add everyone.md</code></pre>
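<p>The whole manual flow, from the conflict to the final merge commit, can be replayed in a throwaway repository. This is only a sketch: the branch names <code>john_demo</code> and <code>paul_demo</code>, the one-line file and the commit messages are made up for the illustration, and the "editing" is done with <code>echo</code> instead of an editor.</p>
<pre><code>#!/bin/sh
# Scratch repository; every name below is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

echo "Everyone had a hard year" > everyone.md
git add everyone.md
git commit -q -m "base"

git checkout -q -b paul_demo main
echo "Everyone had a hard year, yeah" > everyone.md
git commit -q -am "paul"

git checkout -q -b john_demo main
echo "Everybody had a hard year" > everyone.md
git commit -q -am "john"

# The merge stops with a conflict, so a non-zero exit status is expected.
git merge paul_demo > /dev/null 2> /dev/null || true

# Resolve by hand: keep John's "Everybody" and Paul's "yeah".
echo "Everybody had a hard year, yeah" > everyone.md

# Before staging, compare the working tree with each side of the merge.
git diff --ours
git diff --theirs

# Staging the file marks the conflict as resolved.
git add everyone.md
unmerged_after_add=$(git ls-files -u | wc -l | tr -d ' ')

# Committing concludes the merge.
git commit -q -m "Merge paul_demo into john_demo"

# rev-list prints the commit SHA followed by the SHA of each parent.
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }')
echo "unmerged entries after git add: $unmerged_after_add"
echo "parents of the new commit: $parent_count"
</code></pre>
<p>Once the file is staged, the index no longer holds stage <code>1</code>, <code>2</code> and <code>3</code> entries for it, and the commit that concludes the merge has two parents.</p>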
<p>After staging, if you look at <code>git status</code>, you will see no conflicts:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-316.png" alt="Image" width="600" height="400" loading="lazy">
<em>After staging the fixed version <code>everyone.md</code>, there are no conflicts (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>You can now simply use <code>git commit</code>, and Git will present you with a commit message containing details about the merge. You can modify it if you like, or leave it as is. Regardless of the commit message, Git will create a "merge commit" – that is, a commit with more than one parent.</p>
<p>To validate that, consider the history:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-317.png" alt="Image" width="600" height="400" loading="lazy">
<em>The history after completing the merge operation (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p><code>john_branch_4</code> now points to the new merge commit. The incoming branch, "theirs", in this case, <code>paul_branch_4</code>, stays where it was.</p>
<h1 id="heading-how-to-use-vs-code-to-resolve-conflicts">How to Use VS Code to Resolve Conflicts</h1>
<p>I will show you now how to resolve the same conflict using a graphical tool. For this example, I will use VS Code, which is free and very common. There are many other tools, yet the process is similar, so I will just show VS Code as an example.</p>
<p>First, get back to the state before the merge:</p>
<pre><code>git reset --hard HEAD~</code></pre>
<p>And try to merge again:</p>
<pre><code>git merge paul_branch_4</code></pre>
<p>You should be back at the same status:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-318.png" alt="Image" width="600" height="400" loading="lazy">
<em>Back at the conflicting status (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Let's see how this appears in VS Code:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-320.png" alt="Image" width="600" height="400" loading="lazy">
<em>Conflict resolution with VS Code (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>VS Code marks the different versions with "Current Change" – which is the "ours" version, the current <code>HEAD</code>, and "Incoming Change" for the branch we are merging into the active branch. You can accept one of the changes (or both) by clicking on one of the options.</p>
<p>If you click on <code>Resolve in Merge editor</code>, you get a more visual view of the state. VS Code shows the status of each line:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-321.png" alt="Image" width="600" height="400" loading="lazy">
<em>VS Code's Merge Editor (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>If you look closely, you will see that VS Code shows changes within words – for example, showing that "Every<strong>one</strong>" was changed to "Every<strong>body</strong>", marking the changed parts.</p>
<p>You can accept either version, or you can accept a combination. In this case, if you click on "Accept Combination", you get this result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-322.png" alt="Image" width="600" height="400" loading="lazy">
<em>VS Code's Merge Editor after clicking on "Accept Combination" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>VS Code did a really good job! The same three-way merge algorithm was implemented here and used on the <em>word</em> level rather than the <em>line</em> level. So VS Code was able to actually resolve this conflict in a rather impressive way. Of course, you can modify VS Code's suggestion, but it provided a very good start.</p>
<h1 id="heading-one-more-powerful-tool">One More Powerful Tool 🪛</h1>
<p>Well, this was the first time in this entire series of Git articles that I used a tool with a graphical user interface. Indeed, graphical interfaces can be very convenient for understanding what's going on when you are resolving merge conflicts.</p>
<p>However, like in many other cases, when we need the big guns or want to <em>really</em> understand what's going on, the command line becomes handy. So let's get back to the command line and learn a tool that can come in handy in more complicated cases.</p>
<p>Again, go back to the state before the merge:</p>
<pre><code>git reset --hard HEAD~</code></pre>
<p>And merge:</p>
<pre><code>git merge paul_branch_4</code></pre>
<p>Now say you are not exactly sure what happened. Why is there a conflict? One very useful command would be:</p>
<pre><code>git log -p --merge</code></pre>
<p>As a reminder, <code>git log</code> shows the history of commits that are reachable from <code>HEAD</code>. Adding <code>-p</code> tells <code>git log</code> to show the commits along with the diffs they introduced. The <code>--merge</code> switch makes the command show all commits containing changes relevant to any <em>unmerged files</em>, on either branch, together with their diffs.</p>
<p>This can help you identify the changes in history that led to the conflicts. So in this example, you'd see:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-323.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git log -p --merge</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>The first commit we see is "Commit 15", as in this commit John modified <code>everyone.md</code>, a file that still has conflicts. Next, Git shows "Commit 13", where Paul changed <code>everyone.md</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-324.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git log -p --merge</code> - continued (Source: <a target="_blank" href="https://www.youtube.com/watch?v=BCNZ5Uxctuk&amp;t=561s&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Notice that <code>git log --merge</code> did not mention previous commits that had changed <code>everyone.md</code> before "Commit 13", as they had not affected the current conflict.</p>
<p>This way, <code>git log</code> tells you all you need to know to understand the process that got you into the current conflicting state. Cool! 😎</p>
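<p>Here is a small sketch of that filtering in a throwaway repository. The branch names <code>john_demo</code> and <code>paul_demo</code>, the files and the commit messages are assumptions for illustration. Paul's branch gets one extra commit that touches a different file, so you can see that it is left out.</p>
<pre><code>#!/bin/sh
# Scratch repository; every name below is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

echo "Everyone had a hard year" > everyone.md
git add everyone.md
git commit -q -m "base"

git checkout -q -b paul_demo main
echo "Everyone had a hard year, yeah" > everyone.md
git commit -q -am "paul changes everyone.md"
# This commit does not touch the file that will end up in conflict.
echo "Let it be" > let_it_be.md
git add let_it_be.md
git commit -q -m "paul adds let_it_be.md"

git checkout -q -b john_demo main
echo "Everybody had a hard year" > everyone.md
git commit -q -am "john changes everyone.md"

# The merge stops with a conflict, so a non-zero exit status is expected.
git merge paul_demo > /dev/null 2> /dev/null || true

# Only commits that touched the still-unmerged file are listed.
git log --merge --oneline
commit_count=$(git log --merge --oneline | wc -l | tr -d ' ')
echo "commits shown: $commit_count"
</code></pre>
<p>Two commits are listed, one from each side. The commit that only added <code>let_it_be.md</code> is skipped, because that file is not in conflict.</p>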
<p>Using the command line, you can also ask Git to take only one side of the changes – either "ours" or "theirs", even for a specific file. </p>
<p>You can also instruct Git to take some parts of the diffs from one side and other parts from the other side. I will provide links that describe how to do that in the Additional References section below. </p>
<p>For the most part, you can accomplish that pretty easily either manually or from the UI of your favorite IDE.</p>
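<p>As a minimal sketch of taking one side for a single file, you can use <code>git checkout --theirs</code> (or <code>--ours</code>) on the conflicted path. The repository below is a throwaway one, and the branch names <code>john_demo</code> and <code>paul_demo</code> are made up for the illustration.</p>
<pre><code>#!/bin/sh
# Scratch repository; every name below is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

echo "Everyone had a hard year" > everyone.md
git add everyone.md
git commit -q -m "base"

git checkout -q -b paul_demo main
echo "Everyone had a hard year, yeah" > everyone.md
git commit -q -am "paul"

git checkout -q -b john_demo main
echo "Everybody had a hard year" > everyone.md
git commit -q -am "john"

# The merge stops with a conflict, so a non-zero exit status is expected.
git merge paul_demo > /dev/null 2> /dev/null || true

# Take Paul's ("theirs") version of this one file, then mark it as resolved.
git checkout --theirs -- everyone.md
git add everyone.md
resolved=$(cat everyone.md)
echo "resolved content: $resolved"
</code></pre>
<p>Swapping <code>--theirs</code> for <code>--ours</code> would keep the version from the branch you were on when you started the merge.</p>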
<p>For now, it's time for a recap.</p>
<h1 id="heading-recap">Recap</h1>
<p>In this guide, you got an extensive overview of merging with Git. You learned that merging is the process of combining the recent changes from several branches into a single new commit. The new commit has two parents – those commits which had been the tips of the branches that were merged.</p>
<p>We considered a simple, fast-forward merge, which is possible when the base branch has gained no new commits since the other branch diverged from it, so the other branch simply adds commits on top of the base branch. </p>
<p>We then considered three-way merges, and explained the three-stage process:</p>
<ul>
<li>First, Git locates the merge base. As a reminder, this is the most recent commit that is reachable from both branches.</li>
<li>Second, Git calculates two diffs – one diff from the merge base to the <em>first</em> branch, and another diff from the merge base to the <em>second</em> branch. Git generates patches based on those diffs.</li>
<li>Third and last, Git applies both patches to the merge base using a 3-way merge algorithm. The result is the state of the new, merge commit.</li>
</ul>
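<p>The second and third steps can be sketched on three plain files with <code>git merge-file</code>, which runs a 3-way merge on file contents directly. The file names and their contents are assumptions for this illustration. Here each side changes a different line, so no conflict is expected.</p>
<pre><code>#!/bin/sh
# Three plain files stand in for the merge base and the two branch tips.
set -e
work=$(mktemp -d)
cd "$work"

# The merge base has five lines.
printf 'one\ntwo\nthree\nfour\nfive\n' > base.txt
# The first side changes only the first line.
printf 'ONE\ntwo\nthree\nfour\nfive\n' > ours.txt
# The second side changes only the last line.
printf 'one\ntwo\nthree\nfour\nFIVE\n' > theirs.txt

# Step two: the diffs from the merge base to each side.
# git diff exits with status 1 when the files differ, hence "|| true".
git diff --no-index base.txt ours.txt || true
git diff --no-index base.txt theirs.txt || true

# Step three: combine both changes on top of the base.
# -p prints the merged result instead of overwriting ours.txt.
merged=$(git merge-file -p ours.txt base.txt theirs.txt)
echo "$merged"
</code></pre>
<p>The merged result carries both the first line from one side and the last line from the other. If both sides had changed the same line differently, <code>git merge-file</code> would emit conflict markers instead.</p>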
<p>We dove deeper into the process of a 3-way merge, whether at a file level or a hunk level. We considered when Git is able to rely on a 3-way merge to automatically resolve conflicts, and when it just can't. </p>
<p>You saw the output of <code>git diff</code> when we are in a conflicting state, and how to resolve conflicts either manually or with VS Code.</p>
<p>There is much more to be said about merges – different merge strategies, recursive merges, and so on. Yet, after this guide, you should have a robust understanding of what merge is, and what happens under the hood in the vast majority of cases.</p>
<h1 id="heading-about-the-author"><strong>About the Author</strong></h1>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the Brief <a target="_blank" href="https://youtube.com/@BriefVid">YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
<h1 id="heading-additional-references"><strong>Additional References</strong></h1>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BNUby5H58y6s2TQVLadV8v7">Git Internals YouTube playlist — by Brief</a>.</li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">Omer's previous post about Git internals.</a></li>
<li><a target="_blank" href="https://medium.com/@Omer_Rosenbaum/git-undo-how-to-rewrite-git-history-with-confidence-d4452e2969c2">Omer's piece about Git UNDO - rewriting history with Git</a>.</li>
<li><a target="_blank" href="https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging">https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging</a>.</li>
<li><a target="_blank" href="https://blog.plasticscm.com/2010/11/live-to-merge-merge-to-live.html">https://blog.plasticscm.com/2010/11/live-to-merge-merge-to-live.html</a>.</li>
<li><a target="_blank" href="https://www.oreilly.com/library/view/git-pocket-guide/9781449327507/ch07.html">https://www.oreilly.com/library/view/git-pocket-guide/9781449327507/ch07.html</a>.</li>
<li><a target="_blank" href="https://jwiegley.github.io/git-from-the-bottom-up/1-Repository/4-how-trees-are-made.html">https://jwiegley.github.io/git-from-the-bottom-up/1-Repository/4-how-trees-are-made.html</a>.</li>
</ul>
<h1 id="heading-appendix-beatles-related-resources">Appendix – Beatles-related resources</h1>
<ul>
<li><a target="_blank" href="https://www.the-paulmccartney-project.com/song/ive-got-a-feeling/">https://www.the-paulmccartney-project.com/song/ive-got-a-feeling/</a></li>
<li><a target="_blank" href="https://www.cheatsheet.com/entertainment/did-john-lennon-or-paul-mccartney-write-the-classic-a-day-in-the-life.html/">https://www.cheatsheet.com/entertainment/did-john-lennon-or-paul-mccartney-write-the-classic-a-day-in-the-life.html/</a></li>
<li><a target="_blank" href="http://lifeofthebeatles.blogspot.com/2009/06/ive-got-feeling-lyrics.html">http://lifeofthebeatles.blogspot.com/2009/06/ive-got-feeling-lyrics.html</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Git Diff and Patch – Full Handbook for Developers ]]>
                </title>
                <description>
                    <![CDATA[ Many of the interesting processes in Git like merging, rebasing, or even committing are based on diffs and patches. Developers work with diffs all the time, whether using Git directly or relying on the IDE's diff view. In this post, you will learn wh... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/git-diff-and-patch/</link>
                <guid isPermaLink="false">66c17c1dc711c748ec71e870</guid>
                
                    <category>
                        <![CDATA[ Git ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                    <category>
                        <![CDATA[ version control ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Tue, 21 Feb 2023 21:42:37 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/07/Git-Diff-and-Patch-for-Developers-Book-Cover--1-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Many of the interesting processes in Git like merging, rebasing, or even committing are based on diffs and patches.</p>
<p>Developers work with diffs all the time, whether using Git directly or relying on the IDE's diff view. In this post, you will learn what Git diffs and patches are, their structure, and how to apply patches.</p>
<p>In <a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">a previous post</a>, you learned about Git’s objects. Specifically, we discussed that a commit is a snapshot of the working tree at a certain point in time, in addition to some meta-data.</p>
<p>Yet, it is really hard to make sense of individual commits by looking at the entire working tree. Rather, it is more helpful to look at how different a commit is from its parent commit, that is, the <strong>diff</strong>  between these commits.</p>
<p>So, what do I mean when I say <code>diff</code>? Let’s start with some history.</p>
<h1 id="heading-git-diffs-history">Git Diff's History 📖</h1>
<p>Git’s <code>diff</code> is based on the diff utility on UNIX systems. <code>diff</code> was developed in the early 1970s on the Unix operating system. The first released version shipped with the 5th Edition of Unix in 1974.</p>
<p><code>git diff</code> is a command that takes two inputs, and computes the difference between them. Inputs can be commits, but also files, and even files that have never been introduced to the repository.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-214.png" alt="Image" width="600" height="400" loading="lazy">
<em>Git diff takes two inputs, which can be commits or files (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>This is important – <code>git diff</code> computes the difference between two strings, which most of the time happen to consist of code, but not necessarily.</p>
<h1 id="heading-time-to-get-hands-on"><strong>Time to Get Hands-On 🙌🏻</strong></h1>
<p>You are encouraged to run the commands yourself while reading this post.</p>
<p>Consider this very short text file, called <code>file.txt</code> on my machine, which consists of 6 lines:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-158.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>file.txt</code> consists of 6 lines (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Now, modify this file a bit. Remove the second line, and insert a new line as the fourth line. Add an <code>!</code> to the end of the last line, so you get this result:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-159.png" alt="Image" width="600" height="400" loading="lazy">
<em>After modifying <code>file.txt</code>, we get a different set of 6 lines (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Save this file with a new name, say <code>new_file.txt</code>.</p>
<p>Now we can run <code>git diff</code> to compute the difference between the files like so:</p>
<p><code>git diff --no-index file.txt new_file.txt</code>
(I will explain the <code>--no-index</code> switch of this command later.)</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-160.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git diff</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>So the output of <code>git diff</code> shows quite a lot of things.</p>
<p>For now, focus on the part starting with <code>This is a simple line</code>. You can see that the added line (<code>// new test</code>) is preceded by a <code>+</code> sign. The deleted line is preceded by a <code>-</code> sign. </p>
<p>Interestingly, notice that Git views a modified line as a sequence of two changes - erasing a line and adding a new line instead. So the patch includes deleting the last line, and adding a new line that equals that line, with the addition of a <code>!</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-165.png" alt="Image" width="600" height="400" loading="lazy">
<em>Addition lines are preceded by <code>+</code>, deletion lines by <code>-</code>, and modification lines are sequences of deletions and additions (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
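<p>To reproduce the modified-line case on your own machine, here is a minimal sketch with two small files. Their contents are made up for this illustration and are not the files from the screenshots.</p>
<pre><code>#!/bin/sh
# Two throwaway files; the contents are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

printf 'first\nsecond\nthird\n' > file.txt
# Same content, except that the last line gains a "!".
printf 'first\nsecond\nthird!\n' > new_file.txt

# git diff exits with status 1 when the files differ, hence "|| true".
diff_output=$(git diff --no-index file.txt new_file.txt || true)
echo "$diff_output"

# Keep only the changed lines, skipping the "---" and "+++" header lines.
changes=$(echo "$diff_output" | grep '^[-+]' | grep -v -e '^---' -e '^+++')
echo "$changes"
</code></pre>
<p>The modification shows up as a pair: a <code>-</code> line with the old content followed by a <code>+</code> line with the new content.</p>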
<p>The terms <code>patch</code> and <code>diff</code> are often used interchangeably, although there is a distinction, at least historically. </p>
<p>A <code>diff</code> shows the differences between two files, or snapshots, and can be quite minimal in doing so. A <code>patch</code> is an extension of a <code>diff</code>, augmented with further information such as context lines and filenames, which allow it to be applied more widely. It is a text document that describes how to alter an existing file or codebase. </p>
<p>These days, the Unix diff program, and <code>git diff</code>, can produce patches of various kinds.</p>
<p>A <code>patch</code> is a compact representation of the differences between two files. It describes how to turn one file into another. </p>
<p>In other words, if you apply the “instructions” produced by <code>git diff</code> on <code>file.txt</code> (remove the second line, insert <code>// new text</code> as the fourth line, and add a <code>!</code> to the last line), you would get the content of <code>new_file.txt</code>.</p>
<p>Another important thing to note is that a patch is asymmetric: the patch from <code>file.txt</code> to <code>new_file.txt</code> is not the same as the patch for the other direction. </p>
<p>So, in this example, generating a <code>patch</code> between <code>new_file.txt</code> and <code>file.txt</code>, in this order, would produce exactly the opposite instructions: add the second line instead of removing it, and so on.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-167.png" alt="Image" width="600" height="400" loading="lazy">
<em>A <code>patch</code> consists of asymmetric instructions to get from one file to another (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Try it out:
<code>git diff --no-index new_file.txt file.txt</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-169.png" alt="Image" width="600" height="400" loading="lazy">
<em>Running <code>git diff</code> in the reverse direction yields the reverse instructions - add a line instead of removing it, and so on (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
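<p>The asymmetry can also be sketched with Python's <code>difflib</code> module (again, a stand-in for Git, with made-up file contents). The helper name <code>body</code> is hypothetical. It simply drops the three header lines so that only the hunk's lines remain.</p>
<pre><code class="lang-python">import difflib

# Hypothetical contents: the second version drops one line.
old = ["keep me", "remove me"]
new = ["keep me"]

def body(a, b):
    # Skip the two file header lines and the @@ line, keep the hunk's lines.
    return list(difflib.unified_diff(a, b, lineterm=""))[3:]

forward = body(old, new)   # how to get from old to new
backward = body(new, old)  # how to get from new to old

print(forward)
print(backward)
</code></pre>
<p>The forward diff contains <code>-remove me</code>, while the backward diff contains <code>+remove me</code>. The context line is the same in both directions.</p>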
<p>The <code>patch</code> format uses context, as well as line numbers, to locate differing file regions. This allows a <code>patch</code> to be applied to a somewhat earlier or later version of the first file than the one from which it was derived, as long as the applying program can still locate the context of the change.</p>
<h2 id="heading-the-structure-of-a-diff">The Structure of a Diff 🔍</h2>
<p>So, it's time to dive deeper 😎.</p>
<p>Generate a diff from <code>file.txt</code> to <code>new_file.txt</code> again, and consider the output more carefully:</p>
<p><code>git diff --no-index file.txt new_file.txt</code></p>
<p>The first line introduces the compared files. Git always gives one file the name <code>a</code>, and the other the name <code>b</code>. So in this case <code>file.txt</code> is called <code>a</code>, whereas <code>new_file.txt</code> is called <code>b</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-170.png" alt="Image" width="600" height="400" loading="lazy">
<em>The first line in <code>diff</code>'s output introduces the files being compared (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Then the second line, starting with <code>index</code>, includes the blob SHAs of these files. So even though in our case they are not even stored within a Git repo, Git shows their corresponding SHA-1 values. </p>
<p>If you need a reminder about blobs in particular and Git objects in general, check out <a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">this post</a>.</p>
<p>The third value in this line, <code>100644</code>, is the "mode bits", indicating that this is a "regular" file: not executable and not a symbolic link.</p>
<p>The use of two dots (<code>..</code>) here between the blob SHAs is just as a separator (unlike other cases where it is used within Git).</p>
<p>Other header lines might indicate the old and new mode bits if they changed, old and new filenames if the file were being renamed, and so on.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-171.png" alt="Image" width="600" height="400" loading="lazy">
<em>The second line in <code>diff</code>'s output includes the blob SHAs of the compared files, as well as the mode bits (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>The blob SHAs (also called "blob IDs") are helpful if this patch is later applied by Git to the same project and there are conflicts while applying it.</p>
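<p>To make the <code>index</code> line more concrete, here is a small Python sketch of where its values come from, assuming a repository that uses SHA-1. The function name <code>blob_id</code> is hypothetical. It hashes content the way Git hashes a blob, which is why Git can show an ID even for a file that is not stored in any repository. The standard <code>stat</code> module then decodes the mode bits.</p>
<pre><code class="lang-python">import hashlib
import stat

def blob_id(content):
    # Git hashes a short header ("blob", the size in bytes, a NUL byte)
    # followed by the file's content.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The blob ID of an empty file.
print(blob_id(b""))

# The mode from the index line, written as an octal number.
mode = 0o100644
print(stat.S_ISREG(mode))   # True for a regular file
print(stat.filemode(mode))  # permission string without any "x"
</code></pre>
<p>For the empty file this prints <code>e69de29bb2d1d6434b8b29ae775ad8c2e48c5391</code>, and the mode decodes to <code>-rw-r--r--</code>, that is, a regular, non-executable file.</p>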
<p>After the blob IDs, we have two lines: one starting with <code>-</code> signs, and the other starting with <code>+</code> signs. This is the traditional "unified diff" header, again showing the files being compared and the direction of the changes: <code>-</code> signs mark lines that are in the A version but missing from the B version, and <code>+</code> signs mark lines that are missing from the A version but present in the B version.</p>
<p>If the patch were of this file being added or deleted in its entirety, then one of these would be <code>/dev/null</code> to signal that.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-172.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>-</code> signs mark lines in the A version but missing from the B version, and <code>+</code> signs mark lines missing from the A version but present in the B version (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Consider the case where we delete a file:
<code>rm file.txt</code></p>
<p>And then we use <code>git diff</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-173.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>diff</code>'s output for a deleted file (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>The A version, representing the state of the index, is currently <code>file.txt</code>, compared to the working dir where this file does not exist, so it is <code>/dev/null</code>. All lines are preceded by <code>-</code> signs as they exist only in the A version.</p>
<p>Going back to the previous diff:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-174.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>diff</code>'s output includes changes sections called "hunks" or "chunks" (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>After this unified diff header, we get to the main part of the diff, consisting of "difference sections", also called "hunks" or "chunks" in Git. </p>
<p>Note that these terms are used interchangeably, and you may stumble upon either of them in Git's documentation and tutorials, as well as Git's source code.</p>
<p>Every hunk begins with a single line that starts with two <code>@</code> signs. These signs are followed by at most four numbers, and then by a header for the hunk. The header is an educated guess by Git, so it is only sometimes helpful.</p>
<p>Usually, it will include the beginning of a function or a class, when possible. In this example it doesn't include anything as this is a text file, so consider another example for a moment:</p>
<p><code>git diff --no-index example.py example_changed.py</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-175.png" alt="Image" width="600" height="400" loading="lazy">
<em>When possible, Git includes a header for each hunk, for example a function or class definition (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>In the image above the hunk's header includes the beginning of the function that includes the changed lines - <code>def example_function(x)</code>.</p>
<p>Back to our previous example then:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-174.png" alt="Image" width="600" height="400" loading="lazy">
<em>Back to the previous <code>diff</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>After the two <code>@</code> signs, you can find four numbers.</p>
<p>The first two numbers are preceded by a <code>-</code> sign as they refer to <code>file A</code>. The first number is the line number, in <code>file A</code>, of the first line this hunk refers to. In the example above, it is <code>1</code>, meaning that the line <code>This is a simple file</code> corresponds to line number <code>1</code> in <code>file A</code>.</p>
<p>This number is followed by a comma (<code>,</code>), and then the number of lines this hunk consists of in <code>file A</code>. This number counts all context lines (the lines preceded by a space in the diff) and all lines marked with a <code>-</code> sign, as they are part of <code>file A</code>. It does not count lines marked with a <code>+</code> sign, as they do not exist in <code>file A</code>.</p>
<p>In the example above this number is <code>6</code>, counting the context line <code>This is a simple file</code>, the <code>-</code> line <code>It has a nice poem:</code>, then the three context lines, and lastly <code>Are belong to you</code>.</p>
<p>As you can see, the lines beginning with a space character are context lines, which means they appear as shown in both <code>file A</code> and <code>file B</code>.</p>
<p>Then, we have a <code>+</code> sign to mark the two numbers that refer to <code>file B</code>. First, the line number corresponding to the first line in <code>file B</code>, followed by the number of lines this chunk consists of - in <code>file B</code>. </p>
<p>This number includes all context lines, as well as lines marked with the <code>+</code> sign, as they are part of <code>file B</code>, but not lines marked with a <code>-</code> sign.</p>
<p>After the header of the chunk, we get the actual lines - either context, <code>-</code> or <code>+</code> lines.</p>
<p>By default, a hunk starts and ends with three context lines, provided that there are at least three lines before and after the modified lines in the source file.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-176.png" alt="Image" width="600" height="400" loading="lazy">
<em>The patch format by <code>git diff</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
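<p>As a way to check your understanding of the four numbers, the following Python sketch parses a hunk header and recounts the lines on each side. The hunk itself and the helper names <code>parse_header</code> and <code>count_lines</code> are made up for illustration. The sketch also assumes that an omitted line count stands for <code>1</code>, which is how the unified format abbreviates single-line ranges.</p>
<pre><code class="lang-python">import re

# A hypothetical hunk with three context lines, one deletion and one addition.
hunk = [
    "@@ -1,4 +1,4 @@",
    " context one",
    "-removed line",
    " context two",
    "+added line",
    " context three",
]

def parse_header(header):
    # Extract the start line and the line count for file A and for file B.
    match = re.match(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", header)
    a_start, a_count, b_start, b_count = match.groups()
    # Assumption: a missing count means the range is a single line.
    return int(a_start), int(a_count or 1), int(b_start), int(b_count or 1)

def count_lines(body):
    # File A holds context lines and "-" lines; file B holds context lines and "+" lines.
    in_a = sum(1 for line in body if line[0] in " -")
    in_b = sum(1 for line in body if line[0] in " +")
    return in_a, in_b

print(parse_header(hunk[0]))
print(count_lines(hunk[1:]))
</code></pre>
<p>Both counts come out as <code>4</code>, matching the second and fourth numbers in the header: three context lines plus one changed line on each side.</p>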
<h2 id="heading-how-to-produce-diffs">How to Produce diffs ⌨️</h2>
<p>The example above shows a diff between exactly two files. A single patch file can contain the differences for any number of files, and <code>git diff</code> produces diffs for all altered files in the repository in a single patch.</p>
<p>Often, you will see the output of <code>git diff</code> showing two versions of the <em>same</em> file and the difference between them.</p>
<p>To demonstrate, consider this other repository:</p>
<p><code>cd ~/brief-example</code></p>
<p>At the current state, the active directory is a Git repository, with a clean status:</p>
<p><code>git status</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-177.png" alt="Image" width="600" height="400" loading="lazy">
<em>In another repository with a clean status (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Take an existing file, like this one:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-178.png" alt="Image" width="600" height="400" loading="lazy">
<em>An example file - <code>my_file.py</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>And change one of its lines. For example, consider the second line:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-179.png" alt="Image" width="600" height="400" loading="lazy">
<em>The contents of <code>my_file.py</code> after modifying the second line (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>And run <code>git diff</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-180.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git diff</code> for <code>my_file.py</code> after changing it (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>The output of <code>git diff</code> shows the difference between <code>my_file.py</code>'s version in the staging area, which in this case is the same as the last commit (<code>HEAD</code>), and in the working directory. </p>
<p>I covered the terms "working directory", "staging area", and "commit" in <a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">a previous post</a>, so check it out in case you missed it or would like to refresh your memory. </p>
<p>As <a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">a reminder</a>, the terms "staging area" and "index" are interchangeable, and both are widely used.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-182.png" alt="Image" width="600" height="400" loading="lazy">
<em>At this state, the status of the working dir is the same as the status of the index and that of <code>HEAD</code>. (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>So to see the difference between the working dir and the staging area, use <code>git diff</code>, without any additional flags.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-181.png" alt="Image" width="600" height="400" loading="lazy">
<em>Without switches, <code>git diff</code> shows the difference between the staging area and the working directory (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>As you can see, <code>git diff</code> lists here both <code>file A</code> and <code>file B</code> pointing to <code>my_file.py</code>. So <code>file A</code> here refers to the version of <code>my_file.py</code> in the staging area, whereas <code>file B</code> refers to its version in the working dir.</p>
<p>Note that if you modify <code>my_file.py</code> in a text editor, and don’t save the file, then <code>git diff</code> will not be aware of the changes you've made, as they haven’t been saved to the working dir.</p>
<p>There are a few switches we can provide to <code>git diff</code> to get the diff between the working dir and a specific commit, or between the staging area and the latest commit, or between two commits and so on.</p>
<p>First create a new file, <code>new_file.txt</code>, and save it. Currently the file is in the working dir, and it is actually untracked in Git.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-183.png" alt="Image" width="600" height="400" loading="lazy">
<em>A simple new file saved as <code>new_file.txt</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Now stage and commit this file:
<code>git add new_file.txt</code>
<code>git commit -m "new file!"</code></p>
<p>Now, the state of <code>HEAD</code> is the same as the state of the staging area, as well as the working tree:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-184.png" alt="Image" width="600" height="400" loading="lazy">
<em>The state of <code>HEAD</code> is the same as the index and the working dir (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Next, edit <code>new_file.txt</code>, by adding a new line at the beginning and another new line at the end:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-185.png" alt="Image" width="600" height="400" loading="lazy">
<em>Modifying <code>new_file.txt</code> by adding a line at the beginning and another at the end (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>As a result, the state is as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-186.png" alt="Image" width="600" height="400" loading="lazy">
<em>After saving, the state in the working dir is different than that of the index or <code>HEAD</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>A nice trick is to use <code>git add -p</code>, which allows you to split the changes even within a single file, and decide which ones you'd like to stage.</p>
<p>So in this case, add the first line to the index, but not the last line. To do that, you can split the hunk using <code>s</code>, then choose to stage the first part (using <code>y</code>), and not the second part (using <code>n</code>).</p>
<p>If you are not sure what each letter stands for, you can always use a <code>?</code> and Git will tell you.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-187.png" alt="Image" width="600" height="400" loading="lazy">
<em>Using <code>git add -p</code>, you can stage only the first change (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>So now the state in <code>HEAD</code> is without either of those new lines. In the staging area we have the first line but not the last line, and in the working dir we have both new lines.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-189.png" alt="Image" width="600" height="400" loading="lazy">
<em>The state after staging only the first line (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>If you use <code>git diff</code>, what will happen?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-188.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git diff</code> shows the difference between the index and the working dir (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Well, as stated before, you get the diff between the staging area and the working tree.</p>
<p>What happens if you want to get the diff between <code>HEAD</code> and the staging area? For that, you can use <code>git diff --cached</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-190.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git diff --cached</code> shows the difference between <code>HEAD</code> and the index (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>And what if we want the difference between <code>HEAD</code> and the working tree? For that we can run <code>git diff HEAD</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-191.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git diff HEAD</code> shows the difference between <code>HEAD</code> and the working dir (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>To summarize the different switches for <code>git diff</code>, see this diagram that you can go back to as a reference when needed:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-192.png" alt="Image" width="600" height="400" loading="lazy">
<em>Different switches for <code>git diff</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>As a reminder, at the beginning of this post you used <code>git diff --no-index</code>. With the <code>--no-index</code> switch you can compare two files that are not part of the repository - or of any staging area.</p>
<p>Now, commit the changes you have in the staging area:</p>
<p><code>git commit -m "added a first line"</code></p>
<p>To observe the diff between this commit, and its parent commit, you can run the following command:</p>
<p><code>git diff HEAD~1 HEAD</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-194.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git diff HEAD~1 HEAD</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>By the way, we can omit the <code>1</code> above and write <code>HEAD~</code>, and get the same result. Using <code>1</code> is the explicit way to state you are referring to the first parent of the commit.</p>
<p>Note that writing the parent commit here, <code>HEAD~1</code>, first results in a diff showing how to get from the parent commit to the current commit. Of course, I could also generate the reverse diff by writing:</p>
<p><code>git diff HEAD HEAD~1</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-195.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output of <code>git diff HEAD HEAD~1</code> generates the reverse patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-196.png" alt="Image" width="600" height="400" loading="lazy">
<em>The different switches for <code>git diff</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>A short way to view the diff between a commit and its parent is to use <code>git show</code>, for example:</p>
<p><code>git show HEAD</code></p>
<p>This shows the commit's metadata, followed by the same diff you would get by writing:</p>
<p><code>git diff HEAD~ HEAD</code></p>
<p>We can now update our diagram:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-197.png" alt="Image" width="600" height="400" loading="lazy">
<em>The contents of <code>new_file.txt</code> after using <code>git reset --hard HEAD~1</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>As <a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">a reminder</a>, Git commits are snapshots of the entire working directory of the repository at a certain point in time. Yet it is often more useful to regard a commit not as a whole snapshot, but in terms of the changes it introduced. In other words, as the <strong>diff</strong> between the parent commit and the commit itself.</p>
<p>It is still important to remember that Git stores the entire snapshots, and the diff is dynamically generated from the snapshot data - by comparing the root trees of the commit and its parent.</p>
<p>Of course, Git can compare any two snapshots in time, not just adjacent commits, and also generate a diff of files not included in a repository.</p>
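<p>The "snapshots first, diffs on demand" idea can be sketched in a few lines of Python. This is only a toy model, not Git's implementation: each snapshot is a plain dictionary mapping a path to the file's lines, the function name <code>diff_snapshots</code> is hypothetical, and a missing file is simply treated as empty (Git itself would use <code>/dev/null</code>, as described earlier). The file contents follow the <code>new_file.txt</code> example.</p>
<pre><code class="lang-python">import difflib

# Two toy snapshots: each one stores the complete contents of every file.
parent = {"new_file.txt": ["This is a new file", "With new content!"]}
child = {"new_file.txt": ["START", "This is a new file", "With new content!"]}

def diff_snapshots(old, new):
    # Nothing about "changes" is stored anywhere. The diff is computed
    # only now, by comparing the two full snapshots path by path.
    patches = {}
    for path in sorted(set(old) | set(new)):
        before, after = old.get(path, []), new.get(path, [])
        if before != after:
            patches[path] = list(difflib.unified_diff(
                before, after, "a/" + path, "b/" + path, lineterm=""))
    return patches

for line in diff_snapshots(parent, child)["new_file.txt"]:
    print(line)
</code></pre>
<p>Calling <code>diff_snapshots</code> with any two snapshots works the same way, whether or not one is the direct parent of the other.</p>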
<h2 id="heading-how-to-apply-patches">How to Apply Patches 💪🏻</h2>
<p>By using <code>git diff</code> you can see the patch, and you can then apply this patch using <code>git apply</code>.</p>
<h3 id="heading-historical-note">Historical note 📔</h3>
<p>Actually, sharing patches used to be the main way to share code in the early days of open source. But now, virtually all projects have moved to sharing Git commits directly through pull requests (called "merge requests" on some platforms).</p>
<p>The biggest problem with using patches is that it is hard to apply a patch when your working directory does not match the sender's previous commit. </p>
<p>Losing the commit history makes it difficult to resolve conflicts. You will better understand it as you dive deeper into the process of <code>git apply</code>.</p>
<h3 id="heading-a-simple-apply">A simple apply</h3>
<p>What does it mean to apply a patch? It's time to try it out!</p>
<p>Take the output of <code>git diff</code>:</p>
<p><code>git diff HEAD~1 HEAD</code></p>
<p>And store it in a file:</p>
<p><code>git diff HEAD~1 HEAD &gt; my_patch.patch</code></p>
<p>And <code>reset</code> to undo the last commit:</p>
<p><code>git reset --hard HEAD~1</code></p>
<p>If you are not completely comfortable with <code>git reset</code>, check <a target="_blank" href="https://www.freecodecamp.org/news/save-the-day-with-git-reset/">a previous post that covered it in depth</a>. In short, it allows us to "reset" the state of where <code>HEAD</code> is pointing to, as well as the state of the index and of the working dir. </p>
<p>In the example above, they are all set to the state of <code>HEAD~1</code>, or <code>Commit 3</code> in the diagram.</p>
<p>So after running the reset command, the contents of the file are as follows:</p>
<p><code>nano new_file.txt</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-198.png" alt="Image" width="600" height="400" loading="lazy">
<em>The patch you are about to apply, as generated by <code>git diff</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>And we will apply this patch:</p>
<p><code>nano my_patch.patch</code></p>
<p>This patch tells Git to find the lines:</p>
<pre><code class="lang-txt">This is a new file
With new content!
</code></pre>
<p>That used to be lines <code>1</code> and <code>2</code>, and add a line <code>START</code> right above them.</p>
<p>Run this command to apply the patch:</p>
<p><code>git apply my_patch.patch</code></p>
<p>And as a result, you get this version of your file, just like the commit you have created before:</p>
<p><code>nano new_file.txt</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-199.png" alt="Image" width="600" height="400" loading="lazy">
<em>The contents of <code>new_file.txt</code> after applying the patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<h2 id="heading-understanding-the-context-lines">Understanding the Context Lines 🧑🏻‍🏫</h2>
<p>To understand the importance of context lines, consider a more advanced scenario. What happens if line numbers have changed since you created the patch file? 🤔</p>
<p>To test, start by creating another file:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-201.png" alt="Image" width="600" height="400" loading="lazy">
<em>Creating another file - <code>another_file.txt</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Stage and commit this file:</p>
<p><code>git add another_file.txt</code></p>
<p><code>git commit -m "another file"</code></p>
<p>Now, change this file by adding a new line, and also erasing the line before the last one:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-202.png" alt="Image" width="600" height="400" loading="lazy">
<em>Changes to <code>another_file.txt</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Observe the difference between the original version of the file and the version including your changes:</p>
<p><code>git diff -- another_file.txt</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-203.png" alt="Image" width="600" height="400" loading="lazy">
<em>The output for <code>git diff -- another_file.txt</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>(Using <code>-- another_file.txt</code> tells Git to run the command <code>diff</code>, taking into consideration only <code>another_file.txt</code>, so you don't get the diff for other files.)</p>
<p>Store this diff into a patch file:</p>
<p><code>git diff -- another_file.txt &gt; new_patch.patch</code></p>
<p>Now, reset your state to that before introducing the changes:
<code>git reset --hard</code></p>
<p>If you were to apply <code>new_patch.patch</code> now, it would simply work. Consider a more interesting case.</p>
<p>Modify <code>another_file.txt</code> again by adding a new line at the beginning:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-209.png" alt="Image" width="600" height="400" loading="lazy">
<em>Adding a new line at the beginning of <code>another_file.txt</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>As a result, the line numbers are different from those in the original version, from which the patch was created. Consider the patch you created before:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-210.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>new_patch.patch</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>It assumes that the line <code>So this is a file</code> is the first line in <code>another_file.txt</code>, which is no longer the case. So...will <code>git apply</code> work?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-211.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git apply</code> doesn't apply the patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>Well, no. The patch does not apply. But why? Is it really because of the change in line numbers?</p>
<p>To better understand the process Git is performing, you can add the <code>--verbose</code> flag to <code>git apply</code>, like so:</p>
<p><code>git apply --verbose new_patch.patch</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-213.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git apply --verbose</code> shows the process Git is taking to apply the patch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>It seems that Git searched for the entire contents of the file, specifically, including the line <code>So we are writing an example</code>, that no longer exists in the file. As Git cannot find this line, it cannot apply the patch.</p>
<p>Why does Git look for the entire file? By default, Git looks for <code>3</code> lines of context before and after each change introduced in the patch. If you take three lines before and after the added line, and three lines before and after the deleted line (actually only one line after, as no other lines exist) - you get to the entire file.</p>
<p>You can ask Git to rely on fewer lines of context, using the <code>-C</code> argument. For example, to ask Git to look for <code>1</code> line of the surrounding context, run the following command:</p>
<p><code>git apply -C1 new_patch.patch</code></p>
<p>The patch applies cleanly! 🎉</p>
<p>Why is that? Consider the patch again:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/image-210.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>new_patch.patch</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=eG9oAroMcPk">Brief</a>)</em></p>
<p>When applying the patch with the <code>-C1</code> option, Git is looking for the lines:</p>
<pre><code class="lang-text">It has some really nice lines
Like this one
</code></pre>
<p>in order to add the line <code>!!!This is the new line I am adding!!!</code> between these two lines. These lines exist (and, importantly, they appear one right after the other). So Git can successfully add the line between them, even though the line numbers changed.</p>
<p>Similarly, Git would look for the lines:</p>
<pre><code class="lang-text">And we are now learning about Git
So we are writing an example
Git is lovely!
</code></pre>
<p>As Git can find these lines, Git can erase the middle one.</p>
<p>If we changed one of these lines, say, changed <code>And we are now learning about Git</code> to <code>And we are now learning about patches in Git</code>, then Git would not be able to find the string above, and thus the patch would not apply.</p>
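<p>The role of context can be illustrated with a rough Python sketch. It models only the idea of locating a change by its surrounding lines instead of by line numbers, and it is not Git's actual matching algorithm. The function name <code>find_context</code> is hypothetical, the file contents are assumed from the lines quoted above, and the extra first line is invented.</p>
<pre><code class="lang-python">def find_context(file_lines, wanted):
    # Return the index where the consecutive lines in `wanted` start, or None.
    size = len(wanted)
    for start in range(len(file_lines) - size + 1):
        if file_lines[start:start + size] == wanted:
            return start
    return None

# Assumed contents of another_file.txt, after an invented line was added on top.
current = [
    "An extra first line",
    "So this is a file",
    "It has some really nice lines",
    "Like this one",
    "And we are now learning about Git",
    "So we are writing an example",
    "Git is lovely!",
]

# One line of context on each side of the insertion point.
index = find_context(current, ["It has some really nice lines", "Like this one"])
patched = current[:index + 1] + ["!!!This is the new line I am adding!!!"] + current[index + 1:]

# If a context line is edited, the same search no longer finds a match.
changed = [line.replace("about Git", "about patches in Git") for line in current]
missing = find_context(changed, [
    "And we are now learning about Git",
    "So we are writing an example",
    "Git is lovely!",
])

print(index, missing)
</code></pre>
<p>The first search succeeds even though every line has shifted down by one, while the second search returns <code>None</code> because one of the context lines no longer matches.</p>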
<h1 id="heading-recap">Recap</h1>
<p>In this post, you learned what a diff is, and the difference between a diff and a patch. You learned how to generate various patches using different switches for <code>git diff</code>. </p>
<p>You also learned what the output of <code>git diff</code> looks like, and how it is constructed. Ultimately, you learned how patches are applied, and specifically the importance of context.</p>
<p>Understanding diffs is a major milestone for understanding many other processes within Git - for example, merging or rebasing. </p>
<p>In future tutorials, you will use your knowledge from this post to dive into these other areas of Git.</p>
<h1 id="heading-about-the-author">About the Author</h1>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the Brief <a target="_blank" href="https://youtube.com/@BriefVid">YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
<h1 id="heading-additional-references">Additional References</h1>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BNUby5H58y6s2TQVLadV8v7">Git Internals YouTube playlist — by Brief</a>.</li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/">Omer's previous post about Git internals.</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Wireshark – Computer Networking Tutorial ]]>
                </title>
                <description>
                    <![CDATA[ In this post, you will learn about the single most important and useful tool in Computer Networks – Wireshark. This post relies on basic knowledge of computer networks. Be sure to check my previous post about the five layers model if you need a refre... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-wireshark-computer-networking/</link>
                <guid isPermaLink="false">66c17c3dea5637f064224a0a</guid>
                
                    <category>
                        <![CDATA[ computer network ]]>
                    </category>
                
                    <category>
                        <![CDATA[ computer networking ]]>
                    </category>
                
                    <category>
                        <![CDATA[ information security ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #infosec ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Mon, 23 Jan 2023 23:35:33 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/01/Computer-Networks-Ethernet--3-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this post, you will learn about the single most important and useful tool in Computer Networks – Wireshark.</p>
<p>This post relies on basic knowledge of computer networks. Be sure to check my <a target="_blank" href="https://www.freecodecamp.org/news/the-five-layers-model-explained/">previous post about the five layers model</a> if you need a refresher.</p>
<h1 id="heading-what-is-wireshark">What is Wireshark?</h1>
<p>Wireshark is a sniffer, as well as a packet analyzer.</p>
<p>What does that mean?</p>
<p>You can think of a <strong>sniffer</strong> as a measuring device. We use it to examine what’s going on inside a network cable, or in the air if we are dealing with a wireless network. A sniffer shows us the data that passes through our network card.</p>
<p>But Wireshark does more than that. A sniffer could just display a stream of bits, the ones and zeroes that the network card sees. Wireshark is also a <strong>packet analyzer</strong> that displays lots of meaningful data about the frames that it sees.</p>
<p>Wireshark is an open-source and free tool, and is widely used to analyze network traffic.</p>
<p>Wireshark can be helpful in many cases. For instance, it can help you debug problems in your network, such as when you can’t connect from one computer to another and want to understand what’s going on.</p>
<p>It can also help programmers. For example, imagine that you were implementing a chat program between two clients, and something was not working. In order to understand what exactly is being sent, you may use Wireshark to see the data transmitted over the wire.</p>
<p>So, let’s get to know Wireshark.</p>
<h1 id="heading-how-to-download-and-install-wireshark">How to Download and Install Wireshark</h1>
<p>Start by downloading Wireshark from its official website:</p>
<p><a target="_blank" href="https://www.wireshark.org/#download">https://www.wireshark.org/#download</a></p>
<p>Follow the instructions on the installer and you should be good to go.</p>
<h1 id="heading-how-to-sniff-traffic-with-wireshark">How to Sniff Traffic with Wireshark</h1>
<p>Launch Wireshark, and start by sniffing some data. For that, you can hit <code>Ctrl+K</code> (PC) or <code>Cmd+K</code> (Mac) to get the <code>Capture Options</code> window. You can also reach this window in other ways: go to <code>Capture-&gt;Options</code>, or click the <code>Capture Options</code> icon.</p>
<p>I encourage you to use keyboard shortcuts and get comfortable with them right from the start, as they'll allow you to save time and work more efficiently.</p>
<p>So, again, I’ve used <code>Ctrl+K</code> (or <code>Cmd+K</code>) and got this screen:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-208.png" alt="Image" width="600" height="400" loading="lazy">
<em>The <code>Capture Options</code> window in Wireshark (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>Here we can see a list of interfaces, and I happen to have quite a few. Which one is relevant? If you’re not sure at this point, you can look at the <code>Traffic</code> column, and see which interfaces currently have traffic. </p>
<p>Here we can see that <code>Wi-Fi 3</code> has traffic going through it, as its line in the <code>Traffic</code> column is high. Select the relevant network interface, and then hit <code>Enter</code>, or click the <code>Start</code> button.</p>
<p>Let Wireshark sniff the network for a bit, and then stop the sniff using <code>Ctrl+E</code> / <code>Cmd+E</code>. Again, this can be achieved in other ways – such as going to <code>Capture-&gt;Stop</code> or clicking the <code>Stop</code> icon.</p>
<p>Consider the different sections:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-210.png" alt="Image" width="600" height="400" loading="lazy">
<em>Wireshark's sections (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>The section marked in red includes Wireshark’s menu, with all kinds of interesting options.</p>
<p>The main toolbar is marked in blue, providing quick access to some items from the menu.</p>
<p>Next, marked in green, is the <strong>display filter</strong>. We will get back to it shortly, as this is one of the most important features of Wireshark.</p>
<p>Then follows:</p>
<h1 id="heading-the-packet-list-pane">The Packet List Pane</h1>
<p>The packet list pane is marked in orange. It displays a short summary of each packet captured.</p>
<p>(Note: the term Frame refers to a sequence of bytes in the <a target="_blank" href="https://www.freecodecamp.org/news/the-five-layers-model-explained/">Data Link layer</a>, while a Packet is a sequence of bytes from the <a target="_blank" href="https://www.freecodecamp.org/news/the-five-layers-model-explained/">Network layer</a>. In this post I will use the terms interchangeably, though to be accurate, every packet is carried inside a frame, but not every frame carries a packet, as there are frames that don't hold network layer data.)</p>
<p>As you can see in the image above, we have a few columns here:</p>
<p>Number (No.) – The number of the packet in the capture file. This number won’t change, even if we use filters. This is just a sequential number – the first frame that you have sniffed gets the number 1, the second frame gets the number 2, and so on.</p>
<p>Time – The timestamp of the packet. It shows how much time has passed from the very first packet we have sniffed until we sniffed the packet in question. Therefore, the time for packet number 1 is always 0.</p>
<p>Source – The address where this packet is coming from. Don’t worry if you don’t understand the format of the addresses just yet, we will cover different addresses in future tutorials.</p>
<p>Destination – The address where this packet is going.</p>
<p>Protocol – The protocol name in a short version. This will be the top protocol – that is, the protocol of the highest layer.</p>
<p>Length – The length of each packet, in bytes.</p>
<p>Info – Additional information about the packet content. This changes according to the protocol.</p>
<p>By clicking on packets in this pane, you control what is displayed in the other two panes which I will now describe.</p>
<h1 id="heading-the-packet-details-pane">The Packet Details Pane</h1>
<p>Click on one of the captured packets. In the example below I clicked on packet number 147:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-211.png" alt="Image" width="600" height="400" loading="lazy">
<em>Selecting a specific packet changes the packet details pane (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>Now, the <strong>packet details pane</strong> displays the packet selected in the packet list pane in more detail. You can see the layers here. </p>
<p>In the example above, we have Ethernet II as the second layer, IPv4 as the third layer, UDP as the fourth layer, and some data as a payload.</p>
<p>When we click on a specific layer, we actually see the <strong>header</strong> of that layer.</p>
<p>Notice that we don’t see the first layer on its own. As a reminder, the first layer is responsible for <strong>transmitting a single bit</strong> – 0 or 1 – over the network (if you need a refresher about the different layers, <a target="_blank" href="https://www.freecodecamp.org/news/the-five-layers-model-explained/">check out this post</a>).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-215.png" alt="Image" width="600" height="400" loading="lazy">
<em>The packet bytes pane in Wireshark (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>Below the packet details pane, we have the <strong>packet bytes pane</strong>. It displays the data from the packet selected in the packet list pane. This is the actual data being sent over the wire. We can see the data in hexadecimal base, as well as ASCII form.</p>
<h1 id="heading-how-to-use-the-display-filter">How to Use the Display Filter</h1>
<p>Wireshark has many different functions, and today we will focus on one thing – the display filter. </p>
<p>As you can see, once you start sniffing data, you get a LOT of traffic. But you definitely don’t want to look at everything. </p>
<p>Recall the example from before – using Wireshark in order to debug a chat program that you’ve implemented. In that case, you would like to see the traffic related to the chat program only.</p>
<p>Let’s say I want to filter only messages sent by the source address of frame number 149 (<code>192.168.1.3</code>). I will cover IP addresses in future posts, but for now you can see that it consists of four numbers, delimited by dots:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-217.png" alt="Image" width="600" height="400" loading="lazy">
<em>The <code>display filter</code> in Wireshark (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>Now, even if you don’t know how to filter only packets sent from this IP address, you can use Wireshark to show you how it’s done. </p>
<p>For that, go to the field you would like to filter by – in this case, the source IP address. Then right-click it and choose <code>Apply as Filter -&gt; Selected</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-218.png" alt="Image" width="600" height="400" loading="lazy">
<em>Applying a display filter (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>After applying the filter, you only see packets that have been sent from this address. Also, you can look at the display filter line and see the command used. In this way, you can learn about the display filter syntax (in this example, it is <code>ip.src</code> for the IP source address field):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-219.png" alt="Image" width="600" height="400" loading="lazy">
<em>Applying a display filter (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>Now, try to filter only packets that have been sent from this address, and <strong>to</strong> the address <code>172.217.16.142</code> (as in Frame 130 in the image above). How would you do that?</p>
<p>Well, you could go to the relevant field – in this case, the IP destination address. Now, right-click it, choose <code>Apply as Filter</code>, and select <code>...and Selected</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-220.png" alt="Image" width="600" height="400" loading="lazy">
<em>Applying a display filter (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>If you look at the display filter line after applying this filter:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-221.png" alt="Image" width="600" height="400" loading="lazy">
<em>Applying a display filter (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>You can see that the <code>&amp;&amp;</code> operator performs a logical <code>and</code>. You could also write the word <code>and</code> instead, and get the same result.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-222.png" alt="Image" width="600" height="400" loading="lazy">
<em>Applying multiple conditions using <code>&amp;&amp;</code> or <code>and</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
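<p>Since the filters above only appear in screenshots, here is a small Python sketch that spells out the resulting filter text. The helper <code>build_display_filter</code> is made up for this example (it is not part of Wireshark), and <code>ip.dst</code> is the destination counterpart of the <code>ip.src</code> field shown earlier.</p>
<pre><code class="lang-python">def build_display_filter(conditions):
    """Join field/value pairs into one display filter using the word `and`."""
    return " and ".join(f"{field} == {value}" for field, value in conditions)


display_filter = build_display_filter([
    ("ip.src", "192.168.1.3"),
    ("ip.dst", "172.217.16.142"),
])
print(display_filter)
</code></pre>
<p>You can type the printed text directly into the display filter line instead of using the right-click menus.</p>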
<h1 id="heading-how-to-use-wireshark-to-research-the-ping-utility">How to Use Wireshark to Research the Ping Utility</h1>
<p><strong>Ping</strong> is a useful utility to check for remote servers’ connectivity.</p>
<p><a target="_blank" href="https://www.howtogeek.com/235101/10-ways-to-open-the-command-prompt-in-windows-10/">This page</a> explains how to open the command line in Windows, and <a target="_blank" href="https://macpaw.com/how-to/use-terminal-on-mac">this page</a> explains how to do that in macOS.</p>
<p>Now, we can try to <code>ping &lt;address&gt;</code> using the command line. By default, ping on Windows sends <code>4</code> requests and waits for a <strong>pong</strong> answer to each. If we want it to send a single request, we could use <code>-n 1</code> (on macOS and Linux, the equivalent option is <code>-c 1</code>):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-224.png" alt="Image" width="600" height="400" loading="lazy">
<em>Using the command line to ping Google (Source: <a target="_blank" href="https://www.youtube.com/watch?v=nbTJXIdEzlo">Brief</a>)</em></p>
<p>You can see that Google has responded. The time it took for the message to return was 92 milliseconds. We will learn about the meaning of TTL in future posts.</p>
<p>Ping is useful to determine whether a remote service is available, and how fast it is to reach that service. If it takes a very long time to reach a reliable server such as google.com, we might have a connectivity problem.</p>
<h2 id="heading-try-it-yourself">Try it yourself</h2>
<p>Now, try to use Wireshark to answer the following questions:</p>
<p>1) What protocol does the <strong>ping</strong> utility use?</p>
<p>2) Using only Wireshark, compute the RTT (Round Trip Time): how long did it take from the moment your ping request was sent until the ping reply was received?</p>
<p>Next, run the following command:</p>
<p><code>ping -n 1 -l 342 www.google.com</code></p>
<p>3) What is the main difference between the packet sent by this command, and the packet sent by the previous command? Where in Wireshark can you see this difference, inspecting the packets?  </p>
<p>4) What is the content (data) provided in the ping request packet? What is the content provided in the ping response packet?</p>
<h2 id="heading-lets-solve-it-together">Let's solve it together</h2>
<p>So the first question is:</p>
<h3 id="heading-what-protocol-does-the-ping-utility-use">What protocol does the ping utility use?</h3>
<p>To answer that question, start sniffing in Wireshark, and simply run the <code>ping</code> command. Stop the sniff, and consider the packet list pane:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-225.png" alt="Image" width="600" height="400" loading="lazy">
<em>Sniffing while running ping (source: <a target="_blank" href="https://www.youtube.com/watch?v=B5iEmaZK9xI&amp;t=2s">Brief</a>)</em></p>
<p>Wireshark marks the packets as <code>Echo (ping) request</code> and <code>Echo (ping) reply</code>.</p>
<p>Considering these packets, we can see they consist of <code>Ethernet</code> for the Data Link layer (though that may differ from one network to another), <code>IPv4</code> as the Network layer, and then <code>ICMP</code> as the protocol for Ping itself. So the answer we found is: <strong>ICMP</strong>.</p>
<p>Next question:</p>
<h3 id="heading-using-only-wireshark-compute-the-round-trip-time">Using only Wireshark, compute the Round Trip Time</h3>
<p>Looking at the captured packets, we can use the <code>Time</code> column, and subtract the time of the Ping packet (<code>7.796...</code>) from the time of the Pong packet (<code>7.888...</code>).</p>
<p>So in this case the RTT was: <strong>92 ms</strong>. Of course, the value can be different when you run the <code>ping</code> utility.</p>
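<p>The subtraction can be written out as a tiny Python calculation. The two timestamps are the values from the <code>Time</code> column, truncated to three decimals as they appear in the text, so treat the result as approximate.</p>
<pre><code class="lang-python"># Values from the Time column, truncated to three decimals.
request_time_s = 7.796  # Echo (ping) request
reply_time_s = 7.888    # Echo (ping) reply

# The reply arrives later, so subtract the request time from the reply time.
rtt_ms = round((reply_time_s - request_time_s) * 1000)
print(f"RTT: {rtt_ms} ms")
</code></pre>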
<h3 id="heading-what-is-the-main-difference-between-the-packet-sent-by-this-command-and-the-packet-sent-by-the-previous-command">What is the main difference between the packet sent by this command, and the packet sent by the previous command?</h3>
<p>For question number 3, we are asked to run the following command:</p>
<blockquote>
<p>ping -n 1 -l 342 www.google.com</p>
</blockquote>
<p>Looking at the first run of <code>ping</code>, we can see that the length of each packet is <code>74</code> bytes:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-225.png" alt="Image" width="600" height="400" loading="lazy">
<em>Sniffing while running ping (source: <a target="_blank" href="https://www.youtube.com/watch?v=B5iEmaZK9xI&amp;t=2s">Brief</a>)</em></p>
<p>Observing the packets sent after running <code>ping</code> with the <code>-l 342</code> argument, we can see that the value is bigger:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-228.png" alt="Image" width="600" height="400" loading="lazy">
<em>Sniffing while running ping (source: <a target="_blank" href="https://www.youtube.com/watch?v=B5iEmaZK9xI&amp;t=2s">Brief</a>)</em></p>
<p>So the main difference is the number of bytes sent as the data.</p>
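<p>To see where the <code>74</code> bytes come from, you can add up the headers and the data. The sketch below assumes an Ethernet frame, an IPv4 header without options, and the default <code>32</code> bytes of data that ping sends on Windows. The function name is made up for this example, and the numbers may differ on other setups.</p>
<pre><code class="lang-python">ETHERNET_HEADER = 14  # destination MAC, source MAC, EtherType
IPV4_HEADER = 20      # assuming no IP options
ICMP_ECHO_HEADER = 8  # type, code, checksum, identifier, sequence number


def ping_frame_length(data_size):
    """Length of an Echo request frame carrying `data_size` bytes of data."""
    return ETHERNET_HEADER + IPV4_HEADER + ICMP_ECHO_HEADER + data_size


print(ping_frame_length(32))   # default data size of ping on Windows
print(ping_frame_length(342))  # data size requested with -l 342
</code></pre>
<p>Under these assumptions the first value is <code>74</code>, matching the capture, and the second value is what you should expect in the <code>Length</code> column after running the command with <code>-l 342</code>.</p>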
<p>Question number four:</p>
<h3 id="heading-what-is-the-content-data-provided-in-the-ping-request-packet">What is the content (data) provided in the ping request packet?</h3>
<h3 id="heading-what-is-the-content-provided-in-the-ping-response-packet">What is the content provided in the ping response packet?</h3>
<p>Click on the request packet to observe the data sent:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-230.png" alt="Image" width="600" height="400" loading="lazy">
<em>Observing the data sent by the <code>ping</code> utility (source: <a target="_blank" href="https://www.youtube.com/watch?v=B5iEmaZK9xI&amp;t=2s">Brief</a>)</em></p>
<p>The answer for the ping request is <code>a</code> through <code>w</code>, over and over again.</p>
<p>Regarding the ping response – it is the same as the request.</p>
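<p>If you want to compare the bytes in your own capture against this pattern, you can generate it with a short Python sketch. It assumes the data simply cycles through the letters <code>a</code> to <code>w</code>, as observed above. The function name is made up for this example.</p>
<pre><code class="lang-python">import string

# Assumption: the data cycles through the letters a..w, as seen in the capture.
PATTERN = string.ascii_lowercase[:23]  # "abcdefghijklmnopqrstuvw"


def expected_ping_data(size):
    """Build `size` bytes of the repeating a..w pattern."""
    repeats = size // len(PATTERN) + 1
    return (PATTERN * repeats)[:size].encode("ascii")


print(expected_ping_data(32))
</code></pre>
<p>Calling <code>expected_ping_data(342)</code> gives the pattern for the larger request sent with <code>-l 342</code>.</p>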
<h1 id="heading-summary">Summary</h1>
<p>Wireshark is a wonderful tool for anyone working with Computer Networks. It can help you understand how protocols work and also help you debug applications or network issues. </p>
<p>As you have seen, you can learn how things work by simply running Wireshark in the background while using them and then inspecting the traffic. With this tool under your belt, the sky is the limit. </p>
<p>In future tutorials, we will also rely on our knowledge of Wireshark and use it to further understand various concepts in computer networks.</p>
<h2 id="heading-about-the-author">About the Author</h2>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the Brief <a target="_blank" href="https://youtube.com/@BriefVid">YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
<h3 id="heading-additional-references">Additional References</h3>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg">Computer Networks Playlist - on my Brief channel</a>.</li>
<li><a target="_blank" href="https://www.wireshark.org/">Wireshark's website</a>.</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Handle Errors in Computer Networks ]]>
                </title>
                <description>
                    <![CDATA[ There are some magical things about the Internet, and one thing in particular is that it works. In spite of so many obstacles, we can deliver our packets over the globe, and do so fast. Even more specifically, one amazing thing about the Internet is ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-handle-errors-in-computer-networks/</link>
                <guid isPermaLink="false">66c17c3858ee0865d2671b5d</guid>
                
                    <category>
                        <![CDATA[ computer network ]]>
                    </category>
                
                    <category>
                        <![CDATA[ computer networking ]]>
                    </category>
                
                    <category>
                        <![CDATA[ error ]]>
                    </category>
                
                    <category>
                        <![CDATA[ error handling ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Wed, 18 Jan 2023 16:05:43 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/01/Copy-of-Computer-Networks-Hub-Switch.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>There are some magical things about the Internet, and one thing in particular is that it works. In spite of so many obstacles, we can deliver our packets over the globe, and do so fast.</p>
<p>Even more specifically, one amazing thing about the Internet is its ability to handle errors. </p>
<p>What do I mean by errors? When a packet or a frame is received by a machine, we say it contains an error if the data that had been sent is not the data that was received. For instance, a single <code>1</code> was mistakenly received as a <code>0</code> after its transmission. </p>
<p>This can happen for many different reasons. Perhaps there was some disturbance in the wire where the data was transmitted – say, a child rode her bicycle over the wire. Perhaps there was some collision in the air as many people transmitted at once. Maybe it was a device's error.</p>
<p>Regardless of the specific reason, you still get valid data on the Internet. Without handling errors, you may read the last sentence and instead of <code>errors</code> read <code>errbbb</code>. Weird, isn't it? So how does the Internet handle errors?</p>
<p>There are two main approaches for handling errors – detection, and correction. We shall start by describing detection, and then talk about correction.</p>
<h1 id="heading-what-is-error-detection">What is Error Detection?</h1>
<p>When dealing with error detection, we are looking for a boolean result – <code>True</code>, or <code>False</code>. Is the frame/packet valid, or not? That is all. We don’t want to know where the error occurred. If the frame is invalid, we will simply drop it.</p>
<p>So when the receiver receives a frame, they will determine whether an error has occurred. If the frame is valid, they will read it. If the frame contains errors - the receiver will drop it.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-84.png" alt="Image" width="600" height="400" loading="lazy">
<em>Error Detection: we only want to know if the frame/packet is valid or not. (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
<p>One method for error detection is using a <strong>checksum</strong>. A common implementation of a checksum is called <strong>CRC – Cyclic Redundancy Check</strong>. </p>
<p>In this post we will not trouble ourselves with the mathematical implementation of CRCs in the real world (if you're interested, check out <a target="_blank" href="https://en.wikipedia.org/wiki/Cyclic_redundancy_check">Wikipedia</a>). Rather, we'll simply try to understand the concept. To do so, let’s implement a very simple checksum mechanism ourselves.</p>
<p>Consider a protocol for transmitting 10-digit phone numbers between endpoints. This protocol is extremely simple: each packet includes exactly 10 bytes, each one representing a digit. For example, a packet might include the following digits:</p>
<p><code>5551234567</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-85.png" alt="Image" width="600" height="400" loading="lazy">
<em>A packet with a payload of 10 digits (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
<p>For simplicity's sake, we will omit the headers of the packet and focus solely on the payload. </p>
<p>Now, we will add a checksum. Say that we <strong>add</strong> all the digits. So in this example, we would calculate <code>5</code> + <code>5</code> + <code>5</code> + <code>1</code> + … all the way through <code>7</code>. We would get <code>43</code>. This would be our checksum value.</p>
<p>Now, the sender won’t only send the phone number, but also the checksum value right after it. In this example, the sender would send:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-86.png" alt="Image" width="600" height="400" loading="lazy">
<em>The packet's data is followed by a checksum. (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
<p>Now, as the receiver, you can do the same thing. You will read the phone number, and calculate the checksum. You will add the digits, and get <code>43</code>. </p>
<p>Since you've received the correct result (that is, your calculation based on the data matches the checksum value sent in the packet), you can assume that the frame is valid.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-89.png" alt="Image" width="600" height="400" loading="lazy">
<em>The receiver compares their calculated checksum value and the checksum in the packet. If the values match, the packet is assumed to be valid (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
<p>What happens in case of an error? 🤔</p>
<p>Let’s say, for instance, that the digit <code>2</code> was replaced by an <code>8</code>. Now, even though the sender sent the same stream as before ( <code>555123456743</code> ), you, as the receiver, see something a bit different:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-90.png" alt="Image" width="600" height="400" loading="lazy">
<em>A packet containing an error (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
<p>Now, you are calculating the checksum, adding all the digits. You get <code>49</code>. Since this value is different from the checksum value specified in the original frame, <code>43</code>, the frame is considered to be invalid and you drop it.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-91.png" alt="Image" width="600" height="400" loading="lazy">
<em>The receiver compares their calculated checksum value and the checksum in the packet. If the values don't match, the packet is assumed to be invalid (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
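<p>Both sides of this toy protocol fit in a few lines of Python. This is only a sketch of the digit-sum checksum from this post, not of a real CRC. The function names are made up for this example, and writing the checksum as exactly two digits after the 10-digit payload is an assumption.</p>
<pre><code class="lang-python">def digit_checksum(digits):
    """Add up all the digits of the payload."""
    return sum(int(d) for d in digits)


def build_packet(phone_number):
    """Sender side: payload followed by a two-digit checksum (assumed width)."""
    return phone_number + f"{digit_checksum(phone_number):02d}"


def is_valid(packet):
    """Receiver side: recompute the checksum and compare it with the received one."""
    payload, received = packet[:10], int(packet[10:])
    return digit_checksum(payload) == received


packet = build_packet("5551234567")
print(packet)                    # 555123456743
print(is_valid(packet))          # True
print(is_valid("555183456743"))  # False: the 2 was received as an 8
</code></pre>
<p>In the last call the receiver computes <code>49</code> from the payload, which differs from the <code>43</code> carried in the packet, so the packet is dropped.</p>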
<p>Are there problems with this method? 🤔</p>
<p>Yes, there are. Consider, for example, what happens if there are two errors – and instead of the original stream ( <code>555123456743</code> ), you receive the following:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-92.png" alt="Image" width="600" height="400" loading="lazy">
<em>A packet received with two errors, resulting in the stream <code>456123456743</code> (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
<p>What happens when you add the digits?</p>
<p>Even though the digits are not the same as the original packet, the checksum will remain correct, and the frame will be regarded as valid.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-93.png" alt="Image" width="600" height="400" loading="lazy">
<em>Despite the errors, the checksum value happens to be correct, resulting in a false assumption that the packet is valid (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
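<p>You can check this blind spot yourself. The short Python sketch below uses the same digit-sum checksum (the function name is made up for this example) and compares the payload that was sent with the one that was received.</p>
<pre><code class="lang-python">def digit_checksum(digits):
    """Add up all the digits, as in the toy protocol above."""
    return sum(int(d) for d in digits)


sent = "5551234567"
received = "4561234567"  # two digits changed in transit

print(digit_checksum(sent), digit_checksum(received))
print(sent != received and digit_checksum(sent) == digit_checksum(received))
</code></pre>
<p>One digit went down by one (<code>5</code> became <code>4</code>) and another went up by one (<code>5</code> became <code>6</code>), so the two errors cancel each other out in the sum.</p>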
<p>Real checksum functions, such as CRCs, are of course much more robust than the one in our example – but in extremely rare cases, such problems may still occur. </p>
<p>Notice that using this kind of method, error detection, we don’t know where the problem occurred, but only whether the frame is valid or not. If the checksum value is invalid, we assume that the frame is invalid and drop it.</p>
<h1 id="heading-what-is-error-correction">What is Error Correction?</h1>
<p>As mentioned earlier, detection is not the only way to handle errors. Another approach might be to find the error and correct it. How can we do that?</p>
<p>An extremely simple way would be to transmit the data many times – let’s say, three times. For example, the stream <code>5551234567</code> would be transmitted as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-94.png" alt="Image" width="600" height="400" loading="lazy">
<em>Sending the same data multiple times (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
<p>So we basically sent the data three times.</p>
<p>Now, in case of an error in one digit, the receiver can look at the other two copies of that digit, and choose the value that appears two times out of three.</p>
<p>So, for instance, if we had a problem and <code>2</code> was replaced with an <code>8</code>, the receiver would get this stream:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-95.png" alt="Image" width="600" height="400" loading="lazy">
<em>An error in one of the occurrences of the data (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)</em></p>
<p>Now, as a receiver, you can say: “I have <code>2</code>, <code>8</code>, <code>2</code>… so it was probably <code>2</code> in the original message”.</p>
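<p>This majority vote can be sketched in Python. The sketch assumes that the whole message is repeated back to back (the exact layout on the wire is an assumption), and the function names are made up for this example.</p>
<pre><code class="lang-python">from collections import Counter


def encode(message, copies=3):
    """Sender side: repeat the whole message back to back (assumed layout)."""
    return message * copies


def decode(stream, copies=3):
    """Receiver side: for each position, keep the digit seen most often."""
    size = len(stream) // copies
    parts = [stream[i * size:(i + 1) * size] for i in range(copies)]
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*parts))


stream = encode("5551234567")
damaged = stream[:14] + "8" + stream[15:]  # the 2 in the second copy becomes an 8
print(decode(damaged))  # 5551234567
</code></pre>
<p>For the damaged position the receiver sees <code>2</code>, <code>8</code>, <code>2</code>, and the vote restores the original <code>2</code>.</p>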
<p>Is this problematic? Well, in some rare cases, we might get the same error twice. So it is possible, even though unlikely, that two of the original twos have been received as eights.</p>
<p>So while the sender sent this stream:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-94.png" alt="Image" width="600" height="400" loading="lazy">
_Sending the same data multiple times (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)_</p>
<p>The first <code>2</code> was mistakenly read as an <code>8</code>, and also the second <code>2</code> was received as an <code>8</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-96.png" alt="Image" width="600" height="400" loading="lazy">
_Two identical errors; Rare, but possible (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)_</p>
<p> Now, it looks as if the original message included an <code>8</code>, and not a <code>2</code>.</p>
<p>What can you do in order to lower the probability of such a scenario?</p>
<p>The simplest solution would be to send the data even more times. Let’s say, five times. So now we duplicate all the data, and send it 5 times in total… </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-97.png" alt="Image" width="600" height="400" loading="lazy">
_Sending the data five(!) times (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)_</p>
<p>Now, say that two errors occurred, and again two of the <code>2</code> digits were replaced with <code>8</code>s.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-98.png" alt="Image" width="600" height="400" loading="lazy">
_Two identical errors; Rare, but possible (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)_</p>
<p>Clearly, it is very unlikely to get the same error twice, but even in this case, we still get <code>2</code> three times, so as the receiver you can tell, with a high probability, that the original message contained a <code>2</code>, rather than an <code>8</code>.</p>
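<p>To make the majority vote concrete, here is a minimal Python sketch of the repetition scheme described above. The function names <code>encode</code> and <code>decode</code> are made up for illustration, and the sketch assumes the whole stream is repeated back to back and that the receiver knows how many copies were sent.</p>

```python
from collections import Counter


def encode(message: str, copies: int = 3) -> str:
    """Repeat the whole message `copies` times, back to back."""
    return message * copies


def decode(received: str, copies: int = 3) -> str:
    """Recover each digit by majority vote across the copies."""
    length = len(received) // copies
    # Split the received stream back into its individual copies.
    chunks = [received[i * length:(i + 1) * length] for i in range(copies)]
    decoded = []
    for position in range(length):
        # Count which digit each copy holds at this position.
        votes = Counter(chunk[position] for chunk in chunks)
        decoded.append(votes.most_common(1)[0][0])
    return "".join(decoded)


original = "5551234567"

# Three copies, one error: the 2 in the first copy arrives as an 8.
sent = encode(original, copies=3)
one_error = sent[:4] + "8" + sent[5:]
print(decode(one_error, copies=3))        # 5551234567

# Three copies, two identical errors: the vote now picks the wrong digit.
two_errors = one_error[:14] + "8" + one_error[15:]
print(decode(two_errors, copies=3))       # 5551834567

# Five copies, the same two errors: 2 still wins, three votes to two.
sent_five = encode(original, copies=5)
two_errors_five = sent_five[:4] + "8" + sent_five[5:14] + "8" + sent_five[15:]
print(decode(two_errors_five, copies=5))  # 5551234567
```

<p>With an odd number of copies and only two competing values, the vote can never end in a tie, which is one reason to prefer three or five copies over two or four.</p>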
<h2 id="heading-whats-the-overhead">What's the Overhead?</h2>
<p>Now would be a good time to introduce the term <strong>overhead</strong>. When we say overhead, we mean the extra data or time that is needed in order to convey the actual message, on top of the message itself. Let’s first understand what this term means in general, and then consider it in the context of handling errors.</p>
<p>Let’s say that I have a lesson to teach in my university. My goal is to teach the lesson itself, which is also called the <strong>payload</strong> in that context – that is, the actual data or message I would like to convey.</p>
<p>In order to teach the lesson, or to convey the payload, I first have to physically get to the university – so I get out of my home, walk to the bus station, wait for the bus, take the bus, get off the bus, walk to the building, wait for the lesson to start – and only then do I actually get to teach the lesson. </p>
<p>This entire process is <strong>overhead</strong> that I have to pay in order to deliver the <strong>payload</strong>, in this case – to teach the lesson.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-99.png" alt="Image" width="600" height="400" loading="lazy">
_Overhead and Payload are two extremely important terms in Computer Networks (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)_</p>
<p>The same applies in computer networks. Our <strong>payload</strong> is the data, and there is always some <strong>overhead</strong> associated with sending it. </p>
<h2 id="heading-back-to-handling-errors">Back to Handling Errors</h2>
<p>In the context here – sending the data three times, as suggested earlier, means that for every byte of payload we have two bytes of overhead. If we send the data five times, then for every byte of payload, we have four bytes of overhead. That’s a LOT!</p>
<p>Consider error <em>detection</em>, on the other hand. In our example protocol for sending phone numbers, how much overhead did we have?</p>
<p>Recall that for every ten-digit phone number, that is ten bytes, we included a two-digit checksum value. In other words, we had two bytes of overhead for ten bytes of payload. It is clear that in our example, error detection yields much smaller overhead in comparison to error correction.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-100.png" alt="Image" width="600" height="400" loading="lazy">
_In the sample protocol, for every ten-digit phone number (ten bytes of payload), we included a two-digit checksum value (two bytes of overhead) (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)_</p>
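<p>The comparison can be written down as simple arithmetic. The helper <code>overhead_ratio</code> below is a made-up name for this sketch, and it assumes, as in the example protocol, that every digit takes one byte.</p>

```python
def overhead_ratio(payload_bytes: int, overhead_bytes: int) -> float:
    """Overhead bytes sent per byte of payload."""
    return overhead_bytes / payload_bytes


phone_number_bytes = 10  # ten digits, one byte per digit

# Error correction by repetition: every extra copy is pure overhead.
three_copies = overhead_ratio(phone_number_bytes, 2 * phone_number_bytes)
five_copies = overhead_ratio(phone_number_bytes, 4 * phone_number_bytes)

# Error detection: a two-digit checksum per phone number.
checksum = overhead_ratio(phone_number_bytes, 2)

print(three_copies, five_copies, checksum)  # 2.0 4.0 0.2
```

<p>In other words, in these examples the repetition scheme costs ten to twenty times more overhead per payload byte than the checksum does.</p>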
<p>There are better ways to achieve error correction with high accuracy than to simply send the data so many times, but they are more complicated and out of scope for this post. Even very sophisticated error correction techniques still require a lot of overhead when compared to error detection.</p>
<p>Also, notice that beyond the extra bytes that error correction sends as overhead, error detection is also much simpler to implement.</p>
<h1 id="heading-error-correction-vs-error-detection-which-is-better">Error Correction vs Error Detection – Which is Better?</h1>
<p>We already concluded that error detection is simpler, and comes with a smaller overhead compared to error correction.</p>
<h3 id="heading-so-when-would-we-prefer-error-correction">So, when would we prefer error correction?</h3>
<p>One case might be when we have a one-way link. That is, a network where we can only transfer data in one direction. </p>
<p>For example, say you have a secret agent that you need to send a message to. The agent knows that they need to look up to the sky at exactly midnight, and they will see a series of flashes indicating the secret message. </p>
<p>The secret agent cannot reply, or their location and identity will be revealed. In addition, you don’t want to send the message over and over again, so as not to draw much attention, and to make it harder for someone to intercept the message.</p>
<p>In this case, you definitely want your agent to receive the exact message that you’ve sent. Consider a case where you want to send them the message “do not place the bomb”. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-101.png" alt="Image" width="600" height="400" loading="lazy">
_A sensitive message for a secret agent (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)_</p>
<p>Of course, you don’t want to risk the unfortunate scenario of the agent reading the message as “do <strong>now</strong> place the bomb”, due to an error.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-102.png" alt="Image" width="600" height="400" loading="lazy">
_An error may change the meaning of the message substantially (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)_</p>
<p>If you use error <em>detection</em>, the agent might be aware that the message they received is invalid in case of an error, but they won’t be able to tell you that they need you to send the message again. Since you want the agent to be able to read your message correctly, without sending any data back to you, error correction is preferred.</p>
<p>So, one-way link is one case where we prefer error correction. What about other cases?</p>
<p>Sometimes you just <em>can’t</em> send the data again, perhaps because it has been erased from the memory of your machine. That is, the data is deleted right after it has been sent. In this case, you'd clearly prefer error correction, as sending the data again, as we would do with error detection, is just impossible.</p>
<p>Also, if sending the data again is possible, but extremely expensive, error correction may be preferable. </p>
<p>For example, if you send a message to the moon, say, with a spaceship – it might be really expensive to send it over again in case of an error. Using error correction, you send the data only once and the receiver should be able to deal with it, even if an error occurred.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/01/image-103.png" alt="Image" width="600" height="400" loading="lazy">
_Cases where correction is preferred (<a target="_blank" href="https://www.youtube.com/watch?v=H_bYtVDF6T4&amp;ab_channel=Brief">Source: Brief</a>)_</p>
<p>In general, we prefer error correction when retransmitting the data is costly or impossible. </p>
<h3 id="heading-when-would-we-prefer-error-detection">When would we prefer error detection?</h3>
<p>Well, in case we can retransmit the data, we usually prefer error detection since it comes with very little overhead compared to error correction, especially when sending the data is relatively cheap.</p>
<p>For example, on the Internet, if an error occurs when you send a frame, no problem – you can simply send it again! </p>
<p>When I covered <a target="_blank" href="https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/">the Ethernet protocol in a previous post</a>, I mentioned that the Ethernet protocol uses error detection, namely <code>CRC32</code> – that is, 32 bits (or 4 bytes) of a checksum for every frame. </p>
<p>Note that this doesn’t mean that error detection is simply better. It is just a better fit for the Internet than error correction. As mentioned before, error correction is preferable in other cases.</p>
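<p>To get a feel for detection with a 4-byte checksum such as <code>CRC32</code>, here is a small sketch using Python's built-in <code>zlib.crc32</code>, which is based on the same CRC-32 polynomial as Ethernet. The helper names and the frame layout (payload followed by a big-endian checksum) are assumptions made for illustration, not the actual Ethernet frame format.</p>

```python
import zlib


def add_checksum(payload: bytes) -> bytes:
    """Append a 4-byte CRC32 of the payload (illustrative framing only)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def is_valid(frame: bytes) -> bool:
    """Recompute the CRC32 and compare it with the trailing 4 bytes."""
    payload, received_crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received_crc


frame = add_checksum(b"5551234567")
print(is_valid(frame))      # True

# Flip a single bit in the first byte to simulate a transmission error.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(is_valid(corrupted))  # False
```

<p>The receiver learns only that the frame is invalid, not which bit was flipped, so the usual reaction is to drop the frame and rely on a retransmission.</p>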
<h1 id="heading-wrapping-up">Wrapping Up</h1>
<p>In this tutorial, we discussed various methods for handling errors. We looked at <strong>error detection</strong>, where we only know whether a frame is valid or not. We also considered <strong>error correction</strong>, where the receiver can restore the correct value of an erroneous frame. We also introduced the term <strong>overhead</strong>. </p>
<p>We then understood why we use error detection on the Internet, rather than error correction. Stay tuned for more posts in this series about Computer Networks 💪🏻</p>
<h2 id="heading-about-the-author"><strong>About the Author</strong></h2>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the Brief <a target="_blank" href="https://youtube.com/@BriefVid">YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
<h2 id="heading-additional-resources"><strong>Additional Resources</strong></h2>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg">Computer Networks Playlist - on my Brief channel</a></li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/Cyclic_redundancy_check">CRC - Wikipedia</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/">The Complete Guide to Ethernet Protocol</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use Scapy – Python Networking Tool Explained ]]>
                </title>
                <description>
                    <![CDATA[ In this post you will learn about an amazing tool named Scapy. Scapy is a Python library that enables us to send, sniff, and dissect network frames.  It is useful in a variety of use cases, one of which is to actually get some hands-on experience whe... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-scapy-python-networking/</link>
                <guid isPermaLink="false">66c17c3b675b2f6950fa0bfa</guid>
                
                    <category>
                        <![CDATA[ computer network ]]>
                    </category>
                
                    <category>
                        <![CDATA[ computer networking ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Wed, 21 Dec 2022 21:02:17 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/12/Computer-Networks-Hub-Switch--1-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this post you will learn about an amazing tool named <strong>Scapy</strong>. Scapy is a Python library that enables us to send, sniff, and dissect network frames. </p>
<p>It is useful in a variety of use cases, one of which is to actually get some hands-on experience when you learn Computer Networks. Wouldn't it be great if, when <a target="_blank" href="https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/">learning about Ethernet</a>, for example, you could create, send, sniff and parse Ethernet frames on your own? Scapy is the perfect tool for that.</p>
<p>In addition, you can use Scapy for creating networking-based applications, parsing network traffic to analyze data, and many other cases.</p>
<p>This post assumes you have some background knowledge in Computer Networks, for example about <a target="_blank" href="https://www.freecodecamp.org/news/the-five-layers-model-explained/">the layers model</a>. It also assumes you have some basic Python knowledge.</p>
<h1 id="heading-what-will-you-learn">What will you learn?</h1>
<p>In this post we will start from the very basics – what Scapy is, and how to install it. </p>
<p>You will learn how to sniff data and parse it with Scapy, and how to display it in a meaningful manner. </p>
<p>You will also learn how to create frames or packets, and how to send them. Altogether, you should have a new powerful tool under your belt.</p>
<h1 id="heading-how-to-install-scapy">How to Install Scapy</h1>
<p>To install Scapy, you can simply use <code>pip install scapy</code>.</p>
<p>If you run into trouble, simply follow <a target="_blank" href="https://scapy.readthedocs.io/en/latest/installation.html">the official documentation</a>.</p>
<h1 id="heading-how-to-use-scapy">How to Use Scapy</h1>
<p>For now, let’s open up the command line and type in <strong><code>scapy</code></strong>.</p>
<p>You should expect something like the following:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-83.png" alt="Image" width="600" height="400" loading="lazy">
_Running Scapy from the CLI (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Note that the warning messages are fine.</p>
<p>Since this is a Python environment, <em>dir</em>, <em>help</em>, and any other Python function for information retrieval are available for you. Of course, you can always combine Python code with your Scapy scripts.</p>
<h1 id="heading-how-to-work-with-packets-and-frames-in-scapy">How to Work with Packets and Frames in Scapy</h1>
<p>Packets and frames in Scapy are described by objects created by stacking different layers. So a packet can have a variable number of layers, but will always describe the sequence of bytes that have been sent (or are going to be sent) over the network.</p>
<p>Let's create a frame that consists of an Ethernet layer, with an IP layer on top:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-85.png" alt="Image" width="600" height="400" loading="lazy">
_Stacking Layers (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Look how easy that is! We’ve used the <code>/</code> operator in order to stack the IP layer on top of the Ethernet layer. </p>
<p>Note that the representation of this object only shows non-default values. The Ethernet type is <code>0x800</code> (in hexadecimal) because that is the type value used when an IP layer is stacked on top of the Ethernet layer.</p>
<p>Let's look more deeply at the fields of the packet:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-86.png" alt="Image" width="600" height="400" loading="lazy">
_With the <code>show</code> method we can observe all fields of the frame (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Pretty cool! 😎</p>
<h1 id="heading-how-to-sniff-with-scapy">How to Sniff with Scapy</h1>
<p>Scapy also allows us to sniff the network by running the <strong>sniff</strong> command, like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-87.png" alt="Image" width="600" height="400" loading="lazy">
_Sniffing with the <code>sniff</code> command (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>After running <code>sniff</code> with <code>count=2</code>, Scapy sniffs your network until <code>2</code> frames are received. Then it returns – and in this case, the variable <code>packets</code> will store the frames that have been received.</p>
<p>The return value of sniff can be treated as a list. Therefore <code>packets[0]</code> will contain the first packet received, and <code>packets[1]</code> will contain the second:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-88.png" alt="Image" width="600" height="400" loading="lazy">
_The return value of <code>sniff</code> is an iterable, so it can be accessed as a list (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>A helper function <code>summary</code> is available too and will provide minimal information regarding the packet collection:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-89.png" alt="Image" width="600" height="400" loading="lazy">
_Using <code>summary</code> we can get some information of the packet collection (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>When looking at a specific frame, every layer or field can be accessed in a very elegant way. For instance, in order to get the <strong>IP</strong> section of the packet, we can access it like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-90.png" alt="Image" width="600" height="400" loading="lazy">
_Accessing a specific layer (and its payload) (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Note that this shows us everything from the IP layer and <em>above</em> (that is, the <em>payload</em> of the IP layer). Let's now observe the source Ethernet address of this frame:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-91.png" alt="Image" width="600" height="400" loading="lazy">
_Accessing a specific field (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Nice and easy. Now, you will learn how to run a specific command for every frame that you sniff. </p>
<p>First, create the callback function that will be run on every packet. For example, a function that will just print the source Ethernet address of the received frame:  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-92.png" alt="Image" width="600" height="400" loading="lazy">
_Defining a callback function that receives a frame as its argument (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now, we can pass this function to <code>sniff</code>, using the <code>prn</code> argument:  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-93.png" alt="Image" width="600" height="400" loading="lazy">
_Run a callback function on every sniffed frame (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>The Ethernet addresses have been printed as a result of <code>print_source_ethernet</code> being executed, where every time, it receives a sniffed frame as an argument.<br>Note that you can write the same in Python using a lambda function, as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-94.png" alt="Image" width="600" height="400" loading="lazy">
_Define the callback function using <code>lambda</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>If you prefer to write an explicit function like the one we’ve written above, that’s perfectly fine.</p>
<p>We usually want to <strong>filter</strong> traffic that we receive – and look only at relevant frames. Scapy’s <code>sniff</code> function can take a filter function as an argument – that is, a function that will be executed on every frame, and return a <code>boolean</code> value indicating whether this frame should be kept (<code>True</code>) or discarded (<code>False</code>).</p>
<p>For example, say we would like to keep only frames that are sent to the broadcast address. Let’s write a simple filtering function that does just that:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-95.png" alt="Image" width="600" height="400" loading="lazy">
_A simple filtering function (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now, we can use the <code>lfilter</code> parameter of <code>sniff</code> in order to filter the relevant frames:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-96.png" alt="Image" width="600" height="400" loading="lazy">
_Filtering frames based on a filter function (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>In order to clarify, let’s draw this process:  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-97.png" alt="Image" width="600" height="400" loading="lazy">
_The process of sniffing and filtering with <code>lfilter</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>A frame <code>f</code> is received by the network card. It is then transferred to <code>lfilter(f)</code>. If the filter function returns <code>False</code>, <code>f</code> is discarded. If the filter returns <code>True</code>, then we execute the <code>prn</code> function on <code>f</code>.</p>
<p>So we can now combine these two arguments of <code>sniff</code>, namely <code>lfilter</code> and <code>prn</code>, and print the source address of every frame that is sent to the broadcast address. Let’s do this now using <code>lambda</code>:  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-98.png" alt="Image" width="600" height="400" loading="lazy">
_Combining <code>lfilter</code> and <code>prn</code> 💪🏻 (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>This is equivalent to writing the following line, without lambda:</p>
<pre><code class="lang-py">sniff(count=<span class="hljs-number">2</span>, lfilter=is_broadcast_frame, prn=print_source_ethernet)
</code></pre>
<p>Readable, quick, and useful. Have you noticed that I love Scapy? 🥰</p>
<p>Alright, so far we’ve learnt how to sniff frames. When sniffing, we know how to filter only relevant frames, and how to execute a function on each filtered frame.</p>
<h1 id="heading-how-to-create-frames-in-scapy">How to Create Frames in Scapy</h1>
<p>To create a frame, simply create an Ethernet layer using <code>Ether()</code>. Then, stack additional layers on top of it. For instance, to stack an <code>IP</code> layer:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-99.png" alt="Image" width="600" height="400" loading="lazy">
_Creating a frame with two stacked layers (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Alternatively, we can just add raw data, as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-100.png" alt="Image" width="600" height="400" loading="lazy">
_Using Raw data as the payload (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>If you want to set a specific value, for instance the destination address of the frame, you can do it when you initially create the frame, like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-101.png" alt="Image" width="600" height="400" loading="lazy">
_Creating a frame and specifying specific values (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Or, we can modify the specific field after creation:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-102.png" alt="Image" width="600" height="400" loading="lazy">
_Modifying specific values (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>How can we look at the frame we’ve just created? One way is to observe a frame using <code>show</code>, as we’ve done above. Another way of looking at a frame is by looking at its byte stream, just like in Wireshark. You can do this using the <code>hexdump</code> function:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-103.png" alt="Image" width="600" height="400" loading="lazy">
_Viewing the hexadecimal byte stream (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Even better – we can just look at it inside Wireshark, by running <code>wireshark(frame)</code>.</p>
<h1 id="heading-how-to-send-frames-in-scapy">How to Send Frames in Scapy</h1>
<p>You can send frames using <code>sendp</code>, as follows:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-104.png" alt="Image" width="600" height="400" loading="lazy">
_Sending frames with <code>sendp</code> (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Let's sniff in Wireshark while sending the frame to make sure that it’s actually sent:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/12/image-105.png" alt="Image" width="600" height="400" loading="lazy">
_Observing the frame we've sent using Wireshark (Source: <a target="_blank" href="https://www.youtube.com/watch?v=f0vpwwNAcdI&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Note that we use <code>sendp</code> only when we send an entire frame, that is, one that includes the second layer (such as Ethernet) and above. If you want to send a packet that includes only the third layer and above, use <code>send</code> instead.</p>
<h1 id="heading-recap">Recap</h1>
<p>In this post you got to know an awesome tool called Scapy. You saw how you can sniff, how to filter packets, and how to run a function on sniffed packets. You also learned how to create and send frames.</p>
<h2 id="heading-about-the-author">About the Author</h2>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the Brief <a target="_blank" href="https://youtube.com/@BriefVid">YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
<h2 id="heading-additional-resources"><strong>Additional Resources</strong></h2>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg">Computer Networks Playlist - on my Brief channel</a></li>
<li><a target="_blank" href="https://scapy.readthedocs.io/en/latest/">Official Scapy documentation</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Network Devices – How Hubs and Switches Work and How to Secure Them ]]>
                </title>
                <description>
                    <![CDATA[ In a previous post I described every bit and byte of the Ethernet protocol. In this post you will learn about two network devices, how they work, and how this knowledge may be used by hackers. How Classic Ethernet Works Before describing the network ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-hub-switch-work-and-how-to-protect-them/</link>
                <guid isPermaLink="false">66c17c3558ee0865d2671b5b</guid>
                
                    <category>
                        <![CDATA[ computer network ]]>
                    </category>
                
                    <category>
                        <![CDATA[ computer networking ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Security ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Thu, 27 Oct 2022 14:30:00 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/10/Computer-Networks-Hub-Switch.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In <a target="_blank" href="https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/">a previous post</a> I described every bit and byte of the Ethernet protocol. In this post you will learn about two network devices, how they work, and how this knowledge may be used by hackers.</p>
<h1 id="heading-how-classic-ethernet-works">How Classic Ethernet Works</h1>
<p>Before describing the network devices, consider a network without special network devices. That is, a network using classic Ethernet where all computers are attached to a single cable.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-168.png" alt="Image" width="600" height="400" loading="lazy">
_Four devices connected using classic Ethernet (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>In this case, if computer A sends a message to another computer, for instance – B, the message is sent over the shared cable, and all devices receive it.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-169.png" alt="Image" width="600" height="400" loading="lazy">
_With classic Ethernet, If A sends a message to B - all devices (except for A) receive this message (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Can you think of some problems with this network structure?</p>
<p>First, <strong>overload –</strong> all network frames are received by all computers. Let’s say A wants to send a frame to B. C also sees this frame, has to determine that it is not destined to its address, and then discards it. This process takes time and resources. The same process happens at machine D, of course.</p>
<p>Second, <strong>privacy –</strong> if C sees every message sent from A to B and vice versa, privacy is violated. We would rather have a network where only A and B see the messages sent between them.</p>
<p>Third, <strong>extensibility –</strong> this network is not really extensible. Assume that up to 10 computers can attach to this cable. What happens when you need to add one more computer? You'd have to replace the entire cable. This is expensive and inconvenient. </p>
<p>Well, the person who actually has to replace the cable is probably the I.T. person - you know, the one who makes sure that everything runs well in your network and is rarely noticed until something bad happens (at least when you work in an organization large enough to have I.T. people). </p>
<p>Just to be clear – we LOVE the I.T. person. We want their life to be good, we don’t want them to be running around buying cables all the time.</p>
<p>Fourth, <strong>collisions</strong> – let’s say A wants to send a message to B, and C wants to send a message to D. At the same time, both of them might start their transmission, and the messages will <em>collide</em>. </p>
<p>In this case, we get errors – much like the case where two people start to speak at the same time, and it is impossible to understand either of them.</p>
<p>Fifth, this network structure might lead to <strong>starvation</strong> – let’s say that A is transmitting a frame. If the other stations wish to avoid collisions, they will refrain from sending data. But now, machine A can keep on transmitting forever, thereby taking all the bandwidth to itself and not letting any other station speak. This is called starvation.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-181.png" alt="Image" width="600" height="400" loading="lazy">
_Five major problems with classic Ethernet networks (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Well, this doesn’t seem like the best network, does it?</p>
<p>We'll now get to know network devices that help deal with these issues.</p>
<h1 id="heading-how-network-devices-solve-these-problems">How Network Devices Solve These Problems</h1>
<h2 id="heading-what-is-a-hub">What is a Hub?</h2>
<p>One device that solves only the <strong>extensibility</strong> issue is called a <strong>Hub</strong>. A hub is a device with multiple ports that single Ethernet cables are connected to:  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-182.png" alt="Image" width="600" height="400" loading="lazy">
_An Ethernet hub is a device with multiple ports, each connected to a single Ethernet cable (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>So now, instead of having one cable with many computers attached to it, we have a single hub, and each computer is connected to it via its own cable. This makes the I.T. person's life much easier.</p>
<p>The hub simply takes the pulse it receives and multiplies it – that is, sends it to all other ports. For example, if A sends a frame to B, the hub will send this frame to B, C and D – all ports except A’s port.</p>
<p>The hub doesn’t understand Ethernet, and doesn’t know anything about MAC addresses. For the hub, all bits are just bits transmitted over the wire, and these bits should get to all other ends.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-183.png" alt="Image" width="600" height="400" loading="lazy">
_A hub simply takes a bitstream and multiplies it to all ports but the source port (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
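<p>The hub's behavior described above can be sketched in a few lines. This is only an illustration: the function name <code>hub_forward</code> and the port labels are made up for this example, and a real hub works on electrical signals rather than Python lists.</p>

```python
def hub_forward(ports, in_port):
    """Return the ports a hub repeats a signal to: every port except the source.

    Note that the destination address is not a parameter at all,
    because the hub never looks at it.
    """
    return [port for port in ports if port != in_port]


ports = ["A", "B", "C", "D"]

# A transmits a frame meant for B; B, C and D all receive it.
receivers = hub_forward(ports, in_port="A")
print(receivers)
```

<p>Whatever the frame contains, the result is the same list of receivers. That is exactly why privacy and overload remain issues with a hub.</p>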
<p>Now, if you need to add a new computer to the network, you can simply connect it to the hub. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-199.png" alt="Image" width="600" height="400" loading="lazy">
_To add a new device to the network, we simply connect it to the Hub (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>What happens if the hub runs out of ports? No problem, we will connect it to another Hub, like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-200.png" alt="Image" width="600" height="400" loading="lazy">
_In case you run out of ports, you can add another Hub (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Nice! This is a lot easier to maintain than classic Ethernet.</p>
<p>Yet, at least with classic hubs, all other issues still remain. Since all computers receive the frame sent from A to B, there is no <strong>privacy</strong>, the network is <strong>overloaded</strong>, <strong>collisions</strong> may occur, and the network is prone to <strong>starvation</strong>. </p>
<p>What we really want is a device that, when A sends a frame to B, forwards that frame to B and <strong>only</strong> B. This device is called a <strong>switch</strong>.</p>
<h2 id="heading-what-is-a-switch">What is a Switch?</h2>
<p>If all the stations are connected via a <strong>switch</strong>, and A sends a frame to B, only B receives it. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-201.png" alt="Image" width="600" height="400" loading="lazy">
_With a Switch, if A sends a message to B - only B will receive it (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Notice that this means that all issues are indeed solved. The devices won’t be overloaded as every frame will get only to the relevant recipients. There are no privacy issues since, apart from the switch, only A and B see the frame. The network is easily extensible by plugging additional switches if needed.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-202.png" alt="Image" width="600" height="400" loading="lazy">
_Similar to working with Hubs, the network is easily extensible by adding multiple Switches (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>The switch can avoid collisions as every connection between a switch and an endpoint is a single <strong>collision domain</strong> – that is, the switch will refrain from sending more than one frame on a single wire at the same time.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-204.png" alt="Image" width="600" height="400" loading="lazy">
_Every connection between the Switch and another device forms an independent collision domain (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Similarly, there will be no starvation as B and C can communicate with one another while A is sending data. Even if A keeps sending frames destined to the entire network, that is the broadcast address, the switch can allow messages sent by other hosts to be transferred in between.</p>
<p>But, how can this magical switch operate?</p>
<p>Let’s say we have just bought a brand new switch and plugged it into the network. A sends a frame destined to B. How does the switch know where computer B resides?</p>
<p>One option would be to manually configure the switch. That is, have a table mapping between a MAC address and the relevant port, and have someone manually configure that table.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-205.png" alt="Image" width="600" height="400" loading="lazy">
_The Switch may hold a table mapping MAC addresses to physical ports (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>When we say <em>someone</em>, we usually mean the I.T. person. And, well, we LOVE I.T. people. We wouldn’t want to make them do this tedious job every time. </p>
<p>In addition, I don’t know about you, but most people don’t usually have an I.T. person at home for every time they plug a device into their network.</p>
<p>Another option would be to send a special message from the switch to every port, and then the endpoints will reply with their MAC addresses. The major downside here is that we now have to make all devices aware of the switch. We need to change the devices’ behavior so they reply to that special message.</p>
<p>It would be so much better if the switch were just <strong>transparent</strong> – no endpoint would need to know that it’s there, but it would still do the job.</p>
<p>Apparently, this can indeed be achieved!</p>
<p>Consider this network, with a brand new switch that has just been added to the network. The switch stores a table, mapping a MAC address to a physical port. This table is empty.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-206.png" alt="Image" width="600" height="400" loading="lazy">
_When a Switch joins a new network, the table mapping MAC addresses to physical ports is empty (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now, A sends a frame to B.</p>
<p>The switch understands Ethernet, and can look at the Frame’s header and read the <strong>source address</strong>. Since this source address maps to “A”, and since the message has been sent from physical port number 2, the switch adds the mapping of A’s MAC address and port number 2 to its table.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-207.png" alt="Image" width="600" height="400" loading="lazy">
_When machine A sends a frame, the Switch inspects the frame, reads the source address, and maps it with the corresponding physical port (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>But what will the switch do with the frame? Well, for now, the switch doesn’t know where B resides, so the switch simply multiplies the frame and sends it to all ports, just like a hub would do. So for now, B, C and D all get the frame.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-208.png" alt="Image" width="600" height="400" loading="lazy">
_Since the Switch's table doesn't include a record for B, a frame destined to B is actually sent to all ports but the source port - the same as a Hub would do (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Next, A sends another message to B. The switch looks at it, and already knows that A’s MAC address resides behind port number 2. It still doesn’t know where B resides, so this frame is sent to all other ports as well.</p>
<p>Now, C sends a frame to A. The switch looks at the <strong>source address</strong>, and adds the mapping between C’s MAC address and port number 5 to its table.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-209.png" alt="Image" width="600" height="400" loading="lazy">
_Upon receiving a frame from C, the Switch parses its header, extracts the source address, and associates it with the corresponding physical port - port number 5 (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>This time, since the frame is destined to A’s MAC address, and since the switch knows that address – the frame can be forwarded to port number 2, and port number 2 only. Yay! 👏🏻👏🏻👏🏻</p>
<p>Now, B sends a message to C. The switch creates a mapping between port number 7 and B’s MAC address, which appears at the <strong>source address</strong> field.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-210.png" alt="Image" width="600" height="400" loading="lazy">
_The Switch keeps on learning the addresses gradually, filling in its internal mappings (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>The switch can also forward the message to C, as it already knows C's address.</p>
<p>So, in general, the switch uses the <strong>source address</strong> field of Ethernet frames to dynamically learn what addresses reside behind every port.</p>
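<p>Here is a minimal sketch of this learning process, replaying the frames from the walkthrough. The class name <code>LearningSwitch</code> is hypothetical, the letters stand in for real MAC addresses, and port 9 for computer D is an assumption (the text doesn't state D's port). Real switches also age out old entries and treat broadcast addresses specially, which this sketch ignores.</p>

```python
class LearningSwitch:
    """A toy model of a switch that learns MAC addresses from source fields."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # MAC address -> port number

    def receive(self, src, dst, in_port):
        """Process one frame and return the list of ports it is sent out of."""
        # Learn: the source address resides behind the port the frame came from.
        self.table[src] = in_port

        # Forward: a known destination gets the frame on its port only.
        if dst in self.table:
            out_port = self.table[dst]
            # If the destination is behind the incoming port, nothing to do.
            return [] if out_port == in_port else [out_port]

        # Unknown destination: act like a hub and send to all other ports.
        return [port for port in self.ports if port != in_port]


switch = LearningSwitch(ports=[2, 5, 7, 9])

print(switch.receive(src="A", dst="B", in_port=2))  # B unknown: flooded
print(switch.receive(src="A", dst="B", in_port=2))  # B still unknown: flooded
print(switch.receive(src="C", dst="A", in_port=5))  # A known: port 2 only
print(switch.receive(src="B", dst="C", in_port=7))  # C known: port 5 only
print(switch.table)
```

<p>Notice that the endpoints never do anything special here. The table is filled purely from frames they would have sent anyway, which is what keeps the switch transparent.</p>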
<p>Now, a question for you: Is it possible for two different addresses to map to a single port? For example, to have the address of computer A map to port number 3, and also have the address of computer B map to port number 3? 🤔</p>
<p>Well, the answer is yes. Consider the following network:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-211.png" alt="Image" width="600" height="400" loading="lazy">
_A network diagram with five endpoints and three Switches (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now, given that the switches know the network, when A sends a message to D, it will be sent to Switch 1, and then to Switch 2, and finally forwarded by Switch 2 to D. When Switch 2 sees the frame, what address does it see in the <strong>source address</strong> field?</p>
<p>The address of computer A, of course. Notice that switches are transparent, and never modify the MAC addresses. So Switch 2 learns that the MAC address of computer A resides behind port number 3. </p>
<p>Next, when computer B sends a frame to computer C, this message will also be transferred via switch 1 and then switch 2. So now, switch 2 learns that the MAC address of computer B resides behind port number 3 as well. So, in this case, both the MAC address of A and that of B reside behind port number 3. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-213.png" alt="Image" width="600" height="400" loading="lazy">
_Given this network diagram, switch 2 registers both the MAC address of A as well as that of B - with port number 3 (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Youk8eUjkgQ&amp;ab_channel=Brief">Brief</a>)_</p>
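<p>In table terms, nothing prevents two keys from sharing a value. The short sketch below models Switch 2's table as a plain dictionary and groups the learned addresses by port. The helper <code>macs_by_port</code> is a hypothetical name used only for this illustration.</p>

```python
from collections import defaultdict


def macs_by_port(table):
    """Invert a MAC -> port table into port -> list of MACs."""
    grouped = defaultdict(list)
    for mac, port in table.items():
        grouped[port].append(mac)
    return dict(grouped)


switch2_table = {}
switch2_table["A"] = 3  # A's frame reached Switch 2 through Switch 1, on port 3
switch2_table["B"] = 3  # B's frame arrived the same way, on the same port

print(macs_by_port(switch2_table))
```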
<p>NOTE that a switch is <strong>not</strong> an additional <em>hop</em>! We are not talking about routing here. As we’ve said earlier, a switch is a <strong>transparent</strong> device. From the endpoints’ perspective, there is no switch – A “feels” as if it were directly connected to B, C and D.</p>
<p>All devices that are connected via one <strong>hop</strong> are said to be in the same <strong>network segment</strong>. So here, all computers and switches – A, B, C, D, switch 1 and switch 2 – reside within the same segment.</p>
<p>In the resources section below, I’ve added a link to an exercise about hubs and switches. You are welcome to solve it in order to make sure everything is clear. If you have any questions, feel free to reach out 😊</p>
<h2 id="heading-interim-summary">Interim Summary</h2>
<p>So far you have learned about two network devices. First, a hub, which is basically a first layer device. That is, it only transmits bits from one port to other ports, without understanding any protocols. </p>
<p>Second, you got to know a second layer network device, namely a switch, which already "understands" the Ethernet protocol and MAC addresses. It uses that knowledge in order to transfer frames only to relevant ports, at least once it knows the network.</p>
<h1 id="heading-security-twist">Security Twist 😈</h1>
<p>Now that you understand how hubs and switches work under the hood, it's time to consider their security implications.</p>
<p>Assume that you are connected to a certain Ethernet segment, working on computer A. B sends a message to C. Is it possible for you to see that message?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-214.png" alt="Image" width="600" height="400" loading="lazy">
_Four PCs, B is sending a frame to C (Source: <a target="_blank" href="https://www.youtube.com/watch?v=YVcBShtWFmo&amp;t=3s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>In case the computers are connected via a hub, you certainly will see the message, as the hub simply forwards the frame to all ports (except for the source port) regardless of the destination address.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-215.png" alt="Image" width="600" height="400" loading="lazy">
_A hub would simply multiply the frame and send it to A, C and D (Source: <a target="_blank" href="https://www.youtube.com/watch?v=YVcBShtWFmo&amp;t=3s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Furthermore, if the computers are connected via a switch, but the switch has not yet learned the address of the destination, this message will also be sent to your port – and, in general, to all ports other than the source port, just like a hub would do.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-216.png" alt="Image" width="600" height="400" loading="lazy">
_A new switch acts just like a hub until it learns the destination address (Source: <a target="_blank" href="https://www.youtube.com/watch?v=YVcBShtWFmo&amp;t=3s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>So, in these cases, your network card will receive the frames, but will it handle them?</p>
<p>As I covered in <a target="_blank" href="https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/">a previous tutorial</a>, the first field of an Ethernet frame is the destination address. By default, the network card will discard frames that are not destined to its address, or to a group which its system belongs to, such as the broadcast address. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-217.png" alt="Image" width="600" height="400" loading="lazy">
_Ethernet frame structure - the devices first consider the destination address (Source: <a target="_blank" href="https://www.youtube.com/watch?v=YVcBShtWFmo&amp;t=3s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>So, by default, if your network card happens to receive a frame that was not destined to it, the frame will be discarded. This is exactly where <strong>promiscuous mode</strong> comes in handy. When the network card is in promiscuous mode, it will not discard frames based on their destination MAC addresses.</p>
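<p>The filtering decision can be sketched as a small function. This is a simplified model rather than how any particular network card or driver implements it. The name <code>nic_accepts</code> and the example MAC addresses are made up for illustration.</p>

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def nic_accepts(dst, own_mac, promiscuous=False, groups=()):
    """Decide whether a network card keeps a frame, based on its destination.

    groups holds any group (multicast) addresses the system has joined.
    """
    if promiscuous:
        # Promiscuous mode: never discard based on the destination address.
        return True
    dst = dst.lower()
    joined = {group.lower() for group in groups}
    return dst == own_mac.lower() or dst == BROADCAST or dst in joined


my_mac = "02:00:00:00:00:0a"     # hypothetical address of computer A
other_mac = "02:00:00:00:00:0c"  # hypothetical address of computer C

print(nic_accepts(other_mac, my_mac))                    # discarded by default
print(nic_accepts(other_mac, my_mac, promiscuous=True))  # kept
```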
<p>Now, consider a network with a switch, and that switch has already learned all addresses of the network, thereby achieving privacy.</p>
<p>Let’s say that a malicious person works from computer C, and wants to see the communication being sent to computer B, even though the switch forwards those frames to B only.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-218.png" alt="Image" width="600" height="400" loading="lazy">
_A network with a switch that has already learned the MAC addresses and their corresponding ports. Can a malicious person see private communication? (Source: <a target="_blank" href="https://www.youtube.com/watch?v=YVcBShtWFmo&amp;t=3s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Can the malicious person do something in order to steal the data?</p>
<p>Well, the malicious person can pretend that they have B’s address. That is, the malicious person will send a frame with the source address of B. It doesn’t really matter what the destination address of that frame would be.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-219.png" alt="Image" width="600" height="400" loading="lazy">
_The malicious person sends a frame and impersonates B by specifying B's MAC address as the source address of the frame (Source: <a target="_blank" href="https://www.youtube.com/watch?v=YVcBShtWFmo&amp;t=3s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now, the switch sees a frame being sent from B’s address and from C’s port, in our diagram, port 5, and changes the mapping of B’s address to port 5. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-220.png" alt="Image" width="600" height="400" loading="lazy">
_As a result, the Switch changes the port associated with B's address (Source: <a target="_blank" href="https://www.youtube.com/watch?v=YVcBShtWFmo&amp;t=3s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>As I mentioned earlier, it is indeed possible to have two different MAC addresses map to the same port number (for instance in case of an additional switch that connects the devices that have these addresses). But it is not possible to have B’s address mapped to two different ports.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-221.png" alt="Image" width="600" height="400" loading="lazy">
_As far as the Switch is concerned, B and C may indeed both be attached to it via port 5, perhaps through another Switch (Source: <a target="_blank" href="https://www.youtube.com/watch?v=YVcBShtWFmo&amp;t=3s&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now, if A sends a message to B, it will actually get to C, but not to B! 😨</p>
<p>This technique is called <strong>MAC SPOOFING</strong>. The malicious entity is said to <strong>spoof</strong> B’s MAC address.</p>
<p>Is this technique very useful for the attacker? 🤔</p>
<p>Well, not really. Once B sends <em>any</em> frame at all to the network, the switch will update the entry for B’s MAC address back to the correct port number. So, for the attacker to keep receiving data, they will have to keep sending more frames on B’s behalf, thereby causing the switch to rewrite the table entry again and again.</p>
<p>This way, C will send a frame using B’s address, and the switch will map B’s MAC address to C’s port. Then, B will send a frame, and the switch will map B’s MAC address to B’s port again.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-223.png" alt="Image" width="600" height="400" loading="lazy">
_Once B sends any frame, the Switch will overwrite its entry and the original value will be restored (Source: <a target="_blank" href="https://www.youtube.com/watch?v=YVcBShtWFmo&amp;t=3s&amp;ab_channel=Brief">Brief</a>)_</p>
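<p>This back-and-forth is easy to see in a toy model of the switch's table. The sketch below uses the port numbers from the diagrams (A on 2, C on 5, B on 7). The helper names <code>learn</code> and <code>out_port</code> are hypothetical.</p>

```python
def learn(table, src, in_port):
    """What the switch does with the source address of every frame."""
    table[src] = in_port


def out_port(table, dst):
    """Where the switch forwards a frame destined to a known address."""
    return table[dst]


# The switch has already learned the whole network.
table = {"A": 2, "C": 5, "B": 7}

# The attacker on C's port sends a frame with B's address as the source.
learn(table, src="B", in_port=5)
hijacked = out_port(table, "B")
print(hijacked)  # frames for B now leave through port 5, towards C

# The real B sends any frame at all, and the entry is restored.
learn(table, src="B", in_port=7)
restored = out_port(table, "B")
print(restored)  # back to port 7
```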
<p>Hence, B will receive some of the traffic, and this attack is easily noticeable.</p>
<p>There are many ways to defend a switch from such attacks. One is to limit the number of MAC addresses that may be associated with each port. For instance, if no other switch is supposed to be connected to a certain port, the maximum number of MAC addresses for that port can be set to one.</p>
<p>How cool is that?! By understanding how a switch operates, we are able to anticipate security issues that stem from its way of operation, as well as relevant countermeasures. 🤯</p>
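<p>As a rough sketch of that idea, the toy switch below refuses to learn a new address on a port that has already reached its limit. The class name <code>PortSecuritySwitch</code> and its behavior on a violation (simply ignoring the frame's source address) are assumptions made for this example. Real switches offer vendor-specific options, such as shutting the port down or logging the event.</p>

```python
class PortSecuritySwitch:
    """A toy switch that caps how many MAC addresses each port may hold."""

    def __init__(self, max_macs_per_port=1):
        self.max_macs = max_macs_per_port
        self.table = {}  # MAC address -> port number

    def learn(self, src, in_port):
        """Try to learn src on in_port. Return False on a violation."""
        if self.table.get(src) == in_port:
            return True  # already known on this port, nothing changes
        macs_on_port = [mac for mac, port in self.table.items() if port == in_port]
        if len(macs_on_port) >= self.max_macs:
            return False  # port is full: refuse to (re)map this address
        self.table[src] = in_port
        return True


switch = PortSecuritySwitch(max_macs_per_port=1)
switch.learn("A", 2)
switch.learn("B", 7)
switch.learn("C", 5)

# The attacker on port 5 tries to claim B's address.
accepted = switch.learn("B", 5)
print(accepted)           # the spoofed mapping is rejected
print(switch.table["B"])  # B is still mapped to port 7
```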
<h1 id="heading-conclusion">Conclusion</h1>
<p>In this post you learned about two important network devices, a hub and a switch. </p>
<p>You learned that a hub simply multiplies the bitstream it receives to all ports other than the port that received the bitstream, whereas a switch forwards the frame only to the right port (once it has learned the network). You also learned how switches are able to achieve this ability automatically. </p>
<p>Lastly, you learned about a security problem that arises from the way switches operate, and how it may be mitigated.</p>
<h2 id="heading-about-the-author">About the Author</h2>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the Brief <a target="_blank" href="https://youtube.com/@BriefVid">YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
<h2 id="heading-additional-resources">Additional Resources</h2>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg">Computer Networks Playlist - on my Brief channel</a></li>
<li><a target="_blank" href="https://drive.google.com/file/d/1WeHTbRNph7mevNLwGeIkys1aP6_Z-Fbk/view">A DIY exercise about Hubs and Switches</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How the Ethernet Protocol Works – A Complete Guide ]]>
                </title>
                <description>
                    <![CDATA[ Whether you’ve been aware of it or not, you’ve probably used the Ethernet in the past. Does this cable look familiar? _(Source: Wikipedia)_ Ethernet is extremely popular, and is the most widely used Data Link Layer protocol, at least where the devic... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-complete-guide-to-the-ethernet-protocol/</link>
                <guid isPermaLink="false">66c17c46c711c748ec71e873</guid>
                
                    <category>
                        <![CDATA[ computer networking ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                    <category>
                        <![CDATA[ internet ]]>
                    </category>
                
                    <category>
                        <![CDATA[ networking ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Fri, 21 Oct 2022 17:12:40 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/07/The-Ethernet-Protocol-Book-Cover--1-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Whether you’ve been aware of it or not, you’ve probably used Ethernet in the past. Does this cable look familiar?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-94.png" alt="Image" width="600" height="400" loading="lazy">
_(Source: <a target="_blank" href="https://en.wikipedia.org/wiki/Ethernet_physical_layer#/media/File:EthernetCableYellow3.jpg">Wikipedia</a>)_</p>
<p>Ethernet is extremely popular, and is the most widely used Data Link Layer protocol, at least where the devices are linked by physical cables (rather than wireless). </p>
<p>If you need a reminder about the Data Link Layer and its role within the Layers Model, check out <a target="_blank" href="https://www.freecodecamp.org/news/the-five-layers-model-explained/">my previous post</a>.</p>
<p>In this tutorial, you will learn everything about Ethernet – its history, as well as every bit and byte of the Ethernet frame. You will also get to know how protocols are formed, why it is so hard to change them after they are published, and what lessons can be learned for other protocols.</p>
<h2 id="heading-heres-what-well-cover">Here's what we'll cover:</h2>
<ol>
<li><a class="post-section-overview" href="#heading-some-ethernet-history">Some Ethernet History</a></li>
<li><a class="post-section-overview" href="#heading-ethernet-frame-overview">Ethernet Frame Overview</a><br>– <a class="post-section-overview" href="#heading-before-the-frame-preamble-8-bytes">Before the frame – preamble (8 bytes)</a><br>– <a class="post-section-overview" href="#heading-destination-address-and-source-address-6-bytes-each">Destination Address and Source Address (6 bytes each)</a><br>– <a class="post-section-overview" href="#heading-type-length-field-ethernet-ii-type-2-bytes">Type / Length field – Ethernet II (Type) (2 bytes)</a><br>– <a class="post-section-overview" href="#heading-data-and-pad-46-1500-bytes">Data and Pad (46-1500 bytes)</a><br>– <a class="post-section-overview" href="#heading-checksum-crc32-4-bytes">Checksum – CRC32 (4 bytes)</a><br>– <a class="post-section-overview" href="#heading-the-problem-with-the-type-length-field">The Problem with the Type / Length Field</a></li>
<li><a class="post-section-overview" href="#heading-how-ethernet-addresses-work">How Ethernet Addresses Work</a><br>– <a class="post-section-overview" href="#heading-unicast-and-multicast-bits">Unicast and Multicast Bits</a><br>– <a class="post-section-overview" href="#heading-globally-unique-locally-administered-bit">Globally Unique / Locally Administered Bit</a></li>
<li><a class="post-section-overview" href="#heading-why-does-an-ethernet-frame-have-a-minimum-length">Why Does an Ethernet Frame Have a Minimum Length?</a><br>– <a class="post-section-overview" href="#heading-how-are-collisions-handled-in-ethernet">How are Collisions Handled in Ethernet?</a></li>
<li><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></li>
</ol>
<h1 id="heading-some-ethernet-history">Some Ethernet History</h1>
<p>The first version of Ethernet was implemented in 1976. In 1978 a second version was published by DEC, Intel, and Xerox who worked together to publish <strong>DIX</strong> (which stands for DEC, Intel and Xerox). This was also called "Ethernet II". </p>
<p>In 1983, with a change that we will discuss soon, a new Ethernet version was released – the IEEE 802.3 standard, by the IEEE standards association.</p>
<p>Both Ethernet II and IEEE 802.3 are widely used, so we will cover them both. As you will see, they are almost identical. Usually, both are simply referred to as “Ethernet”. </p>
<p>For this tutorial, in order to be precise about what we mean, I will explicitly state whether I'm talking about Ethernet II or IEEE 802.3.  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-95.png" alt="Image" width="600" height="400" loading="lazy">
_The versions of Ethernet (Source: <a target="_blank" href="https://www.youtube.com/watch?v=SoTRqDLND6Y&amp;ab_channel=Brief">Brief</a>)_</p>
<h1 id="heading-ethernet-frame-overview">Ethernet Frame Overview</h1>
<p>Let's consider the Ethernet Frame format:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-98.png" alt="Image" width="600" height="400" loading="lazy">
_Ethernet Frame Header and Trailer (Source: <a target="_blank" href="https://www.youtube.com/watch?v=SoTRqDLND6Y&amp;ab_channel=Brief">Brief</a>)_</p>
<h2 id="heading-before-the-frame-preamble-8-bytes">Before the Frame – Preamble (8 bytes)</h2>
<p>First comes a <strong>Preamble</strong> consisting of 8 bytes, each containing the bit pattern of alternating <code>1</code>s and <code>0</code>s, that is, <code>10101010</code>. </p>
<p>In Ethernet II, all 8 bytes had this pattern. In 802.3, the first seven bytes carry the value <code>10101010</code>, yet the last bit of the last byte is set to <code>1</code>, so the byte carries the value of <code>10101011</code>. </p>
<p>This last byte is called the <strong>Start of Frame</strong>. The last two <code>1</code> bits tell the receiver that the rest of the frame is about to start. </p>
<p>Sending this bit pattern before a new frame allows devices on the network to easily synchronize their receiver clocks. Note that the preamble is not really a part of the actual frame – it only precedes every frame, and thus you won't see it on many diagrams of the Ethernet protocol.  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-99.png" alt="Image" width="600" height="400" loading="lazy">
_Ethernet Preamble (Source: <a target="_blank" href="https://www.youtube.com/watch?v=SoTRqDLND6Y&amp;ab_channel=Brief">Brief</a>)_</p>
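<p>To make the 802.3 preamble concrete, here is a minimal Python sketch that builds the 8 bytes described above. The names <code>build_preamble</code>, <code>PREAMBLE_BYTE</code> and <code>START_OF_FRAME</code> are made up for this illustration, and the bit strings are written as they appear in the text, without modeling the bit order on the physical wire.</p>

```python
# Sketch of the 8 bytes that precede an 802.3 frame, as described in the text.
PREAMBLE_BYTE = 0b10101010     # alternating 1s and 0s
START_OF_FRAME = 0b10101011    # same pattern, but the last bit is set to 1


def build_preamble() -> bytes:
    """Return seven preamble bytes followed by the Start of Frame byte."""
    return bytes([PREAMBLE_BYTE] * 7 + [START_OF_FRAME])


preamble = build_preamble()
# Print each byte as 8 bits so the pattern is easy to see.
print(" ".join(f"{byte:08b}" for byte in preamble))
```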
<h2 id="heading-destination-address-and-source-address-6-bytes-each">Destination Address and Source Address (6 bytes each)</h2>
<p>Next, we have two addresses, each consisting of <strong>6</strong> bytes. I'll describe Ethernet Addresses in more detail later on in this post, but for now, let's notice that a frame starts with a <strong>destination</strong> address, followed by the <strong>source</strong> address. </p>
<p>Why would the frame start with the destination address? Is there a reason for that?</p>
<p>Well, there is. The very first thing a device is likely to do with a frame it has received is to check whether this frame is destined for it or not. If the frame is not destined for this device, it can simply be dropped. Therefore, the destination address comes first.</p>
<p>Why is the source address important? Well, to know to whom the receiver should send a reply, if necessary. This source address also plays a role in the way some network devices are implemented, as we will see in future posts.</p>
<h2 id="heading-type-length-field-ethernet-ii-type-2-bytes">Type / Length field – Ethernet II (Type) (2 bytes)</h2>
<p>Next comes a quite problematic field, called the <strong>Type</strong> or <strong>Length</strong> field.</p>
<p>In Ethernet II, this field is called <strong>Type</strong>, and tells the receiver what payload this frame carries. </p>
<p>For instance, if this frame carries an IP layer (that is, the <em>data</em> of the Ethernet layer is an IP packet), then the receiving network card should forward the frame’s payload to the IP handler. If the frame’s payload is ARP, then the ARP handler should deal with it. </p>
<p>By <strong>handler</strong> I mean the code that handles this protocol, for instance the code that parses ARP.</p>
<p>We will come back to the need for Length, and how it is dealt with in IEEE 802.3, shortly.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-100.png" alt="Image" width="600" height="400" loading="lazy">
_In Ethernet II, the Type field carries the type of the payload (Source: <a target="_blank" href="https://www.youtube.com/watch?v=SoTRqDLND6Y&amp;ab_channel=Brief">Brief</a>)_</p>
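<p>The idea of forwarding the payload to the right handler can be sketched in a few lines of Python. The Type values <code>0x0800</code> (IPv4) and <code>0x0806</code> (ARP) are the well-known ones, but the functions <code>handle_ip</code>, <code>handle_arp</code> and <code>dispatch</code> are hypothetical stand-ins, not the code of any real network stack.</p>

```python
# Type values for two common payloads.
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806


def handle_ip(payload: bytes) -> str:
    # Hypothetical stand-in for the code that parses an IP packet.
    return f"IP handler got {len(payload)} bytes"


def handle_arp(payload: bytes) -> str:
    # Hypothetical stand-in for the code that parses an ARP message.
    return f"ARP handler got {len(payload)} bytes"


HANDLERS = {ETHERTYPE_IPV4: handle_ip, ETHERTYPE_ARP: handle_arp}


def dispatch(ether_type: int, payload: bytes) -> str:
    """Pass the payload to the handler registered for this Type value."""
    handler = HANDLERS.get(ether_type)
    if handler is None:
        return f"no handler for type 0x{ether_type:04x}, dropping"
    return handler(payload)


print(dispatch(ETHERTYPE_ARP, bytes(46)))
```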
<h2 id="heading-data-and-pad-46-1500-bytes">Data and Pad (46-1500 bytes)</h2>
<p>After this field, we get up to 1500 bytes of <strong>Data</strong>. This number was chosen because RAM was expensive back in 1978, and a receiver would have needed more RAM if the frame had been bigger.</p>
<p>This means that if the third layer wants to send more than 1500 bytes of data over Ethernet, it must be sent across multiple frames.</p>
<p>There is also a minimum length of data, which is 46 bytes. Together with the other fields of the frame, the minimum length of an Ethernet frame is 64 bytes in total.</p>
<p>Why would we need a minimum frame length? We will discuss this in a subsequent section.</p>
<p>For now, given that we have a minimum length for an Ethernet frame, what happens if the sender wants to send a very short message, let’s say just one byte? </p>
<p>In that case, the sender has to <strong>pad</strong> the message, for instance with <code>0</code>s until reaching the minimum length. For example, if the sender wants to send only 1 byte of data, such as the letter <code>A</code>, they will have to add 45 bytes of <code>0</code>s.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-101.png" alt="Image" width="600" height="400" loading="lazy">
_46-1500 bytes of data, with padding if needed (Source: <a target="_blank" href="https://www.youtube.com/watch?v=SoTRqDLND6Y&amp;ab_channel=Brief">Brief</a>)_</p>
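<p>Here is a small Python sketch of the padding rule, assuming the sender pads with zero bytes. The helper name <code>pad_data</code> is made up for this example. The last line also adds up the field sizes from this overview to show where the 64-byte minimum comes from.</p>

```python
MIN_DATA = 46     # minimum bytes in the Data field
MAX_DATA = 1500   # maximum bytes in the Data field


def pad_data(data: bytes) -> bytes:
    """Pad short data with zero bytes up to the 46-byte minimum."""
    if len(data) > MAX_DATA:
        raise ValueError("data must be split across multiple frames")
    missing = max(0, MIN_DATA - len(data))
    return data + bytes(missing)


padded = pad_data(b"A")
print(len(padded), padded.count(0))   # total length, number of padding bytes

# Destination (6) + source (6) + Type/Length (2) + minimum data (46) + checksum (4)
print(6 + 6 + 2 + MIN_DATA + 4)
```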
<h2 id="heading-checksum-crc32-4-bytes">Checksum – CRC32 (4 bytes)</h2>
<p>Last but not least, we have a <strong>Checksum</strong>. This is a <a target="_blank" href="https://en.wikipedia.org/wiki/Cyclic_redundancy_check">32-bit CRC checksum</a>, used to determine whether the bits of the frame have been received correctly. In case of an error, the frame is dropped. </p>
<p>The CRC is computed on <strong>the entire frame</strong> – that is, including the header. Notice that it doesn’t include the preamble, as it is not really a part of the frame.</p>
<p>When we use CRC-32 for the checksum, we set a fixed overhead of 32 bits, or 4 bytes, regardless of the length of the data. In other words, if we send only 1 byte of data, we get a 32-bit checksum, and if we send a thousand bytes of data – we still get 32 bits of checksum.</p>
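<p>You can see the fixed 4-byte overhead with Python's <code>zlib.crc32</code>, which computes a CRC-32 of the same family as the one Ethernet uses. Treat this as a sketch: the helper names <code>checksum</code> and <code>is_intact</code> are made up, and the byte order I chose for storing the result is an assumption rather than an exact model of the wire format.</p>

```python
import zlib


def checksum(frame_without_crc: bytes) -> bytes:
    """Return a 4-byte CRC-32 over the given bytes."""
    crc = zlib.crc32(frame_without_crc)
    # Assumption: store the least significant byte first.
    return crc.to_bytes(4, "little")


def is_intact(frame_with_crc: bytes) -> bool:
    """Recompute the CRC on the receiver side and compare it to the trailer."""
    body, received = frame_with_crc[:-4], frame_with_crc[-4:]
    return checksum(body) == received


short_data = b"A"
long_data = bytes(1000)
# The checksum is 4 bytes long no matter how much data it covers.
print(len(checksum(short_data)), len(checksum(long_data)))

frame = short_data + checksum(short_data)
corrupted = b"B" + frame[1:]          # flip the data, keep the old checksum
print(is_intact(frame), is_intact(corrupted))
```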
<h2 id="heading-the-problem-with-the-type-length-field">The Problem with the Type / Length Field</h2>
<p>Earlier, we mentioned that the <strong>Data</strong> field has to be at least 46 bytes long, and if not, we pad it. For simplicity’s sake, let’s assume we pad with <code>0</code>s, as the standard indicates. </p>
<p>Well, we actually have a problem here.</p>
<p>Let’s say the sender wants to send a single byte, consisting of the character <code>A</code>. So they will send an <code>A</code> followed by 45 <code>0</code>s.</p>
<p>What happens in case the sender wants to send <code>A</code> and zero? That is, the data actually consists of <code>A0</code>. In this case, they would also send an <code>A</code>, followed by 45 <code>0</code>s. But this time, the first zero is actually part of the data, and not the padding.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-102.png" alt="Image" width="600" height="400" loading="lazy">
_Whether the sender would like to send <code>A</code> as data or <code>A0</code> as the data, due to padding the frame consists of <code>A</code> and 45 <code>0</code>s (Source: <a target="_blank" href="https://www.youtube.com/watch?v=SoTRqDLND6Y&amp;ab_channel=Brief">Brief</a>)_</p>
<p>As a receiver, you'd need a way to differentiate these cases, and understand which bytes belong to the padding, and which bytes belong to the data, in case of a short frame.</p>
<p>Ethernet II dealt with this problem by… Well, not handling it. That is, the third layer will receive the data and the padding, which would be an <code>A</code> followed by 45 <code>0</code>s in this example. It will then have to figure out on its own which bytes belong to the data and which don’t. </p>
<p>This is doable, of course, if the third layer includes a length field. However, this solution is far from elegant – why would the third layer deal with a padding problem that should be dealt with by the second layer? </p>
<p>This is a clear violation of our layers model (if you would like to see an overview about the Layers Model, refer to <a target="_blank" href="https://www.freecodecamp.org/news/the-five-layers-model-explained/">this tutorial</a>).</p>
<p>For this reason, IEEE decided to change the <strong>Type</strong> field into a <strong>Length</strong> field in IEEE 802.3. So, for example, a frame carrying a single byte of data, <code>A</code>, will have the Length field set to <code>1</code>, whereas a frame carrying two bytes of data, <code>A0</code>, will have the Length field set to <code>2</code>. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-103.png" alt="Image" width="600" height="400" loading="lazy">
_In 802.3, the Length field sets the data apart from the padding (Source: <a target="_blank" href="https://www.youtube.com/watch?v=SoTRqDLND6Y&amp;ab_channel=Brief">Brief</a>)_</p>
<p>This is an elegant solution, but now two issues arise:</p>
<p>First, if you receive an Ethernet frame, how do you know if it’s an Ethernet II frame, where this field means Type, or an IEEE 802.3 frame, where this field means length?</p>
<p>Second, what happens with the Type field? How would the receiver know what protocol is carried inside the frame?</p>
<p>Let's start with the first question. Just to clarify, by the time IEEE 802.3 was published, many Ethernet cards had already been in use. People didn’t want to replace their network cards just because a new standard was published. </p>
<p>Think about it: would you want to buy a new network card? Or perhaps your friends who are not programmers – would they get a new card because someone told them that "the internet geeks" decided that there was "a new standard" (whatever that means)?</p>
<p>The solution was to allow both Ethernet II and IEEE 802.3 to operate on the same network.</p>
<p>Fortunately, all the <strong>Type</strong> values used at that time had greater values than <code>1500</code>. The solution is thus straightforward: in case this field has a value less than or equal to <code>1500</code>, it actually means Length. In case it has a value greater than or equal to <code>1536</code>, it means Type. The values in between currently have no meaning.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-104.png" alt="Image" width="600" height="400" loading="lazy">
_The Type/Length field is divided: values equal to or lower than 1500 are Length values, and values equal to or greater than 1536 are Type values. (Source: <a target="_blank" href="https://www.youtube.com/watch?v=SoTRqDLND6Y&amp;ab_channel=Brief">Brief</a>)_</p>
<p>For example, if we see a frame where the value of this field is <code>400</code>, it is clear that we have an IEEE 802.3 frame, which carries <code>400</code> bytes of data.</p>
<p>Now you try: in case we see a frame where this field is set to <code>20</code>, is it an Ethernet II frame or IEEE 802.3 frame? </p>
<p>Indeed, this is an IEEE 802.3 frame, which has <code>20</code> bytes of data, and thus <code>26</code> bytes of padding. And… in case we see a frame where this field is set to <code>2000</code>? </p>
<p>In this case we know that this is an Ethernet II frame, and <code>2000</code> is the Type.</p>
<p>So this is how we know whether we are dealing with an Ethernet II or an IEEE 802.3 frame.</p>
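<p>The rule for reading this field fits in a short Python function. This is only a sketch of the thresholds described above, and the name <code>classify</code> is made up for the example.</p>

```python
MAX_LENGTH = 1500   # largest value that is read as Length
MIN_TYPE = 1536     # smallest value that is read as Type


def classify(field: int) -> str:
    """Interpret the 2-byte field that follows the source address."""
    if field >= MIN_TYPE:
        return "Ethernet II (Type)"
    if field > MAX_LENGTH:
        return "undefined"
    return "IEEE 802.3 (Length)"


for value in (20, 400, 1500, 1501, 1536, 2000):
    print(value, classify(value))
```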
<p>Next, how does an IEEE 802.3 frame contain the Type information? Since IEEE 802.3 repurposed the Type field as a Length field, the receiver can no longer tell from this field what to do with an incoming frame. Thus, IEEE 802.3 adds another header of the <a target="_blank" href="https://en.wikipedia.org/wiki/IEEE_802.2">802.2 LLC (Logical Link Control) protocol</a> right before the data. This header conveys the type information.</p>
<p>So an IEEE 802.3 frame will have a destination address field, then a source field, then a length field, and then an LLC header, followed by the data and the checksum.</p>
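<p>As a rough illustration of this layout, the following Python sketch splits a hand-made IEEE 802.3 frame into its fields. It rests on a few assumptions that are mine and not part of the overview above: the LLC header is taken to be 3 bytes long (DSAP, SSAP and a 1-byte Control field), the Length value is taken to count the LLC header together with the data, and the LLC values and the checksum in the example frame are placeholders. The name <code>parse_8023</code> is made up.</p>

```python
import struct

LLC_HEADER_LEN = 3   # assumption: DSAP, SSAP and a 1-byte Control field


def parse_8023(frame: bytes) -> dict:
    """Split an IEEE 802.3 frame (without the preamble) into its fields."""
    destination, source, length = struct.unpack("!6s6sH", frame[:14])
    # Assumption: Length counts the LLC header plus the data, not the padding.
    llc_and_data = frame[14:14 + length]
    dsap, ssap, control = llc_and_data[:LLC_HEADER_LEN]
    return {
        "destination": destination.hex(":"),
        "source": source.hex(":"),
        "length": length,
        "dsap": dsap,
        "ssap": ssap,
        "control": control,
        "data": llc_and_data[LLC_HEADER_LEN:],
        "checksum": frame[-4:],
    }


example = (
    bytes.fromhex("0001420000020001420000010004")   # destination, source, Length = 4
    + bytes([0x42, 0x42, 0x03])                      # LLC header (illustrative values)
    + b"A"                                           # 1 byte of data
    + bytes(42)                                      # padding up to 46 bytes
    + bytes(4)                                       # placeholder, not a real CRC
)

parsed = parse_8023(example)
print(parsed["destination"], parsed["source"], parsed["length"], parsed["data"])
```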
<h3 id="heading-wait-wasnt-ieee-8023-published-in-1983-why-is-it-relevant">Wait, wasn't IEEE 802.3 published in 1983? Why is it relevant? 🤔</h3>
<p>As mentioned beforehand, in 1978, Ethernet II was published. Not long after, in 1983, a new format came out – and its authors allowed for backward compatibility, probably believing that in a few years, all devices would be upgraded to the new standard.</p>
<p>Oh, were they wrong.</p>
<p>If you check your own network (given that you are connected to an Ethernet one), I bet you will see Ethernet II frames. </p>
<p>Your device probably supports both versions, but by default it will transmit Ethernet II frames, rather than 802.3. After all, it is guaranteed that any device connected to an Ethernet network can read Ethernet II frames, and it's not guaranteed that the device can read 802.3 ones. If Ethernet II works, why not use it? </p>
<p>All third-layer protocols had to account for the fact that Ethernet doesn't solve the problem of differentiating data from padding. So if all protocols already deal with that, why don't we just...keep things the way they are?</p>
<p>Endpoint devices (such as personal computers) almost always communicate over Ethernet II. IEEE 802.3 is also very common, though, and it's used by default on most modern network devices (such as switches).</p>
<p>This story actually entails a really important lesson.</p>
<p>It is very, very hard to replace protocols after the fact, especially when they are implemented on hardware devices (such as network cards).</p>
<h3 id="heading-whats-an-interpacket-gap">What's an Interpacket Gap?</h3>
<p>After an Ethernet frame is sent, transmitters wait a very short period of time before transmitting the next frame, in order to allow the receiver to know that the transmission of a frame is over. This idle time between frames is called the “Interpacket gap”.</p>
<h1 id="heading-how-ethernet-addresses-work">How Ethernet Addresses Work</h1>
<p>Every Ethernet frame carries two addresses – first, the destination, and second, the source. We mentioned that the destination address appears first so the receiver will be able to tell whether the frame is relevant for it. If not, the frame will be discarded.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-105.png" alt="Image" width="600" height="400" loading="lazy">
_Ethernet addresses within the Ethernet Frame (Source: <a target="_blank" href="https://www.youtube.com/watch?v=sGZzU4U39Bw&amp;ab_channel=Brief">Brief</a>)_</p>
<p>What does an Ethernet address look like?</p>
<p>An Ethernet address consists of 6 bytes – that is, 48 bits. Usually, they are presented in hexadecimal base, delimited either by dashes or colons, as you can see in these examples:   </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-106.png" alt="Image" width="600" height="400" loading="lazy">
_Two representations of the same Ethernet Address (Source: <a target="_blank" href="https://www.youtube.com/watch?v=sGZzU4U39Bw&amp;ab_channel=Brief">Brief</a>)_</p>
<pre><code><span class="hljs-number">00</span>:<span class="hljs-number">01</span>:<span class="hljs-number">42</span>:a9:c2:dd
<span class="hljs-number">00</span><span class="hljs-number">-01</span><span class="hljs-number">-42</span>-a9-c2-dd
</code></pre><p>These are two representations of the exact same Ethernet address, and there is no real difference between the two.</p>
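<p>A quick Python sketch shows that both notations decode to the same 6 bytes. The helper names <code>parse_address</code> and <code>format_address</code> are made up for this example.</p>

```python
def parse_address(text: str) -> bytes:
    """Accept either colon- or dash-delimited notation."""
    raw = bytes.fromhex(text.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("an Ethernet address has exactly 6 bytes")
    return raw


def format_address(raw: bytes, delimiter: str = ":") -> str:
    """Render 6 raw bytes as delimited hexadecimal."""
    return delimiter.join(f"{byte:02x}" for byte in raw)


with_colons = parse_address("00:01:42:a9:c2:dd")
with_dashes = parse_address("00-01-42-a9-c2-dd")
print(with_colons == with_dashes)          # same bytes either way
print(format_address(with_colons, "-"))
```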
<p>In general, Ethernet addresses are supposed to be globally unique. That is, no two Ethernet devices share the same address (at least, in theory). </p>
<p>The first 3 bytes of any address are called the <strong>OUI</strong> – Organizationally Unique Identifier. To make sure the addresses are unique, IEEE assigns these OUIs to various manufacturers, such as Dell, HP or IBM. </p>
<p>This part of the address is also called the <strong>Vendor ID</strong> (with the exception of the two least significant bits of the first byte, as we will see). Then, the manufacturers assign the remaining 3 bytes to specific hosts. This part is also called the <strong>Host ID</strong>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-107.png" alt="Image" width="600" height="400" loading="lazy">
_The most significant 3 bytes are the Vendor ID, and the least significant 3 bytes are the Host ID (Source: <a target="_blank" href="https://www.youtube.com/watch?v=sGZzU4U39Bw&amp;ab_channel=Brief">Brief</a>)_</p>
<p>For example, the OUI <code>00:01:42</code> belongs to Cisco. Now, Cisco can manufacture a network card and assign it the address <code>00:01:42:00:00:01</code>. Next, it can manufacture another card and assign it the address <code>00:01:42:00:00:02</code>, and so on. These two addresses share the same <strong>Vendor ID</strong>, but have different <strong>Host IDs</strong>.</p>
<p>Since a single OUI leaves 3 bytes to be used for the host IDs, we have <code>2^24</code> host IDs per OUI – that is, 16,777,216 host IDs. Of course, big manufacturers need many more addresses, and thus they are assigned additional OUIs. For example, <code>00:01:64</code> is another OUI that belongs to Cisco.</p>
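<p>In code, splitting an address into its two halves is just slicing. This is a minimal Python sketch with a made-up helper name, <code>split_address</code>, using the two example addresses from above.</p>

```python
def split_address(raw: bytes) -> tuple:
    """Return (OUI, host ID) for a 6-byte Ethernet address."""
    return raw[:3], raw[3:]


first = bytes.fromhex("000142000001")
second = bytes.fromhex("000142000002")

first_oui, first_host = split_address(first)
second_oui, second_host = split_address(second)

print(first_oui == second_oui)     # both cards share the same OUI
print(first_host == second_host)   # but their host IDs differ
print(2 ** 24)                     # host IDs available under a single OUI
```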
<h2 id="heading-unicast-and-multicast-bits">Unicast and Multicast Bits</h2>
<p>Ethernet addresses also contain two special bits.</p>
<p>The first special bit indicates whether the address is a unicast or a multicast address. Unicast means that the address represents a single device. Multicast addresses represent a group of devices – such as all printers on the network, or all devices in the same local network. </p>
<p>The bit representing whether the address is unicast or multicast is the least significant bit within the most significant byte. Wait, what?</p>
<p>Consider the following Ethernet address:</p>
<p><code>06:b2:d9:a2:32:9e</code></p>
<p>The most significant byte is <code>06</code>.</p>
<p>Let’s convert this to binary:</p>
<p><code>00000110</code></p>
<p>Now we look at the least significant bit – that is, this <code>0</code>:  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-108.png" alt="Image" width="600" height="400" loading="lazy">
_When the least significant bit within the most significant byte is set to <code>0</code>, this is a Unicast address (Source: <a target="_blank" href="https://www.youtube.com/watch?v=sGZzU4U39Bw&amp;ab_channel=Brief">Brief</a>)_</p>
<p>This bit is off. This means that this is a <strong>unicast</strong> address. In other words, it belongs to a single device, such as a computer’s network card.</p>
<p>Let’s consider another address:</p>
<p><code>11:c0:ff:ee:d8:ab</code></p>
<p>The most significant byte is <code>11</code> (in hexadecimal base).</p>
<p>Let’s convert this to binary:</p>
<p><code>00010001</code></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-109.png" alt="Image" width="600" height="400" loading="lazy">
_When the least significant bit within the most significant byte is set to <code>1</code>, this is a Multicast address (Source: <a target="_blank" href="https://www.youtube.com/watch?v=sGZzU4U39Bw&amp;ab_channel=Brief">Brief</a>)_</p>
<p>The least significant bit is this one. Since it is on, we can tell that this is a <strong>multicast</strong> address. That is, it’s an address of a group. You can send a frame to this address, and all devices that belong to this group will consider the frame as sent to them.</p>
<p>One very famous multicast address is called the <strong>broadcast</strong> address, that is – the group that contains all machines. The address of this group is:</p>
<p><code>FF:FF:FF:FF:FF:FF</code><br>In other words, the address where all bits are on.</p>
<p><strong>All</strong> the machines are part of the broadcast group.</p>
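<p>Checking this bit takes a single bitwise operation. Here is a short Python sketch that runs the check on the addresses from this section. The name <code>is_multicast</code> is made up for the example.</p>

```python
BROADCAST = bytes.fromhex("ffffffffffff")


def is_multicast(raw: bytes) -> bool:
    """Check the least significant bit of the most significant byte."""
    return (raw[0] & 0b00000001) == 1


unicast_example = bytes.fromhex("06b2d9a2329e")     # 06:b2:d9:a2:32:9e
multicast_example = bytes.fromhex("11c0ffeed8ab")   # 11:c0:ff:ee:d8:ab

for address in (unicast_example, multicast_example, BROADCAST):
    kind = "multicast" if is_multicast(address) else "unicast"
    print(address.hex(":"), f"{address[0]:08b}", kind)
```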
<h3 id="heading-globally-unique-locally-administered-bit">Globally Unique / Locally Administered Bit</h3>
<p>The second special bit indicates whether the address is indeed globally unique. This bit is the second least significant bit within the most significant byte. Um, what?</p>
<p>Well, again, consider the first address from before:</p>
<p><code>06:b2:d9:a2:32:9e</code></p>
<p>The first byte is <code>06</code>.</p>
<p>Converted to binary, we get:</p>
<p><code>00000110</code></p>
<p>So the second least significant bit is the one right here:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-110.png" alt="Image" width="600" height="400" loading="lazy">
_When the second least significant bit within the most significant byte is set to <code>1</code>, this address is <strong>not</strong> globally unique (Source: <a target="_blank" href="https://www.youtube.com/watch?v=sGZzU4U39Bw&amp;ab_channel=Brief">Brief</a>)_</p>
<p>This bit is on, and thus we know that this address is actually <strong>not</strong> globally unique. IEEE will never assign this address to any vendor. So what is this address? Well, in this case it’s just one that I’ve made up. If I wanted to, I could assign it to a specific device. The fact that this bit is on declares that it is not globally unique.</p>
<p>Consider another address:</p>
<p><code>00:01:42:a9:c2:dd</code></p>
<p>The first byte is <code>00</code>, so the second least significant bit is <code>0</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-111.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>This is indeed a globally unique address, assigned to Cisco.</p>
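<p>The same kind of check works for the second special bit. In this Python sketch, the name <code>is_locally_administered</code> is made up, and the two addresses are the ones we just looked at.</p>

```python
def is_locally_administered(raw: bytes) -> bool:
    """Check the second least significant bit of the most significant byte."""
    return (raw[0] & 0b00000010) != 0


made_up = bytes.fromhex("06b2d9a2329e")     # 06:b2:d9:a2:32:9e
from_vendor = bytes.fromhex("000142a9c2dd")   # 00:01:42:a9:c2:dd

print(is_locally_administered(made_up))       # bit is on: not globally unique
print(is_locally_administered(from_vendor))   # bit is off: globally unique
```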
<h3 id="heading-ethernet-addresses-recap">Ethernet Addresses – Recap</h3>
<p>So, all in all, an Ethernet address has two main parts: The vendor ID, and the host ID.</p>
<p>There are also two special bits: the least significant bit within the most significant byte states whether the address is unicast or multicast. The second least significant bit within the most significant byte states whether the address is globally unique.</p>
<h1 id="heading-why-does-an-ethernet-frame-have-a-minimum-length">Why Does an Ethernet Frame Have a Minimum Length?</h1>
<p>This is more of a "bonus" part of this post, and concerns collisions. Collisions are a very interesting topic, but since this post focuses on the Ethernet protocol, they will not be our focus. I will therefore address this issue just briefly. While it's not crucial for understanding Ethernet frames, I promised a <em>complete</em> overview of the Ethernet protocol. </p>
<p>In the overview, I mentioned that an Ethernet frame carries a minimum of 46 bytes of data and a maximum of 1500 bytes of data. I already explained why we have that maximum limit, but what about the minimum?</p>
<p>To simplify our discussion, let's consider a network using classic Ethernet where all computers are attached to a single cable.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-112.png" alt="Image" width="600" height="400" loading="lazy">
_A "classic Ethernet" network with four devices connected via a single cable (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ECl8DnWeVD4&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Let’s say A wants to send a message to B, and C wants to send a message to D. Let’s say that while A is transmitting its frame, C is also transmitting its frame. In this case, the frames will <em>collide</em>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-113.png" alt="Image" width="600" height="400" loading="lazy">
_In case two devices transmit data at the same time, their frames will collide (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ECl8DnWeVD4&amp;ab_channel=Brief">Brief</a>)_</p>
<p>When this happens, we get errors – much like the case where two people start to speak at the same time, and it is impossible to understand either of them. </p>
<h2 id="heading-how-are-collisions-handled-in-ethernet">How are collisions handled in Ethernet?</h2>
<p>Ethernet uses two main mechanisms to deal with collisions. The first is called <strong>CSMA</strong>, which stands for <strong>Carrier Sense Multiple Access</strong>. This basically means that when a station wants to transmit data, it first senses the channel to see if anyone else is transmitting by checking the signal level of the line. If the channel is in use, the station will wait and try again.</p>
<p>So, if A is transmitting, and C wants to send data, C will wait until A finishes its transmission before starting to transmit.</p>
<p>This is just like the case in a human conversation, where one person waits until the other stops talking, and only then does that person talk.</p>
<p>Yet, just like the case where two people might start talking at the same time, two Ethernet machines might start transmitting data at the same time. In this case, <strong>CD</strong> – <strong>Collision Detection</strong> – comes into play. Collision Detection means that the transmitting devices detect the fact that a collision has occurred. This is achieved by listening to the channel while transmitting.</p>
<p>For example, assume that station A transmits the bit stream <code>11001010</code>. While transmitting, A is also listening to the channel. If no collision occurred, A would also read the signal <code>11001010</code> from the line.  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-114.png" alt="Image" width="600" height="400" loading="lazy">
_With <strong>Collision Detection</strong>, A is listening to the channel while transmitting data. In case no collision occurred, A will sense exactly the bitstream it has sent (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ECl8DnWeVD4&amp;ab_channel=Brief">Brief</a>)_</p>
<p>If, however, a collision occurred, say with a frame sent by C, then A would read something different from the line – for instance, <code>11011010</code>. This way, machine A realizes that its frame has collided.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-115.png" alt="Image" width="600" height="400" loading="lazy">
_With <strong>Collision Detection</strong>, A is listening to the channel while transmitting data. In case of a collision, A reads a different bitstream than that it has sent (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ECl8DnWeVD4&amp;ab_channel=Brief">Brief</a>)_</p>
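<p>Conceptually, Collision Detection boils down to comparing what was sent with what was sensed. The following Python sketch models the bit streams as strings, which is a simplification (real hardware compares signal levels on the line), and the function name <code>differing_bits</code> is made up.</p>

```python
def differing_bits(sent: str, sensed: str) -> list:
    """Return the positions where the sensed bits differ from the sent bits."""
    return [i for i, (a, b) in enumerate(zip(sent, sensed)) if a != b]


sent = "11001010"

# No collision: A reads back exactly what it transmitted.
print(differing_bits(sent, "11001010"))

# Collision: at least one bit on the line differs from what A transmitted.
print(differing_bits(sent, "11011010"))
```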
<p>Machine A can realize that a collision has occurred even before it has finished transmitting the frame. Then, machine A stops transmitting and issues a JAM signal to tell the other stations that a collision has occurred. As a result, both stations stop transmitting and wait a random interval of time before trying to transmit again. </p>
<p>The amount of time that the stations wait increases with the number of collisions in the network. So on the first collision, A and C wait for a relatively short amount of time before transmitting again. If another collision occurs, they might wait longer.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-116.png" alt="Image" width="600" height="400" loading="lazy">
_After a collision occurs, the amount of time that the stations wait increases with the number of collisions in the network (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ECl8DnWeVD4&amp;ab_channel=Brief">Brief</a>)_</p>
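<p>The scheme classic Ethernet uses for this waiting time is known as truncated binary exponential backoff: after the n-th collision in a row, a station waits a random number of time slots between 0 and 2^n - 1, and the range stops growing after 10 collisions. Here is a minimal Python sketch of that idea. The function name <code>backoff_slots</code> is made up, and the sketch leaves out details such as giving up after too many attempts.</p>

```python
import random


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Pick a random wait, in time slots, after the given number of collisions."""
    exponent = min(collisions, 10)          # the range stops growing after 10
    return rng.randrange(2 ** exponent)     # uniform in 0 .. 2**exponent - 1


rng = random.Random(0)   # seeded only to make the sketch repeatable
for n in (1, 2, 3, 10):
    longest = 2 ** min(n, 10) - 1
    print(f"collision {n}: wait 0..{longest} slots, picked {backoff_slots(n, rng)}")
```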
<p>Now, back to Ethernet. Ethernet requires that valid frames must be at least 64 bytes long, from destination address to checksum, including both. So, the data has to be at least 46 bytes long. If the frame is too short, then it must be padded.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-117.png" alt="Image" width="600" height="400" loading="lazy">
_The minimum length of an Ethernet frame consists of 46 bytes of data, or 64 bytes overall (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ECl8DnWeVD4&amp;ab_channel=Brief">Brief</a>)_</p>
<p>One reason for having this minimum is directly related to the collision detection mechanism stated above.</p>
<p>Let's consider the following scenario. Host A wants to transmit a really really short frame to B, a frame that is only 1 byte long. I am exaggerating of course, this can’t really happen in Ethernet, but it will be helpful for the explanation. </p>
<p>Host A transmits this frame, which consists of 8 <code>1</code>s. Then, A listens to the channel while transmitting, and also reads 8 <code>1</code>s from it, reaching the conclusion that the frame has been transmitted successfully. </p>
<p>However, before the frame reaches the other end of the network, D starts transmitting a very short frame, one byte long, consisting of 8 <code>0</code>s. D listens to the channel while transmitting, and also reads 8 <code>0</code>s from it, concluding that the frame has been transmitted successfully.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-118.png" alt="Image" width="600" height="400" loading="lazy">
_Both A and D send a really short frame, and they finish transmitting without realizing a collision is liable to take place (Source: <a target="_blank" href="https://www.youtube.com/watch?v=ECl8DnWeVD4&amp;ab_channel=Brief">Brief</a>)_</p>
<p>Now, these two really short frames collide. Yet, neither A nor D is aware of this collision, as each has already concluded that its frame has been successfully delivered.</p>
<p>In order to avoid such cases, the frame must be long enough to prevent a station from completing its transmission before the first bit of the frame reaches the far end of the line. Having a minimum length for Ethernet frames solves this issue.</p>
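<p>To get a feeling for the numbers, here is a small Python calculation. It assumes a 10 Mbps network and a worst-case round trip of about 50 microseconds across the cable. Both figures are assumptions I picked for illustration and are not stated above. I compare against the round trip because the sender must still be transmitting when the signs of a collision at the far end travel back to it. The function names are made up.</p>

```python
BIT_RATE = 10_000_000     # assumption: 10 Mbps classic Ethernet
MIN_FRAME_BYTES = 64


def transmit_time(frame_bytes: int, bit_rate: int = BIT_RATE) -> float:
    """Seconds needed to put the whole frame on the wire."""
    return frame_bytes * 8 / bit_rate


def detects_collision(frame_bytes: int, round_trip: float) -> bool:
    """True if the sender is still transmitting when a far-end collision gets back."""
    return transmit_time(frame_bytes) >= round_trip


round_trip = 50e-6   # assumption: about 50 microseconds in the worst case

print(transmit_time(MIN_FRAME_BYTES))                    # seconds for a 64-byte frame
print(detects_collision(1, round_trip))                  # the 1-byte frame ends too early
print(detects_collision(MIN_FRAME_BYTES, round_trip))
```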
<p>This was a very short discussion of collisions. If you’d like to know more about this topic, refer to the "Additional References" section below.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>In this tutorial, we covered every bit and byte of the Ethernet protocol. You should now have a good understanding of this protocol, as well as a reference to consult when needed. </p>
<h2 id="heading-about-the-author"><strong>About the Author</strong></h2>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the Brief <a target="_blank" href="https://youtube.com/@BriefVid">YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
<h3 id="heading-additional-references"><strong>Additional References</strong></h3>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg">Computer Networks Playlist - on my Brief channel</a></li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/Carrier-sense_multiple_access_with_collision_detection">Carrier-sense multiple access with collision detection - Wikipedia</a></li>
<li><a target="_blank" href="https://www.itprc.com/carrier-sense-multiple-access-collision-detect-csmacd-explained/">Carrier Sense Multiple Access Collision Detect (CSMA/CD) Explained - ITPRC</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is the Five Layers Model? The Framework of the Internet Explained ]]>
                </title>
                <description>
                    <![CDATA[ Computer Networks are a beautiful, amazing topic. Networks involve so much knowledge from different fields, from physics to algorithms.  When dealing with Computer Networks, there is one framework that puts everything into place – and that is the lay... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-five-layers-model-explained/</link>
                <guid isPermaLink="false">66c17c4cc711c748ec71e875</guid>
                
                    <category>
                        <![CDATA[ computer network ]]>
                    </category>
                
                    <category>
                        <![CDATA[ computer networking ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Mon, 17 Oct 2022 13:37:19 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/10/d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Computer Networks are a beautiful, amazing topic. Networks involve so much knowledge from different fields, from physics to algorithms. </p>
<p>When dealing with Computer Networks, there is one framework that puts everything into place – and that is the layers model. </p>
<p>In this post you'll learn <em>why</em> we need layers, as well as <em>what</em> the five layers model is. You will also understand the role of each layer in this model. </p>
<h1 id="heading-why-layers">Why Layers?</h1>
<p>Imagine you are given the task to design and implement the Internet! Where do you start? What do we actually want from a network, and an important one such as the Internet? </p>
<p>Well, we actually want quite a lot of things. To name a few:</p>
<ul>
<li>We want it to be <strong>fast</strong> – that is, allow fast communication. We don’t want to wait long for a message to get from one host to another.</li>
<li>It should also be <strong>reliable</strong> – when sending a message, we want the receiver to actually receive it.</li>
<li>The network should be <strong>extendable</strong> – that is, allow more devices to join. We wouldn’t want to start with two computers, and then not be able to add a third one.</li>
<li>The network should support <strong>different devices and connections</strong> – it should be able to connect a wired PC, wireless laptop, and a cellphone, for example.</li>
</ul>
<p>And this is just a partial list.</p>
<p>So, how do we go about implementing the internet when we want to achieve so many different things?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-58.png" alt="Image" width="600" height="400" loading="lazy">
<em>Computer Networks are complex (Source: <a target="_blank" href="https://xkcd.com/2259/">XKCD</a>)</em></p>
<p>In order to simplify things and make networks flexible, the communication is divided into <strong>layers</strong>.</p>
<p>Each layer has its own responsibility. It provides services to an upper layer, and uses services provided by a lower layer.</p>
<p>Consider an example network consisting of three devices:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-51.png" alt="Image" width="600" height="400" loading="lazy">
<em>An example network with three devices (Source: <a target="_blank" href="https://www.youtube.com/watch?v=iHp5J_f_ToQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>We have two layers:</p>
<p><strong>Layer Alpha</strong> is responsible for transmitting data between hosts that are directly connected to each other. In the diagram above, it's between hosts A and B, or between hosts B and C.</p>
<p><strong>Layer Beta</strong> is responsible for transmitting data between distant hosts. In the diagram, it's between hosts A and C.</p>
<p>What did we gain from this division? We gained a lot of <strong>flexibility</strong>.</p>
<p>Each layer can be developed and implemented by different people. The upper layer doesn’t care about the implementation of the lower layer, and vice versa.</p>
<p>For instance, the connection between hosts A and B could be a WiFi connection, while the connection between B and C could consist of a carrier pigeon. These are (completely) different implementations of Layer Alpha. </p>
<p>Notice that this way also enables us to have different specializations and expertise – an expert in training carrier pigeons does not necessarily have to be qualified at building solid WiFi network cards, or vice versa.  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-52.png" alt="Image" width="600" height="400" loading="lazy">
<em>The Alpha Layer may have different implementations on the same network (Source: <a target="_blank" href="https://www.youtube.com/watch?v=iHp5J_f_ToQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Developers of Layer Beta don’t need to bother themselves with this difference. At this layer, host A needs to know that in order to reach host C, it first needs to send its message to host B, rather than, say, host D. Then, host B will forward it to host C.</p>
<p>This way, Layer Beta is only responsible for finding the route to send the message. It uses the service provided by Layer Alpha – transmitting data between directly connected hosts.</p>
<p>In general, networks are very complicated, and have various requirements. Dividing the communication into layers will allow us to simplify things and make communication more flexible.</p>
<p>Now that you understand <em>why</em> we need layers, we can go on to learn about the layers that are actually used in networks. </p>
<h1 id="heading-what-is-the-five-layers-model">What is the Five Layers Model?</h1>
<p>A few layer models have been proposed over the years – most notably, the five layers model, the seven layers model (aka the OSI model), and the four layers model (aka the TCP/IP model). </p>
<p>They are way more similar than different, and I choose to focus on the five layers model as it is the most practical of all – and best describes the way the Internet actually works.</p>
<h2 id="heading-the-first-layer-the-physical-layer">The First Layer – The Physical Layer</h2>
<p>The first layer is responsible for <strong>transmitting a single bit</strong> – 0 or 1 – over the network.</p>
<p>To get some intuition as to what this layer is responsible for, consider the time of transmission. Assume that we have some kind of cable to transmit our data, and we use <code>+5</code> volts to transmit <code>1</code>, and <code>-5</code> volts to transmit <code>0</code>. What bits does the following diagram represent?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-53.png" alt="Image" width="600" height="400" loading="lazy">
<em>A physical layer implementation encoding 1 as +5 volts and 0 as -5 volts (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Q3qqd6Y2FbQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Well, it might be <code>1001</code>. That is the case if it takes <em>this</em> long to transmit a single bit (demonstrated by the dashed orange line in the diagram below):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-54.png" alt="Image" width="600" height="400" loading="lazy">
<em>An example bitstream encoded by this signal (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Q3qqd6Y2FbQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>However, it might also represent other bit streams. For instance, if it only takes half the time to transmit a single bit (demonstrated by the dashed green line below), then the bit stream might be <code>11000011</code>:  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-55.png" alt="Image" width="600" height="400" loading="lazy">
<em>Another possible bitstream encoded by the same signal (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Q3qqd6Y2FbQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>The difference lies in the time dedicated to transmitting a single bit. This time determines the <strong>bitrate</strong> – that is, the number of bits that are conveyed per unit of time.</p>
<p>Of course, achieving a high bitrate is preferable, as it means we can send many bits in a short timeframe. But it is hard to achieve high bitrates without getting many errors.</p>
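<p>To make the ambiguity concrete, here is a minimal Python sketch that reads the same signal with two different bit durations. The sample values and the <code>decode</code> function are assumptions made for illustration, modeled on the diagrams above: a real physical layer deals with noise, clock synchronization and much more.</p>
<pre><code class="language-python">def decode(samples, samples_per_bit):
    """Decode a list of voltage samples into a bit string.

    Assumes the encoding from the text: +5 V means 1 and -5 V means 0.
    """
    bits = []
    for start in range(0, len(samples), samples_per_bit):
        # Read the voltage at the beginning of each bit period.
        voltage = samples[start]
        bits.append("1" if voltage == 5 else "0")
    return "".join(bits)


# Hypothetical signal: high for 2 time units, low for 4, high for 2.
signal = [5, 5, -5, -5, -5, -5, 5, 5]

slow = decode(signal, samples_per_bit=2)  # one bit every 2 time units
fast = decode(signal, samples_per_bit=1)  # one bit every time unit
print(slow)  # 1001
print(fast)  # 11000011
</code></pre>
<p>The signal is identical in both calls. Only the agreed bit duration changes, and with it the meaning of the signal.</p>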
<p>This is only one of the things that the first layer needs to take into consideration. The important thing for now is the goal of this layer: to transmit and receive a single bit.</p>
<h2 id="heading-the-second-layer-the-data-link-layer">The Second Layer – The Data Link Layer</h2>
<p>The second layer is responsible for transmitting data between <strong>two hosts that are directly linked</strong>, despite possible errors.</p>
<p>What do we mean by “directly linked”? For now, imagine that there is no device in between the two devices. So, if we have two computers here – computer A and computer B, and they are connected via computer M – then computer A and computer B are NOT directly linked. But computer A and computer M <strong>are</strong> directly linked, and so are computer M and computer B.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-56.png" alt="Image" width="600" height="400" loading="lazy">
<em>Two remote hosts connected via another device (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Q3qqd6Y2FbQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Another way to put it is that computer A and computer M are <strong>one hop</strong> away from one another, whereas computer A and computer B are <strong>two hops</strong> away. </p>
<p>That is, in order to get from computer A to computer B we need two “hops” – one hop from A to M, and another hop from M to B.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-57.png" alt="Image" width="600" height="400" loading="lazy">
<em>Every direct connection is called a Hop (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Q3qqd6Y2FbQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Going back to the second layer's responsibility – we mentioned it is responsible for transmitting data between two hosts that are directly linked, <strong>despite possible errors</strong>.</p>
<p>What do we mean by <strong>errors</strong>? The physical layer might provide erroneous data. For example, <code>1</code> instead of <code>0</code>. So a stream of bits such as <code>000110</code>, might be received as <code>001110</code>. </p>
<p>Many reasons might cause these kinds of errors. For instance, we can think of a truck literally running over the wire where the bits are transmitted, causing some problem. Regardless of the reason, the second layer must handle the communication despite these errors.</p>
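<p>One simple way to notice such an error is a parity bit. The sketch below is an illustration only, as this post does not prescribe a specific technique: the sender appends one extra bit so that the number of <code>1</code>s is even, and the receiver checks that this is still the case. The function names are made up for this example.</p>
<pre><code class="language-python">def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)


def looks_valid(frame):
    """Return True if the frame still has an even number of 1s."""
    return frame.count("1") % 2 == 0


sent = add_parity("000110")   # "0001100"
received = "0011100"          # the third bit was flipped in transit
print(looks_valid(sent))      # True
print(looks_valid(received))  # False
</code></pre>
<p>A single parity bit misses an even number of flipped bits, so real data link protocols tend to use stronger checks. Ethernet, for example, uses a CRC.</p>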
<p>The second layer sends data in <em>datagrams</em>, that is, in chunks. Datagrams in this layer are called <strong>Frames</strong>. Frames will usually contain <strong>MAC addresses</strong>, which are physical addresses, one identifying the sender, and another identifying the receiver.  </p>
<p>Why would we need a MAC address?</p>
<p>First, the receiving devices would like to know whether the frame is intended for them. The receiver wouldn’t like to waste precious time reading data intended for someone else. If the frame contains a MAC address that doesn’t belong to a receiver's device, that device can simply ignore this frame.</p>
<p>Second, for privacy reasons - we would like messages to arrive only at intended receivers, so only they can read the data.</p>
<p>Third, the sender would like the receiver to know who sent the frame. That way, the receiver will be able to send their response back to the sender, and not to someone else.</p>
<p>Note that we would like these addresses to be unique. That is, we want one address to identify a single device. That way, we know that if we send a message to a specific address it will be sent to the intended device only.</p>
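<p>The first and third reasons can be sketched in a few lines of Python. The <code>Frame</code> class, the <code>handle</code> function and the addresses used with them are hypothetical and heavily simplified. They are not the layout of any real frame format.</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass
class Frame:
    destination: str  # MAC address of the intended receiver
    source: str       # MAC address of the sender
    payload: bytes


def handle(frame, my_mac):
    """Return a reply frame if the frame is for us, otherwise None."""
    if frame.destination != my_mac:
        return None  # not addressed to this device, so ignore it
    # The source address tells us where to send the response.
    return Frame(destination=frame.source, source=my_mac, payload=b"got it")


frame = Frame("aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb", b"hello")
print(handle(frame, "cc:cc:cc:cc:cc:cc"))  # None
print(handle(frame, "aa:aa:aa:aa:aa:aa").destination)  # bb:bb:bb:bb:bb:bb
</code></pre>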
<h2 id="heading-the-third-layer-the-network-layer">The Third Layer – The Network Layer</h2>
<p>The third layer is responsible for <strong>routing</strong> – that is, determining the path where the data will “travel”.</p>
<p>You can think of this layer as the successful routing app, Google Maps. When you get in the car and use Google Maps, you tell the app your destination, and Google Maps finds the best route for you to drive. </p>
<p>Notice that Google Maps is dynamic – it won’t necessarily pick the same route each time. Sometimes, one path will have a traffic jam, so Google Maps will prefer another route.</p>
<p>We said that the second layer has physical addresses, called MAC addresses. The third layer is responsible for <strong>logical addresses</strong>, such as <strong>IP addresses</strong>.</p>
<p>In this layer, datagrams are called <strong>packets</strong>.</p>
<p>Consider the following network diagram:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-59.png" alt="Image" width="600" height="400" loading="lazy">
<em>A network diagram with Computer A in France, Computer B in the US, and 10 routers in between (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Q3qqd6Y2FbQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>We have two computers here – one in France, and one in the United States. Of course, they are not directly linked. Rather, they are linked via third layer devices called <strong>routers</strong>.</p>
<p>Which layer is responsible for each connection?</p>
<p>Consider the connection between Computer A and Router 1. The second layer is responsible for this connection. What about the connection between Router 2 and Router 5? Right, again, this is the second layer. The same applies for each connection between two directly linked devices.</p>
<p>The third layer is responsible for defining the route – that the message sent from Computer A to Computer B will go through Routers 1, 2, 5, 8 and 10, and not in another way.</p>
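<p>Here is a rough sketch of what "defining the route" could look like, using a breadth-first search that picks the path with the fewest hops. The topology in <code>links</code> is invented for this example and is not necessarily the one in the diagram. Real routers rely on routing protocols and richer metrics than hop count.</p>
<pre><code class="language-python">from collections import deque


def find_route(links, source, destination):
    """Breadth-first search: returns a path with the fewest hops, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical topology: each key is directly linked to the listed devices.
links = {
    "A": ["R1"], "B": ["R10"],
    "R1": ["A", "R2", "R3"], "R2": ["R1", "R4", "R5"],
    "R3": ["R1", "R6"], "R4": ["R2", "R7"],
    "R5": ["R2", "R8"], "R6": ["R3", "R7"],
    "R7": ["R4", "R6", "R9"], "R8": ["R5", "R10"],
    "R9": ["R7", "R10"], "R10": ["R8", "R9", "B"],
}
route = find_route(links, "A", "B")
print(route)  # ['A', 'R1', 'R2', 'R5', 'R8', 'R10', 'B']
</code></pre>
<p>Each step along the returned route is a single hop, which is exactly the part that the second layer takes care of.</p>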
<p>Note that there may be different implementations for each layer. For instance, we may have different implementations of the second layer. So while the connection between computer A and Router 1 might be over an Ethernet cable, the connection between Router 1 and 2 might be wireless and use WiFi. The connection between Router 2 and Router 5 might use a carrier pigeon, while the connection between router 5 and 9 will also use WiFi.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-61.png" alt="Image" width="600" height="400" loading="lazy">
<em>The second layer may be implemented differently on every link (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Q3qqd6Y2FbQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>The third layer does not care about these changes, but the second layer definitely does. If the carrier pigeon that transmits data from Router 2 to Router 5 is sick, the second layer will have to handle it. The data link layer will also have to make sure the data transmitted over the air between Routers 1 and 2 is valid and without errors. </p>
<h2 id="heading-interim-summary">Interim Summary</h2>
<p>So far we have covered three of the five layers.  To recap:</p>
<ul>
<li>The physical layer is responsible for transmitting a single bit, <code>1</code> or <code>0</code>, over the network. </li>
<li>The data link layer is responsible for transmitting data between directly linked devices, that is – devices connected via a single hop. </li>
<li>The third layer is responsible for transferring data between hosts that are connected via multiple hops. It determines the route, the path that the packets will travel.</li>
</ul>
<h2 id="heading-the-fourth-layer-the-transportation-layer">The Fourth Layer – The Transportation Layer</h2>
<p>The fourth layer is an end-to-end layer. That is, it is responsible for communication from the source, all the way to the ultimate destination.</p>
<p>It allows <strong>multiplexing</strong> of multiple services. For example, one server may serve as a Web server, as well as a Mail server. When a client turns to that server, the client should be able to specify which service it would like to access. While the third layer specifies the address of the server, the transport layer identifies which <strong>service</strong> is relevant for the current communication.</p>
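<p>In practice, services on the same host are told apart by <strong>port</strong> numbers. The following sketch shows the idea of handing incoming data to the right service. The handler functions and the dictionary-based segment are assumptions made for this example, while 80 and 25 are the conventional ports for HTTP and SMTP.</p>
<pre><code class="language-python">def web_service(data):
    return "web server got: " + data


def mail_service(data):
    return "mail server got: " + data


# Port numbers identify services on one host. 80 (HTTP) and 25 (SMTP)
# are the conventional values; the handlers are hypothetical.
services = {80: web_service, 25: mail_service}


def deliver(segment):
    """Hand the segment's data to the service named by its destination port."""
    handler = services.get(segment["destination_port"])
    if handler is None:
        return "no service listening on that port"
    return handler(segment["data"])


print(deliver({"destination_port": 80, "data": "GET /"}))
print(deliver({"destination_port": 25, "data": "HELO"}))
</code></pre>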
<p>In addition, the transport layer <em>may</em> ensure reliability. So when this layer receives data from the upper layer, it splits it into chunks, sends them, and makes sure that all those chunks arrive correctly at the other end. </p>
<p>Notice that the network layer is usually <em>not</em> reliable. Packets may arrive in incorrect order, they can arrive with incorrect data, or even not arrive at all. A reliable transportation layer makes sure that the data is correctly received.</p>
<p>In this layer, datagrams are called <strong>segments</strong>.</p>
<p>Consider the following network diagram once more:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-59.png" alt="Image" width="600" height="400" loading="lazy">
<em>The network diagram again (Source: <a target="_blank" href="https://www.youtube.com/watch?v=Q3qqd6Y2FbQ&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Which layer is responsible for what?</p>
<p>We have already said that the network layer is responsible for the route, that is, the path in which the packets travel. We also mentioned that the second layer is responsible for the transmission of the data between two, directly connected devices. For example, the link between Router 1 and Router 2.</p>
<p>The fourth layer views all of this network diagram as an abstract cloud. It doesn’t know the routers, and it doesn’t care about the structure of the network, or the routing. It assumes that the network can send a packet from one end to another:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-62.png" alt="Image" width="600" height="400" loading="lazy">
<em>The fourth layer sees the network as an abstract cloud (Source: <a target="_blank" href="https://www.youtube.com/watch?v=LYH4DwydVAM&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>The transportation layer makes sure that the endpoints can communicate over different services – for example, web and email. In addition, it might make sure that the connection is reliable. </p>
<p>One example would be to acknowledge every received segment. For instance, when computer A sends a segment to computer B, computer B will send a special Acknowledgement segment, announcing that it has received the segment. </p>
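<p>A tiny simulation can show how acknowledgements lead to reliability. In this sketch the sender keeps resending a segment until it is acknowledged. The <code>lost_attempts</code> parameter is an invented stand-in for an unreliable network, and real transport protocols are far more sophisticated than this stop-and-wait approach.</p>
<pre><code class="language-python">def send_reliably(segments, lost_attempts):
    """Stop-and-wait: resend each segment until it is acknowledged.

    lost_attempts is a set of (sequence_number, attempt) pairs that the
    simulated network drops. This stands in for real packet loss.
    """
    received = []
    transmissions = 0
    for seq, data in enumerate(segments):
        attempt = 0
        acknowledged = False
        while not acknowledged:
            attempt += 1
            transmissions += 1
            if (seq, attempt) in lost_attempts:
                continue  # no acknowledgement arrives, so the sender retries
            received.append(data)  # the receiver stores the segment...
            acknowledged = True    # ...and sends back an acknowledgement
    return received, transmissions


# The first attempt to send segment number 1 gets lost.
received, transmissions = send_reliably(["he", "ll", "o"], {(1, 1)})
print(received)       # ['he', 'll', 'o']
print(transmissions)  # 4
</code></pre>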
<h2 id="heading-the-fifth-layer-the-application-layer">The Fifth Layer – The Application Layer</h2>
<p>Last but definitely not least, we have the fifth layer, or <strong>Application Layer.</strong> This layer provides the service to the user’s application – web service, Voice over IP (VoIP), network games, streaming, and so on. </p>
<p>According to the layers model, the fifth layer doesn’t care at all about the network. It relies on the fourth layer, as well as the lower layers, to transmit the data from one endpoint to another. The fifth layer will use this service for the various needs of the application. </p>
<p>Different protocols will be used for different applications. For instance, the HTTP protocol is commonly used for serving web pages on the World Wide Web. SMTP is a protocol used for emails, FTP for exchanging files, and there are many, many more.</p>
<h1 id="heading-what-is-encapsulation">What is Encapsulation?</h1>
<p>The goal of networks is to transmit data from one host to another.</p>
<p>To achieve this goal, each layer adds its own <strong>header</strong> to the data. A header contains information specific for that layer, and it precedes the data itself. </p>
<p>Consider a case where we have a lookup service, used in order to find a person’s phone number, given the person's name. The data consists of the person’s first and last name. </p>
<p>Before the packet is sent, the fifth layer might add its own <strong>header</strong>, describing that this is a REQUEST packet. The header might also specify that this is a request to map from a person’s name to a phone number, and not vice versa.  </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-64.png" alt="Image" width="600" height="400" loading="lazy">
<em>Header of the 5th layer, with data (Source: <a target="_blank" href="https://www.youtube.com/watch?v=DBLtFjrTvD0&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Then, the fifth layer passes the data to the fourth layer. Note that the fourth layer regards everything as data – ones and zeroes. It doesn’t care if the fifth layer added a header, or what is written inside that header. </p>
<p>The fourth layer then adds its own header. For instance, it might specify that the requested service is the names-and-phones service. It may also include a sequence number for the segment, so it can be identified later.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-65.png" alt="Image" width="600" height="400" loading="lazy">
<em>Header of the 4th layer, with data which includes the 5th layer's header (Source: <a target="_blank" href="https://www.youtube.com/watch?v=DBLtFjrTvD0&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Afterwards, the fourth layer will pass the packet to the third layer. Again, the third layer will regard everything it has received – including the data itself, the header added by the fifth layer, and the header added by the fourth layer – simply as a chunk of data. </p>
<p>Then, the third layer will add its own header. For instance, it may include the source address and destination address of the packet.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-66.png" alt="Image" width="600" height="400" loading="lazy">
<em>Header of the 3rd layer, with data which includes the 4th layer's header and data (Source: <a target="_blank" href="https://www.youtube.com/watch?v=DBLtFjrTvD0&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>This process goes on. So, each layer adds its own header to the packet. This process is called <strong>encapsulation</strong>.</p>
<p>On the other end, the receiver gets the packet and needs to read and remove the headers.</p>
<p>Note that the second layer may also include a <em>trailer</em> – an additional chunk of bits following the data, with some information.</p>
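<p>The encapsulation process can be sketched in Python for the lookup service described above. The header contents, the <code>|</code> separator and the addresses are all made up for this example. Real protocols use carefully defined binary header formats.</p>
<pre><code class="language-python">def encapsulate(data):
    """Wrap application data in one (made-up) header per layer."""
    layer5 = b"REQUEST name-to-phone|" + data
    layer4 = b"service=names-and-phones seq=1|" + layer5
    layer3 = b"src=10.0.0.1 dst=10.0.0.2|" + layer4
    return layer3


def decapsulate(packet):
    """Strip the headers in the opposite order on the receiving side."""
    headers = []
    for _ in range(3):
        header, packet = packet.split(b"|", 1)
        headers.append(header.decode())
    return headers, packet


packet = encapsulate(b"Ada Lovelace")
headers, data = decapsulate(packet)
print(headers[0])  # src=10.0.0.1 dst=10.0.0.2
print(data)        # b'Ada Lovelace'
</code></pre>
<p>Notice that each layer treats everything it received from the layer above as opaque data and only prepends its own header.</p>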
<h1 id="heading-putting-it-all-together">Putting it All Together</h1>
<p>Now that we have covered the five layers, let’s have one example using all of them together. </p>
<p>Let’s say we would like to send a video file to our friend who lives in France, while we are enjoying a trip in Argentina. For that, we are using an email service. </p>
<p>The fifth layer defines how the email will be transmitted. For example, it includes the email address of the sender, as well as the receiver. It contains a title, and the body of the message. It requires that we follow a specific template of an email address, that will be included in the header of this layer. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-63.png" alt="Image" width="600" height="400" loading="lazy">
<em>The five layers model, with an example of sending an email (Source: <a target="_blank" href="https://www.youtube.com/watch?v=LYH4DwydVAM&amp;ab_channel=Brief">Brief</a>)</em></p>
<p>Then, the fifth layer uses the fourth layer in order to split the email into chunks. Of course, each chunk will also carry the fourth layer's header. It is also used in order to specify that we are currently using an email service. </p>
<p>In this case, we definitely want the connection to be reliable – so the receiver will be able to play our video file correctly. Thus, the fourth layer will also handle reliability. On the receiver’s end, it might send an acknowledgment segment for every segment it receives.</p>
<p>The third layer will define the best route for every packet to be sent. It might choose different routes for different packets. Among other things, its header will contain the source and destination addresses for the packet.</p>
<p>The second layer will be responsible for every link between two directly connected devices. Its header will include the MAC addresses for each device. </p>
<p>The first layer is responsible for encoding all the ones and zeroes and passing them over the line, and then for decoding and reading those ones and zeroes on the other end. On this layer, we don't really have a header, as it consists of single bits only.</p>
<p>This way, every layer uses the services provided by the lower layers, and the huge problem of transmitting data over the network becomes doable. How amazing is that?</p>
<h1 id="heading-summary">Summary</h1>
<p>In this post you learned what the five layers model is and why we need layers. You should now understand what each layer is responsible for, and you can fit every topic you encounter in Computer Networks into this model.</p>
<h2 id="heading-about-the-author">About the Author</h2>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a> is <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. He's the author of the Brief <a target="_blank" href="https://youtube.com/@BriefVid">YouTube Channel</a>. He's also a cyber training expert and founder of Checkpoint Security Academy. He's the author of <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf">Computer Networks (in Hebrew)</a>. You can find him on <a target="_blank" href="https://twitter.com/Omer_Ros">Twitter</a>.</p>
<h3 id="heading-additional-references">Additional References</h3>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg">Computer Networks Playlist - on my Brief channel</a>.</li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/osi-model-networking-layers-explained-in-plain-english/">The Seven Layer model explained in plain English</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/what-is-tcp-ip-layers-and-protocols-explained/">The TCP/IP model – layers and protocol explained</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ A Visual Guide to Git Internals — Objects, Branches, and How to Create a Repo From Scratch ]]>
                </title>
                <description>
                    <![CDATA[ Many of us use git on a daily basis. But how many of us know what goes on under the hood?  For example, what happens when we use git commit? What is stored between commits? Is it just a diff between the current and previous commit? If so, how ]]>
                </description>
                <link>https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/</link>
                <guid isPermaLink="false">66c17c2258ee0865d2671b59</guid>
                
                    <category>
                        <![CDATA[ Git ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                    <category>
                        <![CDATA[ version control ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Mon, 14 Dec 2020 22:30:27 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/07/A-Visual-Guide-to-Git-Internals-Book-Cover--1-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Many of us use <code>git</code> on a daily basis. But how many of us know what goes on under the hood? </p>
<p>For example, what happens when we use <code>git commit</code>? What is stored between commits? Is it just a diff between the current and previous commit? If so, how is the diff encoded? Or is an entire snapshot of the repo stored each time? What really happens when we use <code>git init</code>?</p>
<p>Many people who use <code>git</code> don’t know the answers to the questions above. But does it really matter? </p>
<p>First, as professionals, we should strive to understand the tools we use, especially ones we use all the time, like <code>git</code>. </p>
<p>But even more acutely, I've found that understanding how git actually works is useful in many scenarios — whether it’s resolving merge conflicts, looking to conduct an interesting rebase, or even just when something goes slightly wrong.</p>
<p>You’ll benefit from this post if you’re experienced enough with <code>git</code> to feel comfortable with commands such as <code>git pull</code>, <code>git push</code>, <code>git add</code> or <code>git commit</code>. </p>
<p>Still, we will start with an overview to make sure we are on the same page regarding the mechanisms of <code>git</code>, and specifically, the terms used throughout this post.</p>
<p>I also uploaded a YouTube series covering this post — you are welcome to watch it <a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BNUby5H58y6s2TQVLadV8v7">here</a>.</p>
<h1 id="heading-what-to-expect-from-this-tutorial">What to expect from this tutorial</h1>
<p>We will get a rare understanding of what goes on under the hood of what we do almost daily. </p>
<p>We will start by covering objects — <strong>blobs, trees,</strong> and <strong>commits.</strong> We will then briefly discuss <strong>branches</strong> and how they are implemented. We will dive into the <strong>working directory, staging area</strong> and <strong>repository</strong>. </p>
<p>And we will make sure we understand how these terms relate to the <code>git</code> commands we know and use to create a new repository.</p>
<p>Next, we will create a repository from scratch, without using <code>git init</code>, <code>git add</code>, or <code>git commit</code>. This will allow us to <strong>deepen our understanding of what is happening under the hood</strong> when we work with <code>git</code>. </p>
<p>We will also create new branches, switch branches, and create additional commits — all without using <code>git branch</code> or <code>git checkout</code>.</p>
<p>By the end of this post, <strong>you will feel like you <em>understand</em> <code>git</code></strong>. Are you up for it? 😎</p>
<h1 id="heading-git-objects-blob-tree-and-commit">Git Objects — blob, tree and commit</h1>
<p>It is very useful to think about <code>git</code> as maintaining a file system, and specifically — snapshots of that system in time.</p>
<p>A file system begins with a <em>root directory</em> (in UNIX-based systems, <code>/</code>), which usually contains other directories (for example, <code>/usr</code> or <code>/bin</code>). These directories contain other directories, and/or files (for example, <code>/usr/1.txt</code>).</p>
<p>In <code>git</code>, the contents of files are stored in objects called <strong>blobs</strong>, binary large objects.</p>
<p>The difference between <strong>blobs</strong> and files is that files also contain meta-data. For example, a file “remembers” when it was created, so if you move that file into another directory, its creation time remains the same. </p>
<p><strong>Blobs</strong>, on the other hand, are just contents — binary streams of data. A <strong>blob</strong> doesn’t register its creation date, its name, or anything but its contents.</p>
<p>Every <strong>blob</strong> in <code>git</code> is identified by its <a target="_blank" href="https://en.wikipedia.org/wiki/SHA-1">SHA-1 hash</a>. SHA-1 hashes consist of 20 bytes, usually represented by 40 characters in hexadecimal form. Throughout this post we will sometimes show just the first characters of that hash.</p>
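<p>If you are curious how such a hash comes about, here is a short Python sketch. It relies on the fact that <code>git</code> hashes a small header (the word <code>blob</code>, the content's size and a NUL byte) followed by the content itself. The function name <code>git_blob_hash</code> is just a name chosen for this example.</p>
<pre><code class="language-python">import hashlib


def git_blob_hash(content):
    """Return the SHA-1 that git assigns to a blob with these bytes."""
    # git hashes a short header ("blob", the size, a NUL byte) plus the content.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(len(git_blob_hash(b"HELLO WORLD")))  # 40
</code></pre>
<p>The first value is the well-known hash of an empty blob, which you can compare with the output of <code>git hash-object</code> on an empty file.</p>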
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-34.png" alt="Blobs have SHA-1 hashes associated with them" width="600" height="400" loading="lazy"></p>
<p>In <code>git</code>, the equivalent of a directory is a <strong>tree</strong>. A <strong>tree</strong> is basically a directory listing, referring to <strong>blobs</strong> as well as other <strong>trees</strong>. </p>
<p><strong>Trees</strong> are identified by their SHA-1 hashes as well. Referring to these objects, either <strong>blobs</strong> or other <strong>trees</strong>, happens via the SHA-1 hash of the objects.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-35.png" alt="A tree is a directory listing" width="600" height="400" loading="lazy"></p>
<p>Note that the <strong>tree</strong> <strong>CAFE7</strong> refers to the <strong>blob F92A0</strong> as <em>pic.png.</em> In another <strong>tree</strong>, that same <strong>blob</strong> may have another name.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-36.png" alt="A tree may contain sub-trees, as well as blobs" width="600" height="400" loading="lazy"></p>
<p>The diagram above is equivalent to a file system with a root directory that has one file at <code>/test.js</code>, and a directory named <code>/docs</code> with two files: <code>/docs/pic.png</code> and <code>/docs/1.txt</code>.</p>
<p>Now it’s time to take a snapshot of that file system — and store all the files that existed at that time, along with their contents. </p>
<p>In <code>git</code>, a snapshot is a <strong>commit</strong>. A <strong>commit</strong> object includes a pointer to the main <strong>tree</strong> (the root directory), as well as other meta-data such as the <strong>committer</strong>, a <strong>commit</strong> message and the <strong>commit</strong> time. </p>
<p>In most cases, a <strong>commit</strong> also has one or more parent <strong>commits</strong> — the previous snapshot(s). Of course, <strong>commit</strong> objects are also identified by their SHA-1 hashes. These are the hashes we are used to seeing when we use <code>git log</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-37.png" alt="A commit is a snapshot in time. It refers to the root tree. As this is the first commit, it has no parent(s)." width="600" height="400" loading="lazy"></p>
<p>Every <strong>commit</strong> holds the <em>entire snapshot</em>, not just diffs from the previous <strong>commit(s)</strong>.</p>
<p>How can that work? Doesn’t that mean that we have to store a lot of data every commit? </p>
<p>Let’s examine what happens if we change the contents of a file. Say that we edit <code>1.txt</code>, and add an exclamation mark — that is, we changed the content from <code>HELLO WORLD</code>, to <code>HELLO WORLD!</code>.</p>
<p>Well, this change would mean that we have a new <strong>blob,</strong> with a new SHA-1 hash. This makes sense, as <code>sha1("HELLO WORLD")</code> is different from <code>sha1("HELLO WORLD!")</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-38.png" alt="Changing the blob results in a new SHA-1" width="600" height="400" loading="lazy"></p>
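<p>To make the hashing step concrete, here is a minimal Python sketch. It relies on one detail the diagrams gloss over: <code>git</code> does not hash the raw file contents alone, but prefixes them with a short header of the form <code>blob &lt;size&gt;\0</code>. The function name <code>git_blob_hash</code> is our own, chosen for illustration.</p>

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash content the way git hashes a blob: SHA-1 over a header plus the data."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


old = git_blob_hash(b"HELLO WORLD")
new = git_blob_hash(b"HELLO WORLD!")
print(old[:5], new[:5])  # the first characters of each hash, as in the diagrams
print(old != new)        # a one-character edit yields a different blob hash
```

<p>The short hashes in the diagrams (such as <strong>73D8A</strong>) are illustrative, so the values printed here will not match them.</p>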
<p>Since we have a new hash, then the <strong>tree</strong>’s listing should also change. After all, our <strong>tree</strong> no longer points to <strong>blob 73D8A</strong>, but rather <strong>blob 62E7A</strong> instead. As we change the <strong>tree</strong>’s contents, we also change its hash.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-39.png" alt="Image" width="600" height="400" loading="lazy">
<em>The tree that points to the changed blob needs to change as well</em></p>
<p>And now, since the hash of that <strong>tree</strong> is different, we also need to change the parent <strong>tree</strong> — as the latter no longer points to <strong>tree CAFE7</strong>, but rather <strong>tree 24601</strong>. Consequently, the <strong>parent</strong> <strong>tree</strong> will also have a new hash.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-40.png" alt="The root tree also changes, and so does its hash." width="600" height="400" loading="lazy"></p>
<p>We are almost ready to create a new <strong>commit</strong> object, and it seems like we are going to store a lot of data: the entire file system, once more! But is that really necessary? </p>
<p>Actually, some objects, specifically <strong>blob</strong> objects, haven’t changed since the previous commit — <strong>blob F92A0</strong> remained intact, and so did <strong>blob F00D1.</strong></p>
<p>So this is the trick — as long as an object doesn’t change, we don’t store it again. In this case, we don’t need to store <strong>blob F92A0</strong> and <strong>blob F00D1</strong> once more. We only refer to them by their hash values. We can then create our <strong>commit</strong> object.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-41.png" alt="Image" width="600" height="400" loading="lazy">
<em>Blobs that remained intact are referenced by their hash values</em></p>
<p>Since this <strong>commit</strong> is not the first <strong>commit</strong>, it has a parent — <strong>commit A1337</strong>.</p>
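<p>The "store it only once" trick can be sketched as a toy content-addressed store. This is a simplified model rather than how <code>git</code> is actually implemented: the class <code>ObjectStore</code> and the file contents are made up for illustration.</p>

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: objects are keyed by the hash of their bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, kind: str, payload: bytes) -> str:
        data = kind.encode("ascii") + b" " + str(len(payload)).encode("ascii") + b"\0" + payload
        key = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(key, data)  # unchanged content maps to an existing key
        return key


store = ObjectStore()

# First snapshot: three files (placeholder contents).
first = [store.put("blob", body) for body in (b"console.log(1)", b"PNG...", b"HELLO WORLD")]
count_after_first = len(store.objects)

# Second snapshot: only the last file changed.
second = [store.put("blob", body) for body in (b"console.log(1)", b"PNG...", b"HELLO WORLD!")]

print(count_after_first, len(store.objects))  # 3 4
print(first[:2] == second[:2])                # True
```

<p>The second snapshot adds just one new object, because the two untouched files hash to keys that are already in the store.</p>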
<h4 id="heading-so-to-recap-we-introduced-three-git-objects">So to recap, we introduced three git objects:</h4>
<ul>
<li><strong>blob —</strong> contents of a file.</li>
<li><strong>tree</strong> — a directory listing (of <strong>blobs</strong> and <strong>trees</strong>).</li>
<li><strong>commit</strong> — a snapshot of the working tree.</li>
</ul>
<p>Let us consider the hashes of these objects for a bit. Let’s say I wrote the string <code>git is awesome!</code> and created a <strong>blob</strong> from it. You did the same on your system. Would we have the same hash?</p>
<p>The answer is — Yes. Since the <strong>blobs</strong> consist of the same data, they’ll have the same SHA-1 values.</p>
<p>What if I made a <strong>tree</strong> that references the <strong>blob</strong> of <code>git is awesome!</code>, and gave it a specific name and metadata, and you did exactly the same on your system. Would we have the same hash?</p>
<p>Again, yes. Since the <strong>tree</strong> objects are the same, they would have the same hash.</p>
<p>What if I created a <strong>commit</strong> of that <strong>tree</strong> with the commit message <code>Hello</code>, and you did the same on your system. Would we have the same hash?</p>
<p>In this case, the answer is — No. Even though our <strong>commit</strong> objects refer to the same <strong>tree</strong>, they have different <strong>commit</strong> details — time, committer etc.</p>
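<p>The three answers above can be checked with a small Python sketch. The commit payload below is deliberately simplified (a real <strong>commit</strong> object also carries an author line and e-mail addresses), and the names, timestamps and the all-zero tree hash are placeholders we made up.</p>

```python
import hashlib


def object_hash(kind: str, payload: bytes) -> str:
    header = kind.encode("ascii") + b" " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def simplified_commit(tree: str, who: str, when: int, message: str) -> bytes:
    # Simplified on purpose: a real commit also has an author line and e-mail addresses.
    lines = ["tree " + tree, "committer {} {} +0000".format(who, when), "", message, ""]
    return "\n".join(lines).encode("utf-8")


blob = b"git is awesome!"
mine_blob, yours_blob = object_hash("blob", blob), object_hash("blob", blob)

tree = "0" * 40  # placeholder tree hash, identical on both machines
mine_commit = object_hash("commit", simplified_commit(tree, "me", 1600000000, "Hello"))
yours_commit = object_hash("commit", simplified_commit(tree, "you", 1600000042, "Hello"))

print(mine_blob == yours_blob)      # True
print(mine_commit == yours_commit)  # False
```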
<h1 id="heading-branches-in-git">Branches in Git</h1>
<p><strong>A branch is just a named reference to a commit</strong>.</p>
<p>We could always reference a <strong>commit</strong> by its SHA-1 hash, but humans usually prefer other forms to name objects. A <strong>branch</strong> is one way to reference a <strong>commit</strong>, but it’s really just that. </p>
<p>In most repositories, the main line of development is done in a branch called <code>master</code>. This is just a name, and it’s the one created when we use <code>git init</code> with the default settings, which is why it is so widely used. However, it’s by no means special, and we could use any other name we’d like. </p>
<p>Typically, the branch points to the latest <strong>commit</strong> in the line of development we are currently working on.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-42.png" alt="A branch is just a named reference to a commit" width="600" height="400" loading="lazy"></p>
<p>To create another branch, we usually use the <code>git branch</code> command. By doing that, we actually create another pointer. So if we create a branch called <code>test</code>, by using <code>git branch test</code>, we are actually creating another pointer that points to the same <strong>commit</strong> as the branch we are currently on.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-43.png" alt="Image" width="600" height="400" loading="lazy">
<em>Using <code>git branch</code> creates another pointer</em></p>
<p>How does <code>git</code> know what branch we’re currently on? It keeps a special pointer called <code>HEAD</code>. Usually, <code>HEAD</code> points to a branch, which in turn points to a <strong>commit</strong>. In some cases, <code>HEAD</code> can also point to a <strong>commit</strong> directly, but we won’t focus on that.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-44.png" alt="Image" width="600" height="400" loading="lazy">
<em>HEAD points to the branch we are currently on.</em></p>
<p>To switch the active branch to be <code>test</code>, we can use the command <code>git checkout test</code>. Now we can already guess what this command actually does — it just changes <code>HEAD</code> to point to <code>test</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-45.png" alt="Image" width="600" height="400" loading="lazy">
<em><code>git checkout test</code> changes where <code>HEAD</code> points</em></p>
<p>Instead of creating the <code>test</code> branch first, we could also use <code>git checkout -b test</code>, which is the equivalent of running <code>git branch test</code> to create the branch, and then <code>git checkout test</code> to move <code>HEAD</code> to point to the new branch.</p>
<p>What happens if we make some changes and create a new <strong>commit</strong> using <code>git commit</code>? Which branch will the new <strong>commit</strong> be added to? </p>
<p>The answer is the <code>test</code> branch, as this is the active branch (since <code>HEAD</code> points to it). Afterwards, the <code>test</code> pointer will move to the newly added <strong>commit</strong>. Note that <code>HEAD</code> still points to <code>test</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-46.png" alt="Image" width="600" height="400" loading="lazy">
<em>Every time we use <code>git commit</code>, the branch pointer moves to the newly created commit.</em></p>
<p>So if we go back to master by <code>git checkout master</code>, we move <code>HEAD</code> to point to <code>master</code> again.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-47.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Now, if we create another <strong>commit</strong>, it will be added to the <code>master</code> branch (and its parent would be <strong>commit B2424</strong>).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-48.png" alt="Image" width="600" height="400" loading="lazy"></p>
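<p>The pointer movements in this section can be replayed with a toy model. It is only a sketch: the class <code>Repo</code> and the commit ids <code>c1</code>, <code>c2</code> and <code>c3</code> are hypothetical, and updates to the working directory are ignored.</p>

```python
class Repo:
    """Toy model: branches map names to commit ids, HEAD names the active branch."""

    def __init__(self, first_commit: str):
        self.parents = {first_commit: None}
        self.branches = {"master": first_commit}
        self.head = "master"

    def branch(self, name: str) -> None:
        # Like `git branch name`: a new pointer to the current commit.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name: str) -> None:
        # Like `git checkout name`: only HEAD moves (working-tree updates are ignored here).
        if name not in self.branches:
            raise KeyError(name)
        self.head = name

    def commit(self, commit_id: str) -> None:
        # The new commit's parent is whatever the active branch pointed to.
        self.parents[commit_id] = self.branches[self.head]
        self.branches[self.head] = commit_id


repo = Repo("c1")
repo.branch("test")
repo.checkout("test")
repo.commit("c2")       # test moves, master stays
repo.checkout("master")
repo.commit("c3")       # master moves, parent is c1
print(repo.branches)    # {'master': 'c3', 'test': 'c2'}
print(repo.parents["c2"], repo.parents["c3"])  # c1 c1
```

<p>Note that a commit only ever moves the branch that <code>HEAD</code> points to, while the other branch stays where it was.</p>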
<h1 id="heading-how-to-record-changes-in-git">How to Record Changes in Git</h1>
<p>Usually, when we work on our source code we work from a <strong>working dir</strong>. A <strong>working dir(ectory)</strong> (or <strong>working tree</strong>) is any directory on our file system which has a <strong>repository</strong> associated with it. It contains the folders and files of our project, and also a directory called <code>.git</code> that we will talk more about later.</p>
<p>After we make some changes, we want to record them in our <strong>repository</strong>. A <strong>repository</strong> (in short: <strong>repo</strong>) is a collection of <strong>commits</strong>, each of which is an archive of what the project’s <strong>working tree</strong> looked like at a past date, whether on our machine or someone else’s. </p>
<p>A <strong>repository</strong> also includes things other than our code files, such as <code>HEAD</code>, branches, and so on.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-49.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Unlike other, similar tools you may have used, <code>git</code> does not commit changes from the <strong>working tree</strong> directly into the <strong>repository</strong>. Instead, changes are first registered in something called the <strong>index</strong>, or the <strong>staging area</strong>. </p>
<p>Both of these terms refer to the same thing, and they are used often in <code>git</code>’s documentation. We will use these terms interchangeably throughout this post.</p>
<p>When we <code>checkout</code> a branch, <code>git</code> populates the <strong>index</strong> with the contents of all the files as they were checked out into our <strong>working directory</strong>. When we use <code>git commit</code>, the <strong>commit</strong> is created based on the state of the <strong>index</strong>.</p>
<p>The use of the <strong>index</strong> allows us to carefully prepare each <strong>commit</strong>. For example, we may have two files with changes since our last <strong>commit</strong> in our <strong>working dir</strong>. We may only add one of them to the <strong>index</strong> (using <code>git add</code>), and then use <code>git commit</code> to record this change only.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-50.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Files in our <strong>working directory</strong> can be in one of two states: <strong>tracked</strong> or <strong>untracked</strong>.</p>
<p><strong>Tracked files</strong> are files that <code>git</code> knows about. They either were in the last snapshot (<strong>commit</strong>), or they are <strong>staged</strong> now (that is, they are in the <strong>staging area</strong>).</p>
<p><strong>Untracked files</strong> are everything else — any files in our <strong>working directory</strong> that were not in our last snapshot (<strong>commit</strong>) and are not in our <strong>staging area</strong>.</p>
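<p>The two definitions above boil down to a simple set membership test. Here is a minimal sketch, assuming we already have the list of paths in the last <strong>commit</strong> and in the <strong>index</strong>. The function <code>classify</code> and the file names are made up for illustration.</p>

```python
def classify(working_dir, index, last_commit):
    """Split working-directory paths into tracked and untracked lists."""
    known = set(index) | set(last_commit)
    tracked = sorted(p for p in working_dir if p in known)
    untracked = sorted(p for p in working_dir if p not in known)
    return tracked, untracked


working_dir = {"test.js", "docs/1.txt", "notes.txt", "todo.md"}
last_commit = {"test.js", "docs/1.txt"}
index = {"test.js", "docs/1.txt", "notes.txt"}  # notes.txt was staged with git add

tracked, untracked = classify(working_dir, index, last_commit)
print(tracked)    # ['docs/1.txt', 'notes.txt', 'test.js']
print(untracked)  # ['todo.md']
```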
<h1 id="heading-how-to-create-a-repo-the-conventional-way">How to Create a Repo — The Conventional Way</h1>
<p>Let’s make sure that we understand how the terms we’ve introduced relate to the process of creating a <strong>repository</strong>. This is just a quick high-level view, before we dive much deeper into this process.</p>
<p>Note: most posts with shell commands show UNIX commands. I will provide commands for both Windows and UNIX, with screenshots from Windows, for the sake of variety. When the commands are exactly the same, I will provide them only once.</p>
<p>We will initialize a new <strong>repository</strong> using <code>git init repo_1</code>, and then change our directory to that of the repository using <code>cd repo_1</code>. By using <code>tree /f .git</code> we can see that running <code>git init</code> resulted in quite a few sub-directories inside <code>.git</code>. (The flag <code>/f</code> includes files in <code>tree</code>’s output).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-51.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Let's create a file inside the <code>repo_1</code> directory:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-52.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>On a Linux system:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-53.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>This file is within our <strong>working directory</strong>. Yet, since we haven’t added it to the <strong>staging area</strong>, it is currently <strong>untracked</strong>. Let's verify using <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-54.png" alt="Image" width="600" height="400" loading="lazy">
<em>The new file is untracked as we haven’t added it to the staging area, and it wasn’t included in a previous commit</em></p>
<p>We can now add this file to the <strong>staging area</strong> by using <code>git add new_file.txt</code>. We can verify that it has been staged by running <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-55.png" alt="Image" width="600" height="400" loading="lazy">
<em>Adding the new file to the staging area</em></p>
<p>We can now create a <strong>commit</strong> using <code>git commit</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-56.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Has something changed within the <code>.git</code> directory? Let’s run <code>tree /f .git</code> to check:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-57.png" alt="Image" width="600" height="400" loading="lazy">
<em>A lot of things have changed within <code>.git</code></em></p>
<p>Apparently, quite a lot has changed. It's time to dive deeper into the structure of <code>.git</code> and understand what is going on under the hood when we run <code>git init</code>, <code>git add</code> or <code>git commit</code>.</p>
<h1 id="heading-time-to-get-hard-core">Time to get hard core</h1>
<p>So far we've covered some Git fundamentals, and now we’re ready to really <em>Git going.</em></p>
<p>In order to deeply understand how <code>git</code> works, we will create a <strong>repository</strong>, but this time — we'll build it from scratch.</p>
<p>We won’t use <code>git init</code>, <code>git add</code> or <code>git commit</code>, which will enable us to get a better hands-on understanding of the process.</p>
<h1 id="heading-how-to-set-up-git">How to Set Up <code>.git</code></h1>
<p>Let’s create a new directory, and run <code>git status</code> within it:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-106.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Alright, so <code>git</code> seems unhappy as we don’t have a <code>.git</code> folder. The natural thing to do would be to simply create that directory:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-107.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Apparently, creating a <code>.git</code> directory is just not enough. We need to add some content to that directory.</p>
<p><strong>A git repository has two main components</strong>:</p>
<ol>
<li>A collection of objects — <strong>blobs</strong>, <strong>trees,</strong> and <strong>commits</strong>.</li>
<li>A system of naming those objects — called <strong>references</strong>.</li>
</ol>
<p>A <strong>repository</strong> may also contain other things, such as git hooks, but at the very least — it must include objects and references.</p>
<p>Let’s create a directory for the objects at <code>.git\objects</code> and a directory for the references (in short: <strong>refs</strong>) at <code>.git\refs</code> (on UNIX-based systems: <code>.git/objects</code> and <code>.git/refs</code>, respectively).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-108.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>One type of reference is <strong>branches</strong>. Internally, <code>git</code> calls <strong>branches</strong> by the name <strong>heads</strong>. So we will create a directory for them — <code>.git\refs\heads</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-109.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>This still doesn’t change our <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-110.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>How does <code>git</code> know where to start when looking for a <strong>commit</strong> in the <strong>repository</strong>? As I explained earlier, it looks for <code>HEAD</code>, which points to the current active branch (or <strong>commit</strong>, in some cases). </p>
<p>So, we need to create the <code>HEAD</code>, which is just a file residing at <code>.git\HEAD</code>. We can apply the following:</p>
<p>On Windows: <code>&gt; echo ref: refs/heads/master &gt; .git\HEAD</code></p>
<p>On UNIX: <code>$ echo "ref: refs/heads/master" &gt; .git/HEAD</code></p>
<p>⭐ So we now know how <code>HEAD</code> is implemented — it’s simply a file, and its contents describe what it points to.</p>
<p>Following the command above, <code>git status</code> seems to change its mind:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-111.png" alt="Image" width="600" height="400" loading="lazy">
<em>HEAD is just a file</em></p>
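<p>The same setup can be sketched in Python, which also shows how little is needed to interpret <code>HEAD</code>. This is an illustration under our own assumptions: the helpers <code>make_skeleton</code> and <code>read_head</code> are hypothetical names, and the sketch works in a temporary directory rather than a real project.</p>

```python
import tempfile
from pathlib import Path


def make_skeleton(root: Path, branch: str = "master") -> Path:
    """Create the minimal layout described above: objects, refs/heads and HEAD."""
    git_dir = root / ".git"
    (git_dir / "objects").mkdir(parents=True)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/" + branch + "\n")
    return git_dir


def read_head(git_dir: Path):
    """Return ("branch", ref) for a symbolic HEAD, or ("commit", hash) otherwise."""
    content = (git_dir / "HEAD").read_text().strip()
    if content.startswith("ref: "):
        return "branch", content[len("ref: "):]
    return "commit", content


with tempfile.TemporaryDirectory() as tmp:
    git_dir = make_skeleton(Path(tmp))
    head = read_head(git_dir)
    print(head)  # ('branch', 'refs/heads/master')
```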
<p>Notice that <code>git</code> believes we are on a branch called <code>master</code>, even though we haven’t created this branch. As mentioned before, <code>master</code> is just a name. We could also make <code>git</code> believe we are on a branch called <code>banana</code> if we wanted to:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-112.png" alt="Image" width="600" height="400" loading="lazy">
<em>🍌</em></p>
<p>We will switch back to <code>master</code> for the rest of this post, just to adhere to the normal convention.</p>
<p>Now that we have our <code>.git</code> directory ready, we can work our way towards making a <strong>commit</strong> (again, without using <code>git add</code> or <code>git commit</code>).</p>
<h1 id="heading-plumbing-vs-porcelain-commands-in-git">Plumbing vs Porcelain Commands in Git</h1>
<p>At this point, it would be helpful to make a distinction between two types of <code>git</code> commands: <strong>plumbing</strong> and <strong>porcelain</strong>. The application of the terms oddly comes from toilets (yeah, these — 🚽), traditionally made of porcelain, and the infrastructure of plumbing (pipes and drains). </p>
<p>We can say that the porcelain layer provides a user-friendly interface to the plumbing. Most people only deal with the porcelain. Yet, when things go (terribly) wrong, and someone wants to understand why, they would have to roll up their sleeves and check the plumbing. (Note: these terms are not mine; they are used very widely in <code>git</code>.)</p>
<p><code>git</code> uses this terminology as an analogy to separate the low-level commands that users don’t usually need to use directly (“plumbing” commands) from the more user-friendly high level commands (“porcelain” commands).</p>
<p>So far, we have dealt with porcelain commands: <code>git init</code>, <code>git add</code> and <code>git commit</code>. Next, we transition to plumbing commands.</p>
<h1 id="heading-how-to-create-objects-in-git">How to Create Objects in Git</h1>
<p>Let's start with creating an object and writing it into the objects’ database of <code>git</code>, residing within <code>.git\objects</code>. We'll find the SHA-1 hash value of a <strong>blob</strong> by using our first plumbing command, <code>git hash-object</code>, in the following way:</p>
<p>On Windows:</p>
<p><code>&gt; echo git is awesome | git hash-object --stdin</code></p>
<p>On UNIX:</p>
<p><code>$ echo "git is awesome" | git hash-object --stdin</code></p>
<p>By using <code>--stdin</code> we are instructing <code>git hash-object</code> to take its input from the standard input. This will provide us with the relevant hash value. </p>
<p>In order to actually write that <strong>blob</strong> into <code>git</code>’s object database, we can simply add the <code>-w</code> switch for <code>git hash-object</code>. Then, we can check the contents of the <code>.git</code> folder, and see that they have changed.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-113.png" alt="Image" width="600" height="400" loading="lazy">
<em>Writing a blob to the objects’ database</em></p>
<p>We can now see that the hash of our <strong>blob</strong> is — <code>54f6...36</code>. We can also see that a directory has been created under <code>.git\objects</code>, a directory named <code>54</code>, and within it, a file by the name of <code>f6...36</code>. </p>
<p>So <code>git</code> actually takes the first two characters of the SHA-1 hash and uses them as the name of a directory. The remaining characters are used as the filename for the file that actually contains the <strong>blob</strong>.</p>
<p>Why is that so? Consider a fairly big repository, one that has 300,000 objects (<strong>blobs</strong>, <strong>trees</strong>, and <strong>commits</strong>) in its database. To look up a hash inside that list of 300,000 hashes can take a while. Thus, <code>git</code> simply divides that problem by 256. </p>
<p>To look up the hash above, <code>git</code> would first look for the directory named <code>54</code> inside the directory <code>.git\objects</code>, which may have up to 256 directories (<code>00</code> through <code>ff</code>). Then, it will search only within that directory, which holds a small fraction of all the objects.</p>
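<p>Here is a rough Python sketch of what writing such a loose object involves: hash the header plus the content, split the hash into a two-character directory and a 38-character file name, and store the bytes zlib-compressed. The helper <code>write_blob</code> is our own name, and the sketch writes to a temporary directory.</p>

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_blob(objects_dir: Path, content: bytes) -> str:
    """Store a blob as a loose object: objects/XX/YYYY..., zlib-compressed."""
    data = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    sha = hashlib.sha1(data).hexdigest()
    target = objects_dir / sha[:2] / sha[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(zlib.compress(data))
    return sha


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / ".git" / "objects"
    sha = write_blob(objects_dir, b"git is awesome\n")
    stored = objects_dir / sha[:2] / sha[2:]
    was_stored = stored.exists()
    print(sha[:2], len(sha[2:]), was_stored)  # directory name, file name length, True
```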
<p>Back to our process of generating a <strong>commit</strong>. We have now created an object. What is the type of that object? We can use another plumbing command, <code>git cat-file -t</code> (<code>-t</code> stands for “type”), to check that out:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-114.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Not surprisingly, this object is a <strong>blob</strong>. We can also use <code>git cat-file -p</code> (<code>-p</code> stands for “pretty-print”) to see its contents:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-115.png" alt="Image" width="600" height="400" loading="lazy"></p>
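<p>Conceptually, both <code>cat-file</code> switches read the same stored bytes: after decompression, an object starts with its type and size, followed by a NUL byte and then the contents. A minimal sketch of that parsing step, with <code>parse_object</code> as a hypothetical helper name:</p>

```python
import zlib


def parse_object(raw: bytes):
    """Split a decompressed git object into (type, size, body)."""
    header, _, body = raw.partition(b"\0")
    kind, _, size = header.partition(b" ")
    return kind.decode("ascii"), int(size), body


# Build the compressed bytes in memory, as they would appear in a loose object file.
content = b"git is awesome\n"
compressed = zlib.compress(b"blob " + str(len(content)).encode("ascii") + b"\0" + content)

kind, size, body = parse_object(zlib.decompress(compressed))
print(kind)           # blob  (what `git cat-file -t` reports)
print(size)           # 15
print(body.decode())  # git is awesome
```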
<p>This process of creating a <strong>blob</strong> usually happens when we add something to the <strong>staging area</strong> — that is, when we use <code>git add</code>. </p>
<p>Remember that <code>git</code> creates a <strong>blob</strong> of the <em>entire</em> file that is staged. Even if a single character is modified or added (as we added <code>!</code> in our example before), the file has a new <strong>blob</strong> with a new <strong>hash</strong>.</p>
<p>Will there be any change to <code>git status</code>?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-116.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Apparently, no. Adding a <strong>blob</strong> object to <code>git</code>’s internal database doesn’t change the status, as <code>git</code> doesn’t know of any tracked or untracked files at this stage. </p>
<p>We need to track this file — add it to the <strong>staging area</strong>. To do that, we can use the plumbing command <code>git update-index</code>, like so: <code>git update-index --add --cacheinfo 100644 &lt;blob-hash&gt; &lt;filename&gt;</code>.</p>
<p>Note: the value <code>100644</code> passed to <code>--cacheinfo</code> is the file mode <a target="_blank" href="https://github.com/git/git/blob/master/Documentation/technical/index-format.txt">as stored by git</a>, following the layout of <a target="_blank" href="http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html">POSIX types and modes</a>. Here it denotes a regular, non-executable file. The details are not within the scope of this post.</p>
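<p>For the curious, the mode value can be decoded with Python's standard <code>stat</code> module, which uses the same POSIX bit layout. This is just a quick illustration of what the octal number encodes:</p>

```python
import stat

mode = 0o100644  # the mode passed to --cacheinfo, written in octal

print(stat.S_ISREG(mode))       # True: the type bits mark a regular file
print(oct(stat.S_IMODE(mode)))  # 0o644: owner read/write, group and others read
print(stat.filemode(mode))      # -rw-r--r--
```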
<p>Running the <code>git update-index</code> command above will result in a change to <code>.git</code>'s contents:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-117.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Can you spot the change? A new file by the name of <code>index</code> was created. This is it — the famous <strong>index</strong> (or <strong>staging area</strong>), is basically a file that resides within <code>.git\index</code>.</p>
<p>So now that our <strong>blob</strong> has been added to the <strong>index</strong>, we expect <code>git status</code> to look different, like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-118.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>That’s interesting! Two things happened here.</p>
<p>First, we can see that <code>my_file.txt</code> appears in green, in the <code>Changes to be committed</code> area. That is so because the <strong>index</strong> now has <code>my_file.txt</code>, waiting to be committed.</p>
<p>Second, we can see that <code>my_file.txt</code> also appears in red, because <code>git</code> believes the <em>file</em> <code>my_file.txt</code> has been deleted, and the fact that the file has been deleted is not staged. </p>
<p>This happens as we added the <strong>blob</strong> with the contents <code>git is awesome</code> to the objects’ database, and told the <strong>index</strong> that the file <code>my_file.txt</code> has the contents of that <strong>blob</strong>, but we never actually created that file. </p>
<p>We can easily solve this by taking the contents of the <strong>blob</strong>, and writing them to our file system, to a file called <code>my_file.txt</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-119.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>As a result, it will no longer appear in red by <code>git status</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-120.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>So now it’s time to create a <strong>commit</strong> object from our <strong>staging area</strong>. As explained above, a <strong>commit</strong> object has a reference to a <strong>tree</strong>, so we need to create a <strong>tree</strong>. </p>
<p>We can do this with the command <code>git write-tree</code>, which records the contents of the <strong>index</strong> in a <strong>tree</strong> object. Of course, we can use <code>git cat-file -t</code> to see that it’s indeed a <strong>tree:</strong></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-121.png" alt="Image" width="600" height="400" loading="lazy">
<em>Creating a tree object of the index</em></p>
<p>And we can use <code>git cat-file -p</code> to see its contents:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-122.png" alt="Image" width="600" height="400" loading="lazy"></p>
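<p>A <strong>tree</strong> is hashed just like a <strong>blob</strong>, only with a different header and payload: each entry is the mode, a space, the name, a NUL byte, and the 20 raw bytes of the referenced object's hash. The sketch below assumes a flat tree of files and sorts entries by plain name, which ignores the special ordering <code>git</code> applies to directory names. The helpers <code>git_hash</code> and <code>tree_payload</code> are our own.</p>

```python
import hashlib


def git_hash(kind: bytes, payload: bytes) -> str:
    header = kind + b" " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def tree_payload(entries) -> bytes:
    """entries: (mode, name, hex hash) tuples; each becomes 'mode name NUL raw-hash'."""
    out = b""
    for mode, name, sha in sorted(entries, key=lambda e: e[1]):
        out += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(sha)
    return out


blob_sha = git_hash(b"blob", b"git is awesome\n")
payload = tree_payload([("100644", "my_file.txt", blob_sha)])
tree_sha = git_hash(b"tree", payload)
print(len(payload))  # 39 bytes: 6 + 1 + 11 + 1 + 20
print(tree_sha)
```

<p>Because the payload contains only names, modes and hashes, two people with identical directory contents end up with the same <strong>tree</strong> hash.</p>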
<p>Great, so we created a <strong>tree</strong>, and now we need to create a <strong>commit</strong> object that references this <strong>tree</strong>. To do that, we can use <code>git commit-tree &lt;tree-hash&gt; -m &lt;commit message&gt;</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-123.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>You should now feel comfortable with the commands used to check the created object’s type, and print its contents:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-124.png" alt="Image" width="600" height="400" loading="lazy">
<em>Creating a commit object</em></p>
<p>Note that this <strong>commit</strong> doesn’t have a <strong>parent</strong>, because it’s the first <strong>commit</strong>. When we add another <strong>commit</strong> we will have to declare its <strong>parent</strong> — we will do so later.</p>
<p>The last hash that we got, <code>80e...8f</code>, is a <strong>commit</strong>’s hash. We are actually very used to using these hashes, as we look at them all the time. Note that this <strong>commit</strong> refers to a <strong>tree</strong> object, with its own hash, which we rarely specify explicitly.</p>
<p>Will something change in <code>git status</code>?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-125.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Nope 🤔.</p>
<p>Why is that? Well, to know that our file has been committed, <code>git</code> needs to know about the latest <strong>commit</strong>. How does <code>git</code> do that? It goes to the <code>HEAD</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-126.png" alt="Image" width="600" height="400" loading="lazy">
<em>Looking at <code>HEAD</code> on Windows</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-127.png" alt="Image" width="600" height="400" loading="lazy">
<em>Looking at <code>HEAD</code> on UNIX</em></p>
<p><code>HEAD</code> points to <code>master</code>, but what is <code>master</code>? We haven’t really created it yet. </p>
<p>As we explained earlier in this post, a branch is simply a named reference to a <strong>commit</strong>. And in this case, we would like <code>master</code> to refer to the <strong>commit</strong> with the hash <code>80e8ed4fb0bfc3e7ba88ec417ecf2f6e6324998f</code>. </p>
<p>We can achieve this by simply creating a file at <code>.git\refs\heads\master</code>, with the contents of this hash, like so:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-128.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>⭐ In sum, a <strong>branch</strong> is just a file inside <code>.git\refs\heads</code>, containing a hash of the <strong>commit</strong> it refers to.</p>
<p>Now, finally, <code>git status</code> and <code>git log</code> seem to appreciate our efforts:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-129.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We have successfully created a <strong>commit</strong> without using porcelain commands! How cool is that? 🎉</p>
<h1 id="heading-how-to-work-with-branches-in-git-under-the-hood">How to Work with Branches in Git — Under the Hood</h1>
<p>Just as we’ve created a <strong>repository</strong> and a <strong>commit</strong> without using <code>git init</code>, <code>git add</code> or <code>git commit</code>, now we will create and switch between <strong>branches</strong> without using porcelain commands (<code>git branch</code> or <code>git checkout</code>). </p>
<p>It’s perfectly understandable if you are excited. I am too 🙂</p>
<p><strong>Let’s start:</strong></p>
<p>So far we only have one <strong>branch</strong>, named <code>master</code>. To create another one named <code>test</code> (the equivalent of <code>git branch test</code>), we simply need to create a file named <code>test</code> within <code>.git\refs\heads</code>. The contents of that file should be the hash of the same <strong>commit</strong> that <code>master</code> points to.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-130.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>If we use <code>git log</code>, we can see that this is indeed the case — both <code>master</code> and <code>test</code> point to this <strong>commit</strong>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-131.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Let’s also switch to our newly created branch (the equivalent of <code>git checkout test</code>). For that, we should change <code>HEAD</code> to point to our new branch:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-132.png" alt="Image" width="600" height="400" loading="lazy">
<em>Switching to branch <code>test</code> by changing <code>HEAD</code></em></p>
<p>As we can see, both <code>git status</code> and <code>git log</code> confirm that <code>HEAD</code> now points to <code>test</code>, which is, therefore, the active branch.</p>
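<p>The two file edits we just made by hand can be sketched in Python as well. This is a rough illustration under simplifying assumptions: <code>create_branch</code> and <code>switch_branch</code> are hypothetical helper names, the directory is a temporary stand-in for <code>.git</code>, and a real <code>git checkout</code> would also update the index and the working directory (here both branches point to the same commit, so there is nothing else to update).</p>

```python
import tempfile
from pathlib import Path

COMMIT = "80e8ed4fb0bfc3e7ba88ec417ecf2f6e6324998f"  # the commit hash used in this post


def create_branch(git_dir, name, commit_hash):
    """Rough equivalent of `git branch NAME`: write the hash into a new ref file."""
    (Path(git_dir) / "refs" / "heads" / name).write_text(commit_hash + "\n")


def switch_branch(git_dir, name):
    """Only the HEAD-rewriting part of `git checkout NAME`."""
    (Path(git_dir) / "HEAD").write_text("ref: refs/heads/" + name + "\n")


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)

    # Starting point: a single branch, master, which HEAD points to.
    create_branch(git_dir, "master", COMMIT)
    switch_branch(git_dir, "master")

    # Create test at the same commit, then make it the active branch.
    create_branch(git_dir, "test", COMMIT)
    switch_branch(git_dir, "test")

    head_contents = (git_dir / "HEAD").read_text().strip()
    test_contents = (git_dir / "refs" / "heads" / "test").read_text().strip()

print(head_contents)
print(test_contents)
```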
<p>We can now use the commands we have already used to create another file and add it to the index:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-133.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Using the commands above, we have created a file named <code>test.txt</code>, with the content of <code>Testing</code>, created a corresponding <strong>blob,</strong> and added it to the <strong>index</strong>. We also created a <strong>tree</strong> representing the <strong>index</strong>.</p>
<p>It’s now time to create a <strong>commit</strong> referencing this <strong>tree</strong>. This time, we should also specify the <em>parent</em> of this <strong>commit</strong> — which would be the previous <strong>commit</strong>. We specify the parent using the <code>-p</code> switch of <code>git commit-tree</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-136.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We have just created a <strong>commit</strong>, with a <strong>tree</strong> as well as a parent, as we can see:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-139.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Will <code>git log</code> show us the new <strong>commit</strong>?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-138.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>As we can see, <code>git log</code> doesn’t show anything new. Why is that? 🤔 Remember that <code>git log</code> traces the <strong>branches</strong> to find relevant commits to show. Right now it shows us <code>test</code> and the <strong>commit</strong> it points to, along with <code>master</code>, which points to the same <strong>commit</strong>. Nothing refers to our new <strong>commit</strong> yet.</p>
<p>That’s right — we need to change <code>test</code> to point to our new <strong>commit</strong>. We can do that by simply changing the contents of <code>.git\refs\heads\test</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/12/image-140.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>It worked! 🎉🥂</p>
<p><code>git log</code> goes to <code>HEAD</code>, which tells it to go to the branch <code>test</code>, which points to <strong>commit</strong> <code>465...5e</code>, which links back to its parent <strong>commit</strong> <code>80e...8f</code>.</p>
<p>Feel free to admire the beauty, we <em>git</em> you. 😊</p>
<h1 id="heading-summary">Summary</h1>
<p>This post introduced you to the internals of <code>git</code>. We started by covering the basic objects — <strong>blobs</strong>, <strong>trees,</strong> and <strong>commits</strong>. </p>
<p>We learned that a <strong>blob</strong> holds the contents of a file. A <strong>tree</strong> is a directory listing, containing <strong>blobs</strong> and/or sub-<strong>trees</strong>. A <strong>commit</strong> is a snapshot of our working directory, with some metadata such as the time or the commit message.</p>
<p>We then discussed <strong>branches</strong> and explained that they are nothing but a named reference to a <strong>commit</strong>.</p>
<p>We went on to describe the <strong>working directory</strong>, a directory that has a repository associated with it, the <strong>staging area (index)</strong> which holds the <strong>tree</strong> for the next <strong>commit</strong>, and the <strong>repository</strong>, which is a collection of <strong>commits</strong>. </p>
<p>We clarified how these terms relate to <code>git</code> commands we know by creating a new repository and committing a file using the well-known <code>git init</code>, <code>git add</code>, and <code>git commit</code>.</p>
<p>Then, we fearlessly deep-dived into <code>git</code>. We stopped using porcelain commands and switched to plumbing commands. </p>
<p>By using <code>echo</code> and low-level commands such as <code>git hash-object</code>, we were able to create a <strong>blob</strong>, add it to the <strong>index</strong>, create a <strong>tree</strong> of the <strong>index</strong>, and create a <strong>commit</strong> object pointing to that <strong>tree</strong>. </p>
<p>We were also able to create and switch between <strong>branches</strong>. Kudos to those of you who tried this on your own! 👏</p>
<p>Hopefully, after following this post you feel you’ve deepened your understanding of what is happening under the hood when working with <code>git</code>.</p>
<p><strong>Thanks for reading!</strong> If you enjoyed this article, you can read more on this topic on the <a target="_blank" href="http://swimm.io/">swimm.io</a> blog.</p>
<p><em><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/">Omer Rosenbaum</a>, <a target="_blank" href="https://swimm.io/">Swimm</a>’s Chief Technology Officer. Cyber training expert and Founder of Checkpoint Security Academy. Author of</em> <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf"><em>Computer Networks (in Hebrew)</em></a><em>.</em> </p>
<p><em>Visit My</em> <a target="_blank" href="https://www.youtube.com/watch?v=79jlgESHzKQ&amp;list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg"><em>YouTube Channel</em></a><em>.</em></p>
<hr>
<h1 id="heading-additional-references">Additional References</h1>
<p>A lot has been written and said about <code>git</code>. Specifically, I found these references to be useful:</p>
<ul>
<li><a target="_blank" href="https://www.youtube.com/playlist?list=PL9lx0DXCC4BNUby5H58y6s2TQVLadV8v7">Git Internals YouTube playlist — by Brief</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=MYP56QJpDr4">Tim Berglund’s lecture — “Git From the Bits Up”</a></li>
<li><a target="_blank" href="https://jwiegley.github.io/git-from-the-bottom-up/">Git from the Bottom Up — by John Wiegley</a></li>
<li><a target="_blank" href="http://www.gelato.unsw.edu.au/archives/git/0512/13748.html">as promised, docs: git for the confused</a></li>
<li><a target="_blank" href="https://git-scm.com/book/en/v2/Git-Internals-Git-Objects">Git Internals — Git Objects — from Pro Git book, by Scott Chacon and Ben Straub</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Mutable vs Immutable Objects in Python – A Visual and Hands-On Guide ]]>
                </title>
                <description>
                    <![CDATA[ Python is an awesome language. Because of its simplicity, many people choose it as their first programming language.  Experienced programmers use Python all the time as well, thanks to its wide community, abundance of packages, and clear syntax. But ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/mutable-vs-immutable-objects-python/</link>
                <guid isPermaLink="false">66c17c4058ee0865d2671b5f</guid>
                
                    <category>
                        <![CDATA[ pythonic programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ immutability ]]>
                    </category>
                
                    <category>
                        <![CDATA[ mutable ]]>
                    </category>
                
                    <category>
                        <![CDATA[ object ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Omer Rosenbaum ]]>
                </dc:creator>
                <pubDate>Wed, 11 Nov 2020 19:01:31 +0000</pubDate>
                <media:content url="https://cdn-media-2.freecodecamp.org/w1280/5f9c95a1740569d1a4ca0dd3.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Python is an awesome language. Because of its simplicity, many people choose it as their first programming language. </p>
<p>Experienced programmers use Python all the time as well, thanks to its wide community, abundance of packages, and clear syntax.</p>
<p>But there's one issue that seems to confuse beginners as well as some experienced developers: Python objects. Specifically, the difference between <strong>mutable</strong> and <strong>immutable</strong> objects.</p>
<p>In this post we will deepen our knowledge of Python objects, learn the difference between <strong>mutable</strong> and <strong>immutable</strong> objects, and see how we can use the <strong>interpreter</strong> to better understand how Python operates. </p>
<p>We will use important functions and keywords such as <code>id</code> and <code>is</code>, and we'll understand the difference between <code>x == y</code> and <code>x is y</code>.</p>
<p>Are you up for it? Let's get started.</p>
<h1 id="heading-in-python-everything-is-an-object">In Python, everything is an object</h1>
<p>Unlike other programming languages where the language <em>supports</em> objects, in Python really <strong>everything</strong> is an object – including integers, lists, and even functions.</p>
<p>We can use our interpreter to verify that:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>isinstance(<span class="hljs-number">1</span>, object)
<span class="hljs-literal">True</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>isinstance(<span class="hljs-literal">False</span>, object)
<span class="hljs-literal">True</span>

<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">my_func</span>():</span>
<span class="hljs-meta">... </span>   <span class="hljs-keyword">return</span> <span class="hljs-string">"hello"</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>isinstance(my_func, object)
<span class="hljs-literal">True</span>
</code></pre>
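<p>Python has a built-in function, <code>id</code>, which returns an object's identity: an integer that is unique to that object for as long as it exists. In CPython, the standard interpreter, this is the object's address in memory. For example:</p>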
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x = <span class="hljs-number">1</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>id(x)
<span class="hljs-number">1470416816</span>
</code></pre>
<p>Above, we bound the name <code>x</code> to an <strong>object</strong> with the value <code>1</code>. We then used <code>id(x)</code> and discovered that this object is found at the address <code>1470416816</code> in memory.</p>
<p>This allows us to check interesting things about Python. Let's say we create two variables in Python – one by the name of <code>x</code>, and one by the name of <code>y</code> – and assign them the same value. For example, here:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x = <span class="hljs-string">"I love Python!"</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>y = <span class="hljs-string">"I love Python!"</span>
</code></pre>
<p>We can use the equality operator (<code>==</code>) to verify that they indeed have the same value in Python's eyes:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x == y
<span class="hljs-literal">True</span>
</code></pre>
<p>But are these the same object in memory? In theory, there can be two very different scenarios here. </p>
<p>According to scenario <strong>(1)</strong>, we really have two different objects, one by the name of <code>x</code>, and another by the name of <code>y</code>, that just happen to have the same value. </p>
<p>Yet, it could also be the case that Python actually stores here only one object, which has two names that reference it – as shown in scenario <strong>(2)</strong>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/image-19.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We can use the <code>id</code> function introduced above to check this:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x = <span class="hljs-string">"I love Python!"</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>y = <span class="hljs-string">"I love Python!"</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>x == y
<span class="hljs-literal">True</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(x)
<span class="hljs-number">52889984</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(y)
<span class="hljs-number">52889384</span>
</code></pre>
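<p>So as we can see, Python's behavior matches scenario (1) described above. Even though <code>x == y</code> in this example (that is, <code>x</code> and <code>y</code> have the same <em>values</em>), they are different objects in memory. Keep in mind that this is an implementation detail: CPython may reuse a single object for equal immutable values, such as small integers or identical string literals within one script, so you might see scenario (2) in other settings. We know the objects differ here because <code>id(x) != id(y)</code>, as we can verify explicitly:</p>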
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>id(x) == id(y)
<span class="hljs-literal">False</span>
</code></pre>
<p>There is a shorter way to make the comparison above, and that is to use Python's <code>is</code> operator. Checking whether <code>x is y</code> is the same as checking <code>id(x) == id(y)</code>, that is, checking whether <code>x</code> and <code>y</code> are the same object in memory:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x == y
<span class="hljs-literal">True</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(x) == id(y)
<span class="hljs-literal">False</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>x <span class="hljs-keyword">is</span> y
<span class="hljs-literal">False</span>
</code></pre>
<p>This sheds light on the important difference between the equality operator <code>==</code> and the identity operator <code>is</code>. </p>
<p>As you can see in the example above, it is completely possible for two names in Python (<code>x</code> and <code>y</code>) to be bound to two different objects (and thus, <code>x is y</code> is <code>False</code>), where these two objects have the same value (so <code>x == y</code> is <code>True</code>).</p>
<p>How can we create another variable that points to the same object that <code>x</code> is pointing to? We can simply use the assignment operator <code>=</code>, like so:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x = <span class="hljs-string">"I love Python!"</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>z = x
</code></pre>
<p>To verify that they indeed point to the same object, we can use the <code>is</code> operator:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x <span class="hljs-keyword">is</span> z
<span class="hljs-literal">True</span>
</code></pre>
<p>Of course, this means they have the same address in memory, as we can verify explicitly by using <code>id</code>:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>id(x)
<span class="hljs-number">54221824</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(z)
<span class="hljs-number">54221824</span>
</code></pre>
<p>And, of course, they have the same value, so we expect <code>x == z</code> to return <code>True</code> as well:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x == z
<span class="hljs-literal">True</span>
</code></pre>
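<p>To recap the difference between <code>==</code> and <code>is</code>, here is a small sketch that uses lists instead of strings. Lists are a convenient choice for this because two separate list displays always produce two separate objects, so the result doesn't depend on interpreter optimizations. The variable names are just for illustration.</p>

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list that happens to hold equal items
alias = first        # a second name for the very same object

same_value = first == second           # equality compares values
same_object = first is second          # identity compares objects
alias_is_first = alias is first
ids_match = id(alias) == id(first)

print(same_value, same_object, alias_is_first, ids_match)
```

<p>Running this prints <code>True False True True</code>: the two lists are equal but distinct, while <code>alias</code> and <code>first</code> are one and the same object.</p>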
<h1 id="heading-mutable-and-immutable-objects-in-python">Mutable and immutable objects in Python</h1>
<p>We have said that everything in Python is an object, yet there is an important distinction between objects. Some objects are <strong>mutable</strong> while some are <strong>immutable</strong>. </p>
<p>As I mentioned before, this fact causes confusion for many people who are new to Python, so we are going to make sure it's clear.</p>
<h2 id="heading-immutable-objects-in-python">Immutable objects in Python</h2>
<p>For some types in Python, once we have created instances of those types, they never change. They are <strong>immutable</strong>. </p>
<p>For example, <code>int</code> objects are immutable in Python. What will happen if we try to change the value of an <code>int</code> object?</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x = <span class="hljs-number">24601</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>x
<span class="hljs-number">24601</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>x = <span class="hljs-number">24602</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>x
<span class="hljs-number">24602</span>
</code></pre>
<p>Well, it seems that we changed <code>x</code> successfully. This is exactly where many people get confused. What exactly happened under the hood here? Let's use <code>id</code> to further investigate:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x = <span class="hljs-number">24601</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>x
<span class="hljs-number">24601</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(x)
<span class="hljs-number">1470416816</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>x = <span class="hljs-number">24602</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>x
<span class="hljs-number">24602</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(x)
<span class="hljs-number">1470416832</span>
</code></pre>
<p>So we can see that by assigning <code>x = 24602</code>, we didn't change the value of the object that <code>x</code> had been bound to before. Rather, we created a new object, and bound the name <code>x</code> to it. </p>
<p>So after assigning <code>24601</code> to <code>x</code> by using <code>x = 24601</code>, we had the following state:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/image-46.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>And after using <code>x = 24602</code>, we created a new object, and bound the name <code>x</code> to this new object. The other object with the value of <code>24601</code> is no longer reachable by <code>x</code> (or any other name in this case):</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/image-47.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Whenever we assign a new value to a name (in the above example - <code>x</code>) that is bound to an <code>int</code> object, we actually change the binding of that name to another object. </p>
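<p>One way to see the rebinding in action is to keep a second name bound to the original object. In the short sketch below (the name <code>y</code> is added purely for illustration), <code>x += 1</code> cannot modify the <code>int</code> object in place, so it binds <code>x</code> to a new object and leaves <code>y</code> untouched:</p>

```python
x = 24601
y = x            # both names are bound to the same int object
shared_before = x is y

x += 1           # builds a new int object and rebinds x; y is untouched
shared_after = x is y

print(x, y, shared_before, shared_after)
```

<p>This prints <code>24602 24601 True False</code>.</p>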
<p>The same applies for <code>tuple</code>s, strings (<code>str</code> objects), and <code>bool</code>s as well. In other words, <code>int</code> (and other number types such as <code>float</code>), <code>tuple</code>, <code>bool</code>, and <code>str</code> objects are <strong>immutable</strong>.</p>
<p>Let's test this hypothesis. What happens if we create a <code>tuple</code> object, and then give it a different value? </p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_tuple = (<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>id(my_tuple)
<span class="hljs-number">54263304</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>my_tuple = (<span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>id(my_tuple)
<span class="hljs-number">56898184</span>
</code></pre>
<p>Just like an <code>int</code> object, we can see that our assignment actually changed the object that the name <code>my_tuple</code> is bound to.</p>
<p>What happens if we try to change one of the <code>tuple</code>'s elements?</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_tuple[<span class="hljs-number">0</span>] = <span class="hljs-string">'a new value'</span>
Traceback (most recent call last):
  File <span class="hljs-string">"&lt;stdin&gt;"</span>, line <span class="hljs-number">1</span>, <span class="hljs-keyword">in</span> &lt;module&gt;
TypeError: <span class="hljs-string">'tuple'</span> object does <span class="hljs-keyword">not</span> support item assignment
</code></pre>
<p>As we can see, Python doesn't allow us to modify <code>my_tuple</code>'s contents, as it is immutable.</p>
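<p>Strings behave the same way. The sketch below, with an example string chosen just for illustration, shows that item assignment on a <code>str</code> raises a <code>TypeError</code>, and that a method such as <code>upper</code> returns a new string rather than changing the original:</p>

```python
greeting = "hello"

try:
    greeting[0] = "J"          # item assignment is not supported on str
    assignment_failed = False
except TypeError:
    assignment_failed = True

shouted = greeting.upper()     # returns a new str; the original is unchanged

print(assignment_failed, greeting, shouted)
```

<p>This prints <code>True hello HELLO</code>.</p>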
<h2 id="heading-mutable-objects-in-python">Mutable objects in Python</h2>
<p>Objects of some types in Python can be modified after creation, and these types are called <strong>mutable</strong>. For example, we know that we can modify the contents of a <code>list</code> object:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_list = [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>my_list[<span class="hljs-number">0</span>] = <span class="hljs-string">'a new value'</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>my_list
[<span class="hljs-string">'a new value'</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]
</code></pre>
<p>Does that mean we actually created a new object when assigning a new value to the first element of <code>my_list</code>? Again, we can use <code>id</code> to check:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_list = [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>id(my_list)
<span class="hljs-number">55834760</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>my_list
[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]

<span class="hljs-meta">&gt;&gt;&gt; </span>my_list[<span class="hljs-number">0</span>] = <span class="hljs-string">'a new value'</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>id(my_list)
<span class="hljs-number">55834760</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>my_list
[<span class="hljs-string">'a new value'</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]
</code></pre>
<p>So our first assignment <code>my_list = [1, 2, 3]</code> created an object at the address <code>55834760</code>, with the values of <code>1</code>, <code>2</code>, and <code>3</code>:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/image-22.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We then modified the first element of this <code>list</code> object using <code>my_list[0] = 'a new value'</code>, that is, without creating a new <code>list</code> object:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/image-23.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Now, let us create two names – <code>x</code> and <code>y</code>, both bound to the same <code>list</code> object. We can verify that either by using <code>is</code>, or by explicitly checking their <code>id</code>s:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x = y = [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>x <span class="hljs-keyword">is</span> y
<span class="hljs-literal">True</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(x)
<span class="hljs-number">18349096</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(y)
<span class="hljs-number">18349096</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(x) == id(y)
<span class="hljs-literal">True</span>
</code></pre>
<p>What happens now if we use <code>x.append(3)</code>? That is, if we add a new element (<code>3</code>) to the object by the name of <code>x</code>?</p>
<p>Will <code>x</code> be changed? Will <code>y</code>?</p>
<p>Well, as we already know, they are basically two names of the same object:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/image-28.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Since the object itself is changed, we see the new value through either of its names:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x.append(<span class="hljs-number">3</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>x
[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]

<span class="hljs-meta">&gt;&gt;&gt; </span>y
[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]
</code></pre>
<p>Note that <code>x</code> and <code>y</code> have the same <code>id</code> as before – as they are still bound to the same <code>list</code> object:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>id(x)
<span class="hljs-number">18349096</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(y)
<span class="hljs-number">18349096</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/image-27.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>In addition to <code>list</code>s, other Python types that are mutable include <code>set</code>s and <code>dict</code>s.</p>
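<p>We can check this claim for <code>set</code> and <code>dict</code> the same way we did for <code>list</code>. In this minimal sketch (the values are arbitrary examples), each object is modified in place and its <code>id</code> stays the same:</p>

```python
my_set = {1, 2}
my_dict = {"name": "Omer"}

set_id_before = id(my_set)
dict_id_before = id(my_dict)

my_set.add(3)                    # mutates the existing set
my_dict["number_of_pets"] = 1    # mutates the existing dict

set_kept_identity = id(my_set) == set_id_before
dict_kept_identity = id(my_dict) == dict_id_before

print(my_set, set_kept_identity)
print(my_dict, dict_kept_identity)
```

<p>Both flags come out as <code>True</code>: the contents changed, but no new object was created.</p>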
<h1 id="heading-implications-for-dictionary-keys-in-python">Implications for dictionary keys in Python</h1>
<p>Dictionaries (<code>dict</code> objects) are commonly used in Python. As a quick reminder, we define them like so:</p>
<pre><code class="lang-python">my_dict = {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Omer"</span>, <span class="hljs-string">"number_of_pets"</span>: <span class="hljs-number">1</span>}
</code></pre>
<p>We can then access a specific element by its key name:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_dict[<span class="hljs-string">"name"</span>]
<span class="hljs-string">'Omer'</span>
</code></pre>
<p>Dictionaries are <strong>mutable</strong>, so we can change their content after creation. At any given moment, a key in the dictionary can point to one element only:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_dict[<span class="hljs-string">"name"</span>] = <span class="hljs-string">"John"</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>my_dict[<span class="hljs-string">"name"</span>]
<span class="hljs-string">'John'</span>
</code></pre>
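<p>It is interesting to note that a <strong>dictionary's keys must be hashable</strong>, which in practice rules out Python's mutable built-in types such as <code>list</code>, <code>set</code>, and <code>dict</code>:</p>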
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_dict = {[<span class="hljs-number">1</span>,<span class="hljs-number">2</span>]: <span class="hljs-string">"Hello"</span>}
Traceback (most recent call last):
  File <span class="hljs-string">"&lt;stdin&gt;"</span>, line <span class="hljs-number">1</span>, <span class="hljs-keyword">in</span> &lt;module&gt;
TypeError: unhashable type: <span class="hljs-string">'list'</span>
</code></pre>
<p>Why is that so? </p>
<p>Let's consider the following hypothetical scenario (note: the snippet below can't really be run in Python):</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x = [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>y = [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>my_dict = {x: <span class="hljs-string">'a'</span>, y: <span class="hljs-string">'b'</span>}
</code></pre>
<p>So far, things don't seem that bad. We'd assume that if we access <code>my_dict</code> with the key of <code>[1, 2]</code>, we will get the corresponding value of <code>'a'</code>, and if we access the key <code>[1, 2, 3]</code>, we will get the value <code>'b'</code>. </p>
<p>Now, what would happen if we attempted to use:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>x.append(<span class="hljs-number">3</span>)
</code></pre>
<p>In this case, <code>x</code> would have the value of <code>[1, 2, 3]</code>, and <code>y</code> would also have the value of <code>[1, 2, 3]</code>. What should we get when we ask for <code>my_dict[[1, 2, 3]]</code>? Will it be <code>'a'</code> or <code>'b'</code>? To avoid such cases, Python requires dictionary keys to be hashable, and its mutable built-in types such as <code>list</code> are not.</p>
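<p>Strictly speaking, the property Python checks is whether a key is <em>hashable</em>. The sketch below tries a few candidates; <code>can_be_key</code> is a hypothetical helper written for this example, not a built-in. Note the last case: a <code>tuple</code> is immutable, but if it contains a <code>list</code> it still can't be hashed, so it is rejected as a key too.</p>

```python
def can_be_key(candidate):
    """Return True if `candidate` can be used as a dictionary key (hypothetical helper)."""
    try:
        {candidate: "value"}   # building the dict hashes the key
        return True
    except TypeError:
        return False


results = {
    "tuple of ints": can_be_key((1, 2)),
    "string": can_be_key("name"),
    "list": can_be_key([1, 2]),
    "tuple holding a list": can_be_key(([1, 2], 3)),
}

for label, ok in results.items():
    print(label, ok)
```

<p>The first two candidates are accepted, and the last two are rejected.</p>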
<h1 id="heading-taking-things-a-bit-further">Taking things a bit further</h1>
<p>Let's try to apply our knowledge to a case that is a bit more interesting.</p>
<p>Below, we define a <code>list</code> (a <strong>mutable</strong> object) and a <code>tuple</code> (an <strong>immutable</strong> object). The <code>list</code> includes a <code>tuple</code>, and the <code>tuple</code> includes a <code>list</code>:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_list = [(<span class="hljs-number">1</span>, <span class="hljs-number">1</span>), <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>my_tuple = ([<span class="hljs-number">1</span>, <span class="hljs-number">1</span>], <span class="hljs-number">2</span>, <span class="hljs-number">3</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>type(my_list)
&lt;class 'list'&gt;

<span class="hljs-meta">&gt;&gt;&gt; </span>type(my_list[<span class="hljs-number">0</span>])
&lt;class 'tuple'&gt;

<span class="hljs-meta">&gt;&gt;&gt; </span>type(my_tuple)
&lt;class 'tuple'&gt;

<span class="hljs-meta">&gt;&gt;&gt; </span>type(my_tuple[<span class="hljs-number">0</span>])
&lt;class 'list'&gt;
</code></pre>
<p>So far so good. Now, try to think for yourself – what will happen when we try to execute each of the following statements?</p>
<p>(1) <code>&gt;&gt;&gt; my_list[0][0] = 'Changed!'</code></p>
<p>(2) <code>&gt;&gt;&gt; my_tuple[0][0] = 'Changed!'</code></p>
<p>In statement (1), what we are trying to do is change <code>my_list</code>'s first element, that is, a <code>tuple</code>. Since a <code>tuple</code> is <strong>immutable</strong>, this attempt is destined to fail:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_list[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>] = <span class="hljs-string">'Changed!'</span>
Traceback (most recent call last):
  File <span class="hljs-string">"&lt;stdin&gt;"</span>, line <span class="hljs-number">1</span>, <span class="hljs-keyword">in</span> &lt;module&gt;
TypeError: <span class="hljs-string">'tuple'</span> object does <span class="hljs-keyword">not</span> support item assignment
</code></pre>
<p>Note that we were <em>not</em> trying to change the list itself, but rather the contents of its first element.</p>
<p>Let's consider statement (2). In this case, we are accessing <code>my_tuple</code>'s first element, which happens to be a <code>list</code>, and modifying it. Let's investigate this case further and look at the addresses of these objects:</p>
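<p>As a minimal sketch of the distinction between changing the list and changing the tuple inside it, the snippet below catches the error from statement (1) and then replaces the list's first element with a new <code>tuple</code>. The names <code>message</code> and <code>old_first</code> are introduced here for illustration only:</p>

```python
my_list = [(1, 1), 2, 3]

# Item assignment on the inner tuple fails, and my_list is left untouched.
try:
    my_list[0][0] = 'Changed!'
except TypeError as error:
    message = str(error)

# Rebinding the list's first slot to a brand-new tuple is allowed,
# because that operation mutates the list, not the tuple.
old_first = my_list[0]
my_list[0] = ('Changed!',) + old_first[1:]
```

<p>After this runs, <code>old_first</code> is still <code>(1, 1)</code>: the original <code>tuple</code> was never modified, the list simply points at a different object now.</p>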
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_tuple = ([<span class="hljs-number">1</span>, <span class="hljs-number">1</span>], <span class="hljs-number">2</span>, <span class="hljs-number">3</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>id(my_tuple)
<span class="hljs-number">20551816</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>type(my_tuple[<span class="hljs-number">0</span>])
&lt;<span class="hljs-class"><span class="hljs-keyword">class</span> '<span class="hljs-title">list</span>'&gt;

&gt;&gt;&gt; <span class="hljs-title">id</span>(<span class="hljs-params">my_tuple[<span class="hljs-number">0</span>]</span>)
20446248</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/image-29.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>When we change <code>my_tuple[0][0]</code>, we do not really change <code>my_tuple</code> at all! Indeed, after the change, <code>my_tuple</code>'s first element will still be the object whose address in memory is <code>20446248</code>. We do, however, change the value of that object:</p>
<pre><code class="lang-python"><span class="hljs-meta">&gt;&gt;&gt; </span>my_tuple[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>] = <span class="hljs-string">'Changed!'</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>id(my_tuple)
<span class="hljs-number">20551816</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>id(my_tuple[<span class="hljs-number">0</span>])
<span class="hljs-number">20446248</span>

<span class="hljs-meta">&gt;&gt;&gt; </span>my_tuple
([<span class="hljs-string">'Changed!'</span>, <span class="hljs-number">1</span>], <span class="hljs-number">2</span>, <span class="hljs-number">3</span>)
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/image-48.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Since we only modified the value of <code>my_tuple[0]</code>, which is a mutable <code>list</code> object, this operation was indeed allowed by Python.</p>
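<p>One practical consequence, sketched below under the assumption that the inner <code>list</code> is also bound to a separate name (here the hypothetical name <code>inner</code>): any other reference to that <code>list</code> sees the change too, while reassigning a slot of the <code>tuple</code> itself is still rejected.</p>

```python
inner = [1, 1]
my_tuple = (inner, 2, 3)

tuple_id_before = id(my_tuple)
inner_id_before = id(my_tuple[0])

# Mutate the list through the tuple; no tuple slot is reassigned.
my_tuple[0][0] = 'Changed!'

same_tuple = id(my_tuple) == tuple_id_before
same_inner = id(my_tuple[0]) == inner_id_before

# The name `inner` refers to the same list object, so it sees the change.
inner_changed = inner == ['Changed!', 1]

# Rebinding a slot of the tuple itself is still rejected.
try:
    my_tuple[0] = ['another', 'list']
    rebinding_allowed = True
except TypeError:
    rebinding_allowed = False
```

<p>Here <code>same_tuple</code>, <code>same_inner</code>, and <code>inner_changed</code> all end up <code>True</code>, and <code>rebinding_allowed</code> ends up <code>False</code>.</p>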
<h1 id="heading-recap">Recap</h1>
<p>In this post we learned about Python objects. We said that in Python <strong>everything is an object</strong>, and got to use <code>id</code> and <code>is</code> to deepen our understanding of what's happening under the hood when using Python to create and modify objects.</p>
<p>We also learned the difference between <strong>mutable</strong> objects, which can be modified after creation, and <strong>immutable</strong> objects, which cannot.</p>
<p>We saw that when we ask Python to modify an immutable object that is bound to a certain name, we actually create a new object and bind that name to it.</p>
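<p>A short sketch of that rebinding behavior, using <code>is</code> and a second name to keep the original object alive for comparison. The variable names are illustrative only:</p>

```python
text = 'abc'
original_text = text      # second name bound to the same str object
text += 'd'               # builds a new str and rebinds `text`
text_rebound = text is not original_text

items = [1, 2]
original_items = items    # second name bound to the same list object
items += [3]              # extends the existing list in place
items_same_object = items is original_items
```

<p>With the immutable <code>str</code>, <code>original_text</code> is still <code>'abc'</code> and <code>text</code> now names a different object. With the mutable <code>list</code>, both names still refer to one object, which now holds <code>[1, 2, 3]</code>.</p>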
<p>We then learned why dictionary keys have to be <strong>immutable</strong> (more precisely, hashable) in Python.</p>
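<p>The <code>tuple</code> from the last example is a useful edge case here. Python checks whether a key can be hashed, and a <code>tuple</code> can only be hashed if every element it contains can be hashed as well. The sketch below uses a hypothetical helper, <code>is_hashable</code>, written for this illustration:</p>

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, i.e. obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


plain_tuple = (1, 2, 3)
tuple_with_list = ([1, 1], 2, 3)

# A tuple of integers works as a key: every element is hashable.
lookup = {plain_tuple: 'ok'}

can_use_plain = is_hashable(plain_tuple)
can_use_nested = is_hashable(tuple_with_list)
can_use_list = is_hashable([1, 1])
```

<p>Only <code>can_use_plain</code> is <code>True</code>. A <code>tuple</code> that contains a <code>list</code> cannot serve as a dictionary key, even though the <code>tuple</code> itself is immutable.</p>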
<p>Understanding how Python "sees" objects is a key to becoming a better Python programmer. I hope this post has helped you on your journey to mastering Python.</p>
<p><a target="_blank" href="https://www.linkedin.com/in/omer-rosenbaum-034a08b9/"><em>Omer Rosenbaum</em></a><em>,</em> <a target="_blank" href="https://swimm.io/"><em>Swimm</em></a><em>’s Chief Technology Officer. Cyber training expert and Founder of Checkpoint Security Academy. Author of</em> <a target="_blank" href="https://data.cyber.org.il/networks/networks.pdf"><em>Computer Networks (in Hebrew)</em></a><em>. Visit My</em> <a target="_blank" href="https://www.youtube.com/watch?v=79jlgESHzKQ&amp;list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg"><em>YouTube Channel</em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
