<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Kamal Kishore - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Kamal Kishore - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Fri, 22 May 2026 17:39:04 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/kamalct/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Solve 5 Common RAG Failures with Knowledge Graphs ]]>
                </title>
                <description>
                    <![CDATA[ You may have built a Retrieval-Augmented Generation (RAG) pipeline to connect a vector store to a powerful LLM. And RAG pipelines are incredibly effective at grounding models in factual, up-to-date knowledge. But if you've worked with them long enoug... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-solve-5-common-rag-failures-with-knowledge-graphs/</link>
                <guid isPermaLink="false">6915f73887b014aa0a104567</guid>
                
                    <category>
                        <![CDATA[ RAG  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ knowledge graph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Kamal Kishore ]]>
                </dc:creator>
                <pubDate>Thu, 13 Nov 2025 15:20:24 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762904270014/5ebeec2b-0823-4f59-bdd7-bf37cb68a978.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>You may have built a Retrieval-Augmented Generation (RAG) pipeline to connect a vector store to a powerful LLM. And RAG pipelines are incredibly effective at grounding models in factual, up-to-date knowledge. But if you've worked with them long enough, you've likely hit a wall.</p>
<p>The system is great at answering "What is X?" but falls apart when you ask, "How does X relate to Y, and what happened after Z?"</p>
<p>The problem is that standard RAG, by its very nature, breaks context. It chops documents into isolated chunks, finds them based on semantic similarity, and hopes the LLM can piece the puzzle back together. This approach is blind to the relational context—the web of timelines, causes, and connections—that gives facts their meaning.</p>
<p>When queries require synthesizing information across multiple documents or complex, multi-step reasoning, standard RAG fails.</p>
<p>In this article, I’ll give you a practical, code-first guide to solving this problem. We'll move beyond simple vector search by implementing a robust, graph-based pattern to build more reliable, knowledge-aware systems.</p>
<h2 id="heading-table-of-contents">Table of Contents:</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-brittle-baseline-our-standard-rag-setup">The Brittle Baseline: Our Standard RAG Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-more-robust-implementation-the-knowledgegraph">A More Robust Implementation: The KnowledgeGraph</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-a-knowledge-graph">What is a Knowledge Graph?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-is-this-more-effective">Why is this More Effective?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-5-rag-failures-and-their-graph-based-solutions">5 RAG Failures and Their Graph-Based Solutions</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-pattern-1-the-multi-hop-failure">Pattern 1: The Multi-Hop Failure</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pattern-2-the-causal-synthesis-failure">Pattern 2: The Causal Synthesis Failure</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pattern-3-the-entity-ambiguity-trap">Pattern 3: The Entity Ambiguity Trap</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pattern-4-the-contradictory-information-failure">Pattern 4: The Contradictory Information Failure</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pattern-5-the-implicit-relationship-hallucination">Pattern 5: The Implicit Relationship Hallucination</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ul>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>This is a practical, code-first guide intended for developers and engineers who have some experience with RAG. To follow along, you should have the following:</p>
<h4 id="heading-conceptual-knowledge">Conceptual Knowledge</h4>
<ul>
<li><p>A solid understanding of what Retrieval-Augmented Generation (RAG) is and its basic components (like vector stores and LLMs).</p>
</li>
<li><p>Familiarity with basic graph concepts (nodes, edges, and relationships) is also helpful.</p>
</li>
</ul>
<h4 id="heading-technical-setup">Technical Setup</h4>
<ul>
<li><p>A Python environment.</p>
</li>
<li><p>An active Google API Key to use the Gemini API.</p>
</li>
<li><p>The Python libraries <code>langchain</code>, <code>langchain_google_genai</code>, <code>faiss-cpu</code>, and <code>networkx</code> installed.</p>
</li>
</ul>
<h2 id="heading-the-brittle-baseline-our-standard-rag-setup">The Brittle Baseline: Our Standard RAG Setup</h2>
<p>First, let's establish our baseline. This is a standard, "naïve" RAG pipeline using LangChain and the Gemini API. It ingests a list of <code>Document</code> objects, embeds them, and uses a FAISS vector store to retrieve the top-k chunks to answer a question.</p>
<p>This <code>create_rag_chain</code> function will serve as our point of comparison.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Install necessary libraries</span>
<span class="hljs-comment"># !pip install -q -U langchain langchain_google_genai faiss-cpu networkx</span>

<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> networkx <span class="hljs-keyword">as</span> nx
<span class="hljs-keyword">from</span> collections <span class="hljs-keyword">import</span> defaultdict
<span class="hljs-keyword">from</span> langchain_google_genai <span class="hljs-keyword">import</span> GoogleGenerativeAI, GoogleGenerativeAIEmbeddings
<span class="hljs-keyword">from</span> langchain.vectorstores <span class="hljs-keyword">import</span> FAISS
<span class="hljs-keyword">from</span> langchain.schema.document <span class="hljs-keyword">import</span> Document
<span class="hljs-keyword">from</span> langchain.prompts <span class="hljs-keyword">import</span> PromptTemplate
<span class="hljs-keyword">from</span> langchain.schema.runnable <span class="hljs-keyword">import</span> RunnablePassthrough
<span class="hljs-keyword">from</span> langchain.schema.output_parser <span class="hljs-keyword">import</span> StrOutputParser

<span class="hljs-comment"># --- Configure API Key (example) ---</span>
<span class="hljs-comment"># from google.colab import userdata</span>
<span class="hljs-comment"># GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY') </span>
<span class="hljs-comment"># os.environ['GOOGLE_API_KEY'] = GOOGLE_API_KEY </span>

<span class="hljs-comment"># --- Initialize Models ---</span>
<span class="hljs-comment"># Make sure your API key is set in your environment</span>
llm = GoogleGenerativeAI(model=<span class="hljs-string">"gemini-1.5-pro-latest"</span>)
embeddings = GoogleGenerativeAIEmbeddings(model=<span class="hljs-string">"models/embedding-001"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_rag_chain</span>(<span class="hljs-params">docs</span>):</span>
    <span class="hljs-string">"""Creates a simple RAG chain using FAISS as the vector store."""</span> 

    <span class="hljs-comment"># Create vector store from documents</span>
    vectorstore = FAISS.from_documents(docs, embeddings)
    <span class="hljs-comment"># K=3 means it will retrieve the top 3 most relevant chunks</span>
    retriever = vectorstore.as_retriever(search_kwargs={<span class="hljs-string">"k"</span>: <span class="hljs-number">3</span>})

    template = <span class="hljs-string">"""
    Answer the following question based ONLY on the context provided.
    If the context doesn't contain the answer, say "I don't have enough information from the context."

    CONTEXT:
    {context}

    QUESTION:
    {question}
    """</span>

    prompt = PromptTemplate.from_template(template)

    <span class="hljs-comment"># Build the chain</span>
    rag_chain = (
        {<span class="hljs-string">"context"</span>: retriever, <span class="hljs-string">"question"</span>: RunnablePassthrough()} 
        | prompt
        | llm 
        | StrOutputParser() 
    )

    <span class="hljs-keyword">return</span> rag_chain
</code></pre>
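<p>To make the "top-k" idea concrete, here is a minimal, dependency-free sketch of similarity ranking. The vectors are made-up numbers rather than real embeddings, and <code>cosine_similarity</code> and <code>top_k</code> are hypothetical helper names. FAISS uses its own index and distance metric, so cosine similarity is only a simple stand-in for how the retriever ranks chunks.</p>

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=3):
    """Return the names of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 3-dimensional "embeddings" (illustrative numbers only)
toy_chunks = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.8, 0.2, 0.1],
    "chunk_c": [0.1, 0.9, 0.2],
    "chunk_d": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

print(top_k(toy_query, toy_chunks, k=2))  # ['chunk_a', 'chunk_b']
```

<p>Notice that the ranking only measures closeness to the query vector. Nothing in it knows whether two chunks are related to each other, which is the gap the rest of this article addresses.</p>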
<h2 id="heading-a-more-robust-implementation-the-knowledgegraph">A More Robust Implementation: The KnowledgeGraph</h2>
<h3 id="heading-what-is-a-knowledge-graph">What is a Knowledge Graph?</h3>
<p>At its core, a knowledge graph (KG) is a way of storing data as a network of nodes and edges.</p>
<ul>
<li><p><strong>Nodes</strong> represent entities: <code>people</code>, <code>companies</code>, <code>concepts</code>, or <code>events</code>.</p>
</li>
<li><p><strong>Edges</strong> represent the explicit, labeled relationships between them: <code>ceo_of</code>, <code>attended</code>, or <code>partners_with</code>.</p>
</li>
</ul>
<p>Instead of storing a document like "Jim Farley is the CEO of Ford," you store two nodes (<code>Jim Farley</code>, <code>Ford</code>) connected by a directed edge (<code>ceo_of</code>).</p>
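<p>As a rough illustration of that shape, the sketch below stores facts as plain <code>(subject, relation, object)</code> tuples with no library at all. The helper names <code>objects_of</code> and <code>subjects_of</code> are hypothetical and exist only for this example.</p>

```python
# Each fact is a (subject, relation, object) triple:
# two nodes joined by one labeled, directed edge.
triples = [
    ("Jim Farley", "ceo_of", "Ford"),
    ("F-150", "made_by", "Ford"),
]

def objects_of(subject, relation):
    """Return every node reached from `subject` via an edge labeled `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

def subjects_of(relation, obj):
    """Return every node that points at `obj` via an edge labeled `relation`."""
    return [s for s, r, o in triples if r == relation and o == obj]

print(objects_of("Jim Farley", "ceo_of"))  # ['Ford']
print(subjects_of("ceo_of", "Ford"))       # ['Jim Farley']
print(objects_of("Ford", "ceo_of"))        # [] because the edge is directed
```

<p>Because the edge is directed, asking for Ford's <code>ceo_of</code> targets returns nothing. The relationship has to be read in the direction it was stored.</p>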
<h3 id="heading-why-is-this-more-effective">Why is this More Effective?</h3>
<p>This structure is more effective because it preserves relationships and treats them as first-class citizens.</p>
<p>Standard RAG relies on "semantic similarity". It's good at finding text chunks that <em>sound like</em> your query. But it’s "blind to the relational context" – the very thing you need for complex questions.</p>
<p>The graph-based approach solves this. When a query requires multi-step reasoning, you don't just search for similar text. You traverse a structured, explicit path in the graph. This allows the system to:</p>
<ol>
<li><p><strong>Follow chains of logic:</strong> It can answer multi-hop questions by finding a literal path from one node to another (for example, <code>F-150</code> → <code>made_by</code> → <code>Ford</code> → <code>ceo</code> → <code>Jim Farley</code>).</p>
</li>
<li><p><strong>Disambiguate entities:</strong> It can use node attributes (like <code>type: "company"</code>) to distinguish between two entities with the same name.</p>
</li>
<li><p><strong>Resolve contradictions:</strong> It can store metadata (like dates) directly <em>on the edge</em> to programmatically determine the most current fact.</p>
</li>
</ol>
<p>You move from "guessing from a cloud of semantically similar text" to querying a "global memory" of how facts are explicitly connected.</p>
<p>Here is the practical implementation of our <code>KnowledgeGraph</code>. This class uses <code>networkx</code> to store the nodes and edges we just discussed, and includes specific methods to run the structured query patterns needed to solve our RAG failures.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">KnowledgeGraph</span>:</span>
    <span class="hljs-string">"""
    A wrapper around networkx.DiGraph to store and query
    explicit entities and their relationships.
    """</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        self.graph = nx.DiGraph() 

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">add_data</span>(<span class="hljs-params">self, nodes=None, edges=None</span>):</span>
        <span class="hljs-string">"""Populates the graph with nodes and edges."""</span>
        <span class="hljs-keyword">if</span> nodes:
            <span class="hljs-keyword">for</span> node, attrs <span class="hljs-keyword">in</span> nodes:
                self.graph.add_node(node, **attrs) 
        <span class="hljs-keyword">if</span> edges:
            <span class="hljs-keyword">for</span> u, v, attrs <span class="hljs-keyword">in</span> edges:
                self.graph.add_edge(u, v, **attrs) 

    <span class="hljs-comment"># --- Query Patterns ---</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">query_multi_hop_path</span>(<span class="hljs-params">self, source, target</span>):</span>
        <span class="hljs-string">"""
        Pattern 1: Solves multi-hop queries by finding a path.
        """</span>
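        <span class="hljs-keyword">try</span>:
            path = nx.shortest_path(self.graph, source=source, target=target)
            <span class="hljs-comment"># Format the answer using the label stored on the final edge of the path</span>
            label = self.graph.get_edge_data(path[<span class="hljs-number">-2</span>], path[<span class="hljs-number">-1</span>]).get(<span class="hljs-string">"label"</span>, <span class="hljs-string">"is connected to"</span>)
            <span class="hljs-keyword">return</span> <span class="hljs-string">f"<span class="hljs-subst">{path[<span class="hljs-number">-2</span>]}</span> <span class="hljs-subst">{label}</span> <span class="hljs-subst">{path[<span class="hljs-number">-1</span>]}</span>."</span>
        <span class="hljs-keyword">except</span> (nx.NetworkXNoPath, nx.NodeNotFound):
            <span class="hljs-comment"># No route between the nodes, or one of them is not in the graph</span>
            <span class="hljs-keyword">return</span> <span class="hljs-string">"Could not find a connection."</span>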

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">query_with_conflict_resolution</span>(<span class="hljs-params">self, entity, relation, time_attr=<span class="hljs-string">"year"</span></span>):</span>
        <span class="hljs-string">"""
        Pattern 4: Resolves contradictions using metadata (like timestamps)
        stored on the edges.
        """</span>
        candidates = []
        <span class="hljs-keyword">for</span> neighbor <span class="hljs-keyword">in</span> self.graph.neighbors(entity):
            edge_data = self.graph.get_edge_data(entity, neighbor) 
            <span class="hljs-keyword">if</span> edge_data.get(<span class="hljs-string">"label"</span>) == relation: 
                candidates.append((neighbor, edge_data.get(time_attr, <span class="hljs-number">0</span>))) 

        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> candidates: 
            <span class="hljs-keyword">return</span> <span class="hljs-string">"No information found."</span> 

        <span class="hljs-comment"># Sort by the time attribute, descending, and take the latest</span>
        latest = sorted(candidates, key=<span class="hljs-keyword">lambda</span> item: item[<span class="hljs-number">1</span>], reverse=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>] 
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"<span class="hljs-subst">{latest[<span class="hljs-number">0</span>]}</span> (as of <span class="hljs-subst">{latest[<span class="hljs-number">1</span>]}</span>)"</span> 

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">query_disambiguated</span>(<span class="hljs-params">self, entity_name, entity_type, attribute_key</span>):</span>
        <span class="hljs-string">"""
        Pattern 3: Uses node 'type' attributes to disambiguate
        entities with the same name.
        """</span>
        <span class="hljs-keyword">for</span> node, attrs <span class="hljs-keyword">in</span> self.graph.nodes(data=<span class="hljs-literal">True</span>): 
            <span class="hljs-comment"># Find the node that matches both name and type</span>
            <span class="hljs-keyword">if</span> entity_name <span class="hljs-keyword">in</span> node <span class="hljs-keyword">and</span> attrs.get(<span class="hljs-string">"type"</span>) == entity_type: 
                <span class="hljs-comment"># Return the requested attribute</span>
                year = attrs[<span class="hljs-string">'year'</span>]
                product = attrs[attribute_key]
                <span class="hljs-keyword">return</span> <span class="hljs-string">f"<span class="hljs-subst">{node}</span>'s first product was the <span class="hljs-subst">{product}</span> in <span class="hljs-subst">{year}</span>."</span> 
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Cannot disambiguate entity."</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">query_explicit_relation</span>(<span class="hljs-params">self, source_node, relation_label</span>):</span>
        <span class="hljs-string">"""
        Pattern 5: Finds partners based on an explicit edge label,
        preventing semantic 'bleed-over' from unrelated entities.
        """</span>
        partners = [
            v <span class="hljs-keyword">for</span> u, v, data <span class="hljs-keyword">in</span> self.graph.edges(data=<span class="hljs-literal">True</span>) 
            <span class="hljs-keyword">if</span> u == source_node <span class="hljs-keyword">and</span> data.get(<span class="hljs-string">'label'</span>) == relation_label
        ] 

        <span class="hljs-keyword">if</span> partners:
            <span class="hljs-keyword">return</span> <span class="hljs-string">f"<span class="hljs-subst">{source_node}</span> partnered with <span class="hljs-subst">{<span class="hljs-string">', '</span>.join(partners)}</span>."</span> 
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"No partners found for <span class="hljs-subst">{source_node}</span>."</span>

<span class="hljs-comment"># A helper function for Pattern 2 (Causal Rules)</span>
<span class="hljs-comment"># This logic is more rule-based but can be backed by a graph</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">query_causal_chain</span>(<span class="hljs-params">facts</span>):</span>
    <span class="hljs-string">"""
    Pattern 2: Synthesizes a direct conclusion by following a
    chain of causal rules.
    """</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-keyword">if</span> facts[<span class="hljs-string">"John"</span>][<span class="hljs-string">"takes"</span>] == <span class="hljs-string">"aspirin"</span>: 
            <span class="hljs-keyword">if</span> facts[<span class="hljs-string">"aspirin"</span>][<span class="hljs-string">"is_a"</span>] == <span class="hljs-string">"blood thinner"</span>: 
                <span class="hljs-keyword">if</span> facts[<span class="hljs-string">"blood thinner"</span>][<span class="hljs-string">"risk_for"</span>] == <span class="hljs-string">"surgery"</span>:
                    <span class="hljs-keyword">return</span> <span class="hljs-string">"John is NOT safe due to increased bleeding risk from aspirin, a blood thinner."</span>
    <span class="hljs-keyword">except</span> KeyError:
        <span class="hljs-keyword">pass</span> <span class="hljs-comment"># Fall through to default</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">"Insufficient information to determine risk."</span>
</code></pre>
<h2 id="heading-5-rag-failures-and-their-graph-based-solutions">5 RAG Failures and Their Graph-Based Solutions</h2>
<p>Let's run five scenarios to see how our standard RAG chain performs against our new <code>KnowledgeGraph</code>.</p>
<h3 id="heading-pattern-1-the-multi-hop-failure">Pattern 1: The Multi-Hop Failure</h3>
<p>The multi-hop failure occurs when an answer requires connecting multiple, separate facts – a chain of reasoning that RAG often breaks.</p>
<ul>
<li><p><strong>Query:</strong> "Which university did the CEO of the company that makes the F-150 attend?"</p>
</li>
<li><p><strong>Problem:</strong> A standard retriever might get chunks for <code>F-150 -&gt; Ford</code> and <code>Jim Farley -&gt; CEO</code>, but miss the <code>Jim Farley -&gt; Georgetown</code> chunk. The chain is broken.</p>
</li>
</ul>
<h4 id="heading-why-the-naive-rag-fails">Why the Naïve RAG Fails</h4>
<p>The retriever's job is to find the <code>top-k=3</code> chunks that are <strong>semantically similar</strong> to the entire query. When the user asks, "Which university did the CEO of the company that makes the F-150 attend?", the retriever will search our 6-document list and will likely retrieve:</p>
<ol>
<li><p>The chunk about the <strong>University of Michigan</strong> (because of the words "university" and "car companies").</p>
</li>
<li><p>The chunk about <strong>Mary Barra</strong> (because of "CEO," "Ford," and "F-150 line").</p>
</li>
<li><p>The chunk about the <strong>F-150 engine options</strong> (because of "F-150").</p>
</li>
</ol>
<p>The <code>top-k=3</code> context handed to the LLM is now full of irrelevant facts. The one chunk that contains the <em>actual</em> answer ("...Mr. Farley... from Georgetown University") is semantically too far from the main query and is <strong>never retrieved</strong>. The LLM fails not because it's unintelligent, but because it was never given the correct piece of the puzzle.</p>
<h4 id="heading-why-the-graphrag-succeeds">Why the GraphRAG Succeeds</h4>
<p>The knowledge graph doesn't care about semantic similarity. It performs a deterministic traversal of explicit, verified relationships.</p>
<p>We ask for the <em>path</em> from the <code>F-150</code> node to the <code>Georgetown University</code> node. The graph follows the chain we defined: <code>F-150</code> → <code>made_by</code> → <code>Ford Motor Company</code> → <code>ceo</code> → <code>Jim Farley</code> → <code>attended</code> → <code>Georgetown University</code>. It can't fail or be distracted by the "noise" documents because it's not searching – it's <strong>navigating</strong> a pre-built map.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Naive RAG</span>
docs_s1 = [
    <span class="hljs-comment"># --- The 3 "Answer" Chunks ---</span>
    Document(page_content=<span class="hljs-string">"The Ford F-150 is a full-size pickup truck made by Ford Motor Company."</span>),
    Document(page_content=<span class="hljs-string">"Jim Farley is the current CEO of Ford Motor Company."</span>),
    Document(page_content=<span class="hljs-string">"Mr. Farley received his undergraduate degree from Georgetown University."</span>),

    <span class="hljs-comment"># --- The 3 "Noise" Chunks (to distract the retriever) ---</span>
    Document(page_content=<span class="hljs-string">"The University of Michigan is renowned for its automotive engineering program, which partners with many car companies."</span>),
    Document(page_content=<span class="hljs-string">"The F-150 comes with several engine options, including a powerful 3.5L EcoBoost V6."</span>),
    Document(page_content=<span class="hljs-string">"Mary Barra, the CEO of General Motors, is a major competitor to Ford and its F-150 line."</span>)
]
query_s1 = <span class="hljs-string">"Which university did the CEO of the company that makes the F-150 attend?"</span>
rag_chain_s1 = create_rag_chain(docs_s1) <span class="hljs-comment"># This uses top_k=3</span>
print(<span class="hljs-string">f"Naive RAG Answer: <span class="hljs-subst">{rag_chain_s1.invoke(query_s1)}</span>"</span>)
<span class="hljs-comment"># GraphRAG Pattern</span>
graph_s1 = KnowledgeGraph()
edges_s1 = [
    (<span class="hljs-string">"F-150"</span>, <span class="hljs-string">"Ford Motor Company"</span>, {<span class="hljs-string">"label"</span>: <span class="hljs-string">"made_by"</span>}),
    (<span class="hljs-string">"Ford Motor Company"</span>, <span class="hljs-string">"Jim Farley"</span>, {<span class="hljs-string">"label"</span>: <span class="hljs-string">"ceo"</span>}),
    (<span class="hljs-string">"Jim Farley"</span>, <span class="hljs-string">"Georgetown University"</span>, {<span class="hljs-string">"label"</span>: <span class="hljs-string">"attended"</span>}),
]
graph_s1.add_data(edges=edges_s1)
print(<span class="hljs-string">f"GraphRAG Answer: <span class="hljs-subst">{graph_s1.query_multi_hop_path(<span class="hljs-string">'F-150'</span>, <span class="hljs-string">'Georgetown University'</span>)}</span>"</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">Naive RAG Answer: I don't have enough information from the context.
GraphRAG Answer: Jim Farley attended Georgetown University.
</code></pre>
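<p>One caveat: <code>query_multi_hop_path</code> is given the target node, so it confirms a connection rather than discovering the answer. A minimal sketch of the discovery direction, assuming each (source, label) pair has exactly one outgoing edge, is to walk the chain by relation label. The function name <code>follow_relations</code> is hypothetical, and it works on the same edge tuples used above rather than on the <code>KnowledgeGraph</code> class.</p>

```python
def follow_relations(edges, start, relations):
    """Walk from `start`, taking one edge per relation label.

    Returns the list of nodes visited, or None if the chain breaks.
    Assumes at most one target per (source, label) pair.
    """
    # Index edges as {(source, label): target} for direct lookup
    index = {(u, attrs["label"]): v for u, v, attrs in edges}
    path = [start]
    for label in relations:
        key = (path[-1], label)
        if key not in index:
            return None  # no such edge, so the chain cannot continue
        path.append(index[key])
    return path

edges_demo = [
    ("F-150", "Ford Motor Company", {"label": "made_by"}),
    ("Ford Motor Company", "Jim Farley", {"label": "ceo"}),
    ("Jim Farley", "Georgetown University", {"label": "attended"}),
]

path = follow_relations(edges_demo, "F-150", ["made_by", "ceo", "attended"])
print(path[-1])  # Georgetown University
```

<p>In a fuller system, the list of relation labels would typically come from parsing the user's question. That step is outside the scope of this sketch.</p>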
<h3 id="heading-pattern-2-the-causal-synthesis-failure">Pattern 2: The Causal Synthesis Failure</h3>
<p>This is the failure to move from retrieval to synthesis. RAG lists facts but can't combine them to form a new conclusion.</p>
<ul>
<li><p><strong>Query:</strong> "Is John safe to undergo surgery while on aspirin?"</p>
</li>
<li><p><strong>Problem:</strong> RAG will retrieve "John takes aspirin," "Aspirin is a blood thinner," and "Blood thinners increase surgery risk." But it will fail to synthesize these into a direct "No, it's not safe" answer.</p>
</li>
</ul>
<h4 id="heading-why-the-naive-rag-fails-1">Why the Naïve RAG Fails</h4>
<p>The retriever searches for chunks that are semantically similar to the query: "John," "safe," "surgery," and "aspirin." In a real document base, it's highly likely to retrieve distracting, topically related "noise" chunks.</p>
<p>In our example, the <code>top-k=3</code> chunks it retrieves might be:</p>
<ol>
<li><p>"John is currently taking daily low-dose aspirin." (Relevant: "John," "aspirin")</p>
</li>
<li><p>"Pre-surgery safety checks are standard procedure..." (Relevant: "surgery safety")</p>
</li>
<li><p>"John is otherwise in good health and is cleared for the procedure..." (Relevant: "John," "safe," "procedure")</p>
</li>
</ol>
<p>The key causal link ("Aspirin... is considered a blood thinner") is semantically less similar to the <em>full query</em> and gets pushed out of the <code>top-k=3</code> context. The LLM is then given incomplete information. It sees "John takes aspirin" and "John is cleared," so it provides a weak, hedged answer and cannot make the correct logical leap.</p>
<h4 id="heading-why-the-graphrag-succeeds-1">Why the GraphRAG Succeeds</h4>
<p>This approach doesn't use semantic search. It uses explicit logical rules (which could be backed by a causal graph). The <code>query_causal_chain</code> function is not searching for text – it's executing a pre-defined chain of logic:</p>
<ol>
<li><p><em>Fact:</em> Does John take aspirin? Yes.</p>
</li>
<li><p><em>Fact:</em> Is aspirin a blood thinner? Yes.</p>
</li>
<li><p><em>Fact:</em> Is a blood thinner a risk for surgery? Yes.</p>
</li>
<li><p><em>Conclusion:</em> Therefore, John is not safe.</p>
</li>
</ol>
<p>This deterministic, rule-based reasoning is immune to the "semantic noise" that distracts the naive RAG.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Naive RAG</span>
docs_s2 = [
    <span class="hljs-comment"># --- The 3 "Answer" Chunks ---</span>
    Document(page_content=<span class="hljs-string">"Aspirin reduces blood clotting and is considered a blood thinner."</span>),
    Document(page_content=<span class="hljs-string">"Patients on blood thinners have increased bleeding risk during surgery."</span>),
    Document(page_content=<span class="hljs-string">"John is currently taking daily low-dose aspirin."</span>),

    <span class="hljs-comment"># --- The 3 "Noise" Chunks (to distract the retriever) ---</span>
    Document(page_content=<span class="hljs-string">"John is otherwise in good health and is cleared for the procedure by his cardiologist."</span>),
    Document(page_content=<span class="hljs-string">"Pre-surgery safety checks are standard procedure and usually focus on anesthesia allergies."</span>),
    Document(page_content=<span class="hljs-string">"Aspirin is also commonly used to relieve minor aches and pains, but this is not why John takes it."</span>)
]
query_s2 = <span class="hljs-string">"Is John safe to undergo surgery while on aspirin?"</span>
rag_chain_s2 = create_rag_chain(docs_s2)
print(<span class="hljs-string">f"Naive RAG Answer: <span class="hljs-subst">{rag_chain_s2.invoke(query_s2)}</span>"</span>)

<span class="hljs-comment"># GraphRAG Pattern</span>
facts_s2 = {
    <span class="hljs-string">"aspirin"</span>: {<span class="hljs-string">"is_a"</span>: <span class="hljs-string">"blood thinner"</span>},
    <span class="hljs-string">"blood thinner"</span>: {<span class="hljs-string">"risk_for"</span>: <span class="hljs-string">"surgery"</span>},
    <span class="hljs-string">"John"</span>: {<span class="hljs-string">"takes"</span>: <span class="hljs-string">"aspirin"</span>},
}
print(<span class="hljs-string">f"GraphRAG Answer: <span class="hljs-subst">{query_causal_chain(facts_s2)}</span>"</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">Naive RAG Answer: Based on the context, John is currently taking daily low-dose aspirin...
GraphRAG Answer: John is NOT safe due to increased bleeding risk from aspirin, a blood thinner.
</code></pre>
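<p>The <code>query_causal_chain</code> helper is hardcoded to John and aspirin. As a rough sketch of how the same three-step rule could be parameterized, the version below takes the person and procedure as arguments. It assumes the same nested-dictionary shape as <code>facts_s2</code>, and <code>find_risk_chain</code> is a hypothetical name, not part of the code above.</p>

```python
def find_risk_chain(facts, person, procedure):
    """Follow takes, then is_a, then risk_for, starting from `person`."""
    # Each lookup falls back to an empty dict so a missing fact yields None
    substance = facts.get(person, {}).get("takes")
    category = facts.get(substance, {}).get("is_a")
    risk = facts.get(category, {}).get("risk_for")
    if risk is not None and risk == procedure:
        return (
            f"{person} is NOT safe: {substance} is a {category}, "
            f"which is a risk for {procedure}."
        )
    return "Insufficient information to determine risk."

facts_demo = {
    "aspirin": {"is_a": "blood thinner"},
    "blood thinner": {"risk_for": "surgery"},
    "John": {"takes": "aspirin"},
}

print(find_risk_chain(facts_demo, "John", "surgery"))
# John is NOT safe: aspirin is a blood thinner, which is a risk for surgery.
print(find_risk_chain(facts_demo, "Mary", "surgery"))
# Insufficient information to determine risk.
```

<p>The rule itself (takes, is_a, risk_for) is still fixed in code. Only the entities are looked up, which keeps the reasoning deterministic.</p>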
<h3 id="heading-pattern-3-the-entity-ambiguity-trap">Pattern 3: The Entity Ambiguity Trap</h3>
<p>Vector search struggles with polysemy (words with multiple meanings). It relies on local semantic context, which can easily be confused.</p>
<ul>
<li><p><strong>Query:</strong> "When did Apple release its first product?"</p>
</li>
<li><p><strong>Problem:</strong> The word "Apple" in the query might retrieve documents for both Apple (company) and apple (fruit), confusing the LLM.</p>
</li>
</ul>
<h4 id="heading-why-the-naive-rag-fails-2">Why the Naïve RAG Fails</h4>
<p>The query "When did Apple release its first product?" is semantically ambiguous. The vector retriever, which looks for <em>semantic closeness</em>, will be strongly attracted to the "noise" chunks we added about the fruit.</p>
<p>The <code>top-k=3</code> chunks it retrieves will likely be:</p>
<ol>
<li><p>"The 'Cosmic Crisp' is a new <strong>apple product</strong>... <strong>first released</strong>..." (Extremely high semantic similarity to "Apple releases its first product").</p>
</li>
<li><p>"The Granny Smith <strong>apple</strong>... is a popular <strong>product</strong>..."</p>
</li>
<li><p>"Many <strong>apple</strong> orchards <strong>release</strong> their new harvest..."</p>
</li>
</ol>
<p>The <em>correct</em> chunk ("The Apple I was introduced by Apple Inc...") is about a "company" and a specific "product" name. It might be semantically <em>less</em> similar to the general query than the "Cosmic Crisp" chunk. The LLM is then handed a context exclusively about fruits and confidently (but incorrectly) answers about the "Cosmic Crisp" apple.</p>
<h4 id="heading-why-the-graphrag-succeeds-2">Why the GraphRAG Succeeds</h4>
<p>The graph approach is immune to this ambiguity. The <code>query_disambiguated</code> function is <em>not</em> just searching for "Apple." It is explicitly looking for a node that matches two criteria: <code>name='Apple'</code> AND <code>type='company'</code>.</p>
<p>This query structurally guarantees that it finds the <code>Apple Inc.</code> node and ignores the <code>apple (fruit)</code> node, regardless of semantic similarity. It then reliably retrieves the <code>first_product</code> attribute from the correct node.</p>
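<p>The two-criteria lookup described above can be sketched in a few lines. This is a minimal illustration, not the actual <code>KnowledgeGraph</code> implementation: the plain dictionary, the <code>find_typed_attribute</code> helper, and the case-insensitive substring match on the node name are all assumptions made for the example.</p>

```python
# Hypothetical stand-in for the graph: node name mapped to its attributes.
nodes = {
    "Apple Inc.": {"type": "company", "first_product": "Apple I", "year": 1976},
    "apple": {"type": "fruit", "origin": "Central Asia"},
}


def find_typed_attribute(nodes, name, node_type, attribute):
    """Return (node_name, value) for the first node whose name contains `name`
    (case-insensitive) and whose type equals `node_type`, or None if no node matches."""
    for node_name, attrs in nodes.items():
        name_matches = name.lower() in node_name.lower()
        if name_matches and attrs.get("type") == node_type:
            return node_name, attrs.get(attribute)
    return None


# Both nodes match the name "Apple"; the type filter decides which one is used.
print(find_typed_attribute(nodes, "Apple", "company", "first_product"))
print(find_typed_attribute(nodes, "Apple", "fruit", "origin"))
```

<p>The type check does the disambiguation here. The name alone matches both nodes, so dropping the filter would bring the ambiguity straight back.</p>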
<pre><code class="lang-python"><span class="hljs-comment"># Naive RAG</span>
docs_s3 = [
    <span class="hljs-comment"># --- The "Answer" Chunks ---</span>
    Document(page_content=<span class="hljs-string">"The Apple I was introduced by Apple Inc. in 1976."</span>),
    Document(page_content=<span class="hljs-string">"Apple Inc. is a technology company based in Cupertino."</span>),

    <span class="hljs-comment"># --- "Noise" Chunks (to create ambiguity) ---</span>
    Document(page_content=<span class="hljs-string">"The 'Cosmic Crisp' is a new apple product developed by Washington State University, first released to consumers in 2019."</span>),
    Document(page_content=<span class="hljs-string">"Apples (the fruit) were first cultivated in Central Asia thousands of years ago."</span>),
    Document(page_content=<span class="hljs-string">"The Granny Smith apple, first discovered in Australia, is a popular product for baking."</span>),
    Document(page_content=<span class="hljs-string">"Many apple orchards release their new harvest in the fall."</span>)
]
query_s3 = <span class="hljs-string">"When did Apple release its first product?"</span>
rag_chain_s3 = create_rag_chain(docs_s3)
print(<span class="hljs-string">f"Naive RAG Answer: <span class="hljs-subst">{rag_chain_s3.invoke(query_s3)}</span>"</span>)

<span class="hljs-comment"># GraphRAG Pattern</span>
graph_s3 = KnowledgeGraph()
nodes_s3 = [
    (<span class="hljs-string">"Apple Inc."</span>, {<span class="hljs-string">"type"</span>: <span class="hljs-string">"company"</span>, <span class="hljs-string">"first_product"</span>: <span class="hljs-string">"Apple I"</span>, <span class="hljs-string">"year"</span>: <span class="hljs-number">1976</span>}),
    (<span class="hljs-string">"apple"</span>, {<span class="hljs-string">"type"</span>: <span class="hljs-string">"fruit"</span>, <span class="hljs-string">"origin"</span>: <span class="hljs-string">"Central Asia"</span>}),
]
graph_s3.add_data(nodes=nodes_s3)
print(<span class="hljs-string">f"GraphRAG Answer: <span class="hljs-subst">{graph_s3.query_disambiguated(<span class="hljs-string">'Apple'</span>, <span class="hljs-string">'company'</span>, <span class="hljs-string">'first_product'</span>)}</span>"</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">Naive RAG Answer: The 'Cosmic Crisp', a new apple product, was first released to consumers in 2019.
GraphRAG Answer: Apple Inc.'s first product was the Apple I in 1976.
</code></pre>
<h3 id="heading-pattern-4-the-contradictory-information-failure">Pattern 4: The Contradictory Information Failure</h3>
<p>RAG is blind to knowledge conflicts. If it retrieves two or more contradictory facts, it can't resolve them using metadata like dates or source credibility. It will hedge, merge them into a false statement, or present all of them.</p>
<ul>
<li><p><strong>Query:</strong> "Who is the CEO of Twitter?"</p>
</li>
<li><p><strong>Problem:</strong> The retriever finds one chunk saying "Parag Agrawal (2022)" and another saying "Elon Musk (2023)". It may also find other related, confusing information. The LLM has no way to know which fact is the most current and authoritative.</p>
</li>
</ul>
<h4 id="heading-why-the-naive-rag-fails-3">Why the Naïve RAG Fails</h4>
<p>The query "Who is the CEO of Twitter?" is semantically similar to <em>all</em> documents containing the words "CEO" and "Twitter." In a real-world, evolving knowledge base, this is a recipe for disaster.</p>
<p>The <code>top-k=3</code> chunks our retriever finds will be a mess of contradictions:</p>
<ol>
<li><p>"In 2023, Elon Musk became the CEO of Twitter." (The most recent dated fact)</p>
</li>
<li><p>"In 2022, Parag Agrawal was the CEO of Twitter." (Outdated)</p>
</li>
<li><p>"Linda Yaccarino is the current CEO of X (formerly Twitter)..." (Undated, and it names X rather than Twitter)</p>
</li>
</ol>
<p>The LLM is handed three different, conflicting names for "CEO of Twitter" from different time periods. Because it is instructed to answer <em>only</em> from the context and has no mechanism to identify which fact is the most recent, it cannot give a single, confident answer. It’s forced to list the conflicts it found.</p>
<h4 id="heading-why-the-graphrag-succeeds-3">Why the GraphRAG Succeeds</h4>
<p>The knowledge graph is built for this. We've stored the "CEO" relationship as an <strong>edge with metadata</strong>, specifically a <code>year</code> attribute.</p>
<p>Our <code>query_with_conflict_resolution</code> function doesn't just find all CEO-related edges. It programmatically:</p>
<ol>
<li><p>Finds all nodes connected to "Twitter" by an edge labeled <code>ceo</code>.</p>
</li>
<li><p>Extracts the <code>year</code> from each of those edges.</p>
</li>
<li><p><strong>Sorts the candidates by year</strong> in descending order.</p>
</li>
<li><p>Returns only the top result.</p>
</li>
</ol>
<p>This provides a deterministic, programmatic way to resolve conflicts and always provide the most current fact based on the explicit timestamps in our graph.</p>
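<p>The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual <code>query_with_conflict_resolution</code> method: edges are stored as plain <code>(source, target, attributes)</code> tuples, and <code>latest_by_edge_attribute</code> is a hypothetical helper name.</p>

```python
# Edges as (source, target, attributes) tuples, mirroring the scenario's data.
edges = [
    ("Twitter", "Parag Agrawal", {"label": "ceo", "year": 2022}),
    ("Twitter", "Elon Musk", {"label": "ceo", "year": 2023}),
]


def latest_by_edge_attribute(edges, source, label, sort_key):
    """Return (target, value) for the matching edge with the highest `sort_key` value,
    or None when no edge from `source` carries `label` and `sort_key`."""
    # Steps 1 and 2: collect matching edges and pull out the attribute to sort on.
    candidates = [
        (target, attrs[sort_key])
        for edge_source, target, attrs in edges
        if edge_source == source and attrs.get("label") == label and sort_key in attrs
    ]
    if not candidates:
        return None
    # Step 3: sort by that attribute, newest first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    # Step 4: keep only the top result.
    return candidates[0]


print(latest_by_edge_attribute(edges, "Twitter", "ceo", "year"))
```

<p>Because the ranking depends only on the stored <code>year</code> values, the result is the same on every run. It is also only as current as the timestamps recorded in the graph.</p>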
<pre><code class="lang-python"><span class="hljs-comment"># Naive RAG</span>
docs_s4 = [
    <span class="hljs-comment"># --- The "Answer" Chunks (conflicting) ---</span>
    Document(page_content=<span class="hljs-string">"In 2022, Parag Agrawal was the CEO of Twitter."</span>),
    Document(page_content=<span class="hljs-string">"In 2023, Elon Musk became the CEO of Twitter."</span>),

    <span class="hljs-comment"># --- "Noise" Chunks (to add more conflict/confusion) ---</span>
    Document(page_content=<span class="hljs-string">"Linda Yaccarino is the current CEO of X (formerly Twitter), overseeing business operations."</span>),
    Document(page_content=<span class="hljs-string">"Jack Dorsey, a co-founder and former CEO of Twitter, is now focused on his company Block."</span>),
    Document(page_content=<span class="hljs-string">"CEOs of major tech companies, including Twitter's, have recently testified before Congress."</span>)
]
query_s4 = <span class="hljs-string">"Who is the CEO of Twitter?"</span>
rag_chain_s4 = create_rag_chain(docs_s4)
print(<span class="hljs-string">f"Naive RAG Answer: <span class="hljs-subst">{rag_chain_s4.invoke(query_s4)}</span>"</span>)

<span class="hljs-comment"># GraphRAG Pattern</span>
graph_s4 = KnowledgeGraph()
edges_s4 = [
    (<span class="hljs-string">"Twitter"</span>, <span class="hljs-string">"Parag Agrawal"</span>, {<span class="hljs-string">"label"</span>: <span class="hljs-string">"ceo"</span>, <span class="hljs-string">"year"</span>: <span class="hljs-number">2022</span>}),
    (<span class="hljs-string">"Twitter"</span>, <span class="hljs-string">"Elon Musk"</span>, {<span class="hljs-string">"label"</span>: <span class="hljs-string">"ceo"</span>, <span class="hljs-string">"year"</span>: <span class="hljs-number">2023</span>}),
]
graph_s4.add_data(edges=edges_s4)
print(<span class="hljs-string">f"GraphRAG Answer: <span class="hljs-subst">{graph_s4.query_with_conflict_resolution(<span class="hljs-string">'Twitter'</span>, <span class="hljs-string">'ceo'</span>, <span class="hljs-string">'year'</span>)}</span>"</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">Naive RAG Answer: According to the context, in 2022, Parag Agrawal was the CEO of Twitter. In 2023, Elon Musk became the CEO... Linda Yaccarino is the current CEO of X (formerly Twitter)...
GraphRAG Answer: Elon Musk (as of 2023)
</code></pre>
<h3 id="heading-pattern-5-the-implicit-relationship-hallucination">Pattern 5: The Implicit Relationship Hallucination</h3>
<p>RAG relies on implicit semantic closeness, which can be dangerous. If "Tesla," "Toyota," and "Panasonic" all appear near the word "battery" in the vector space, the LLM might hallucinate a relationship that doesn't exist.</p>
<ul>
<li><p><strong>Query:</strong> "Who did Tesla partner with on batteries?"</p>
</li>
<li><p><strong>Problem:</strong> The query is semantically "close" to any document mentioning "Tesla," "partner," and "batteries." The retriever will fetch chunks based on this closeness, even if they don't explicitly state a partnership, leading the LLM to infer one.</p>
</li>
</ul>
<h4 id="heading-why-the-naive-rag-fails-4">Why the Naïve RAG Fails</h4>
<p>The vector retriever will look for chunks that "sound" like the query. In our expanded document list, it's highly likely to retrieve a confusing context for the LLM.</p>
<p>The <code>top-k=3</code> chunks it finds will likely be:</p>
<ol>
<li><p>"Panasonic has a long-standing partnership to manufacture batteries..." (Relevant: "Panasonic," "partnership," "batteries")</p>
</li>
<li><p>"Tesla develops electric vehicles and relies on advanced battery tech..." (Relevant: "Tesla," "battery")</p>
</li>
<li><p>"Toyota also manufactures batteries and hybrid powertrains..." (Relevant: "Toyota," "manufactures batteries")</p>
</li>
</ol>
<p>When the LLM receives this context, it has "Panasonic," "Tesla," and "Toyota" all in a "battery" context. The chunk for Panasonic doesn't explicitly link it to Tesla. The chunk for Toyota also mentions batteries. The LLM, forced to synthesize an answer, may <em>incorrectly</em> infer a partnership that doesn't exist (like with Toyota) or state the facts without confirming the relationship.</p>
<h4 id="heading-why-the-graphrag-succeeds-4">Why the GraphRAG Succeeds</h4>
<p>The knowledge graph isn’t vulnerable to this kind of "semantic bleed-over." It doesn’t care if nodes are "semantically near" each other.</p>
<p>Our <code>query_explicit_relation</code> function asks a very specific, structural question: "Start at the node <strong>'Tesla'</strong> and return <em>only</em> the nodes connected to it by an edge with the <em>exact label</em> <strong>'partners_with'</strong>".</p>
<p>The graph then traverses its edges and finds only one: <code>("Tesla", "Panasonic", {"label": "partners_with"})</code>. It is structurally impossible for it to hallucinate a partnership with "Toyota" because no such <code>partners_with</code> edge exists for Tesla in the graph.</p>
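<p>A label-filtered traversal like this can be sketched in a few lines. This is an illustration under stated assumptions, not the actual <code>query_explicit_relation</code> method: <code>targets_with_label</code> is a hypothetical helper, and the <code>mentioned_with</code> edge is made up purely to show that a different kind of connection is ignored.</p>

```python
# Illustrative edge list; the "mentioned_with" edge is invented for this sketch.
edges = [
    ("Tesla", "Panasonic", {"label": "partners_with"}),
    ("Tesla", "Toyota", {"label": "mentioned_with"}),
]


def targets_with_label(edges, source, label):
    """Return the targets of edges that start at `source` and carry exactly `label`."""
    return [
        target
        for edge_source, target, attrs in edges
        if edge_source == source and attrs.get("label") == label
    ]


# Toyota is connected to Tesla, but not by a partnership edge, so it is excluded.
print(targets_with_label(edges, "Tesla", "partners_with"))
# No edges start at Toyota in this toy graph, so the result is empty.
print(targets_with_label(edges, "Toyota", "partners_with"))
```

<p>The filter is an exact match on the edge label, so two names appearing in the same context never count as a partnership unless that edge was stored explicitly.</p>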
<pre><code class="lang-python"><span class="hljs-comment"># Naive RAG</span>
docs_s5 = [
    <span class="hljs-comment"># --- The "Answer" Chunks (ambiguous) ---</span>
    Document(page_content=<span class="hljs-string">"Tesla develops electric vehicles and relies on advanced battery tech."</span>),
    Document(page_content=<span class="hljs-string">"Panasonic has a long-standing partnership to manufacture batteries for electric vehicles."</span>),

    <span class="hljs-comment"># --- "Noise" Chunks (to create a false signal) ---</span>
    Document(page_content=<span class="hljs-string">"Toyota also manufactures batteries and hybrid powertrains for its own vehicle lineup."</span>),
    Document(page_content=<span class="hljs-string">"Tesla, Panasonic, and Toyota are all major players in the EV and battery supply chain."</span>),
    Document(page_content=<span class="hljs-string">"A new partnership for solid-state batteries was announced, but it did not involve Tesla."</span>)
]
query_s5 = <span class="hljs-string">"Who did Tesla partner with on batteries?"</span>
rag_chain_s5 = create_rag_chain(docs_s5)
print(<span class="hljs-string">f"Naive RAG Answer: <span class="hljs-subst">{rag_chain_s5.invoke(query_s5)}</span>"</span>)

<span class="hljs-comment"># GraphRAG Pattern</span>
graph_s5 = KnowledgeGraph()
edges_s5 = [
    (<span class="hljs-string">"Tesla"</span>, <span class="hljs-string">"Panasonic"</span>, {<span class="hljs-string">"label"</span>: <span class="hljs-string">"partners_with"</span>}),
    (<span class="hljs-string">"Toyota"</span>, <span class="hljs-string">"Panasonic"</span>, {<span class="hljs-string">"label"</span>: <span class="hljs-string">"partners_with"</span>}),
]
graph_s5.add_data(edges=edges_s5)
print(<span class="hljs-string">f"GraphRAG Answer: <span class="hljs-subst">{graph_s5.query_explicit_relation(<span class="hljs-string">'Tesla'</span>, <span class="hljs-string">'partners_with'</span>)}</span>"</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">Naive RAG Answer: Based on the context, Panasonic has a partnership to manufacture batteries, and Tesla relies on advanced battery tech. Toyota also manufactures batteries.
GraphRAG Answer: Tesla partnered with Panasonic.
</code></pre>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Standard RAG is an essential tool, but its strength is <strong>retrieval, not reasoning</strong>. It falters when true synthesis is required.</p>
<p>You may find that a powerful LLM like Gemini can still correctly answer some of the simple scenarios in this article. The five patterns shown here are meant to build intuition. They demonstrate what <em>can</em> and <em>does</em> go wrong as your knowledge base grows larger and more complex.</p>
<p>The real failure of naive RAG emerges as you feed it more and more conflicting, ambiguous, or incomplete information. This "noisy" context forces the LLM to either hallucinate connections or fail to reason altogether.</p>
<p>By moving from a "bag of chunks" to a structured Knowledge Graph, you build a more reliable and intelligent system. You give your system a "global memory" of how facts explicitly connect, allowing it to answer complex questions by traversing a verified path rather than just guessing from a cloud of semantically similar text.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
