<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Bhavishya Pandit - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Bhavishya Pandit - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 21 May 2026 10:20:44 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/bhav09/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ Which Tools to Use for LLM-Powered Applications: LangChain vs LlamaIndex vs NIM ]]>
                </title>
                <description>
                    <![CDATA[ If you’re considering building an application powered by a Large Language Model, you may wonder which tool to use. Well, two well-established frameworks—LangChain and LlamaIndex—have gained significant attention for their unique features and capabili... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/llm-powered-apps-langchain-vs-llamaindex-vs-nim/</link>
                <guid isPermaLink="false">6716909d6cc6de90a6dad8be</guid>
                
                    <category>
                        <![CDATA[ LLM's ]]>
                    </category>
                
                    <category>
                        <![CDATA[ large language models ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Bhavishya Pandit ]]>
                </dc:creator>
                <pubDate>Mon, 21 Oct 2024 17:34:21 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1729527716896/58932669-914c-4380-88c8-33ffbad99b5f.webp" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you’re considering building an application powered by a Large Language Model, you may wonder which tool to use.</p>
<p>Well, two well-established frameworks—LangChain and LlamaIndex—have gained significant attention for their unique features and capabilities. But recently, NVIDIA NIM has emerged as a new player in the field, adding its tools and functionalities to the mix.</p>
<p>In this article, we'll compare LangChain, LlamaIndex, and NVIDIA NIM to help you determine which framework best fits your specific use case.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents:</strong></h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-introduction-to-langchain">Introduction to LangChain</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-introduction-to-llamaindex">Introduction to LlamaIndex</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-nvidia-nim">NVIDIA NIM</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-which-tool-to-use">Which Tool to Use</a>?</p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-introduction-to-langchain"><strong>Introduction to LangChain</strong></h2>
<p>According to LangChain’s official docs, “LangChain is a framework for developing applications powered by language models”.</p>
<p>We can elaborate a bit on that and say that LangChain is a versatile framework designed for building data-aware and agent-driven applications. It offers a collection of components and pre-built chains that simplify working with large language models (LLMs) like GPT.</p>
<p>Whether you're just starting or you’re an experienced developer, LangChain supports both quick prototyping and full-scale production applications.</p>
<p>You can use LangChain to simplify the entire development cycle of an LLM application:</p>
<ul>
<li><p><strong>Development:</strong> LangChain offers open-source building blocks, components and third-party integrations for building applications.</p>
</li>
<li><p><strong>Production:</strong> LangSmith, a tool from LangChain, helps monitor and evaluate chains for continuous optimization and deployment.</p>
</li>
<li><p><strong>Deployment:</strong> You can use LangGraph Cloud to turn your LLM applications into production-ready APIs.</p>
</li>
</ul>
<p>LangChain offers several open-source libraries for development and production purposes. Let’s take a look at some of them.</p>
<h3 id="heading-langchain-components"><strong>LangChain Components</strong></h3>
<p>LangChain Components are high-level APIs that simplify working with LLMs. You can compare them with Hooks in React and functions in Python.</p>
<p>These components are designed to be intuitive and easy to use. A key component is the LLM interface, which seamlessly connects to providers like OpenAI, Cohere, and Hugging Face, allowing you to effortlessly query models.</p>
<p>In this example, we utilize the langchain_google_vertexai library to interact with Google’s Vertex AI, specifically leveraging the <strong>Gemini 1.5 Flash</strong> model. Let’s break down what the code does:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_google_vertexai <span class="hljs-keyword">import</span> ChatVertexAI

llm = ChatVertexAI(model=<span class="hljs-string">"gemini-1.5-flash"</span>)
llm.invoke(
  <span class="hljs-string">"Who was Napoleon?"</span>
)
</code></pre>
<p><strong>Importing the ChatVertexAI Class</strong>:</p>
<p>The first step is to import the ChatVertexAI class, which allows us to communicate with the <strong>Google Vertex AI</strong> platform. This library is part of the LangChain ecosystem, designed to integrate large language models (LLMs) seamlessly into applications.</p>
<p><strong>Instantiating the LLM (Large Language Model)</strong>:</p>
<pre><code class="lang-python">llm = ChatVertexAI(model=<span class="hljs-string">"gemini-1.5-flash"</span>)
</code></pre>
<p>Here, we create an instance of the ChatVertexAI class. We specify the model we want to use, which in this case is <strong>Gemini 1.5 Flash</strong>. This version of Gemini is optimized for fast responses while still maintaining high-quality language generation.</p>
<p><strong>Sending a Query to the Model</strong>:</p>
<pre><code class="lang-python">llm.invoke(<span class="hljs-string">"Who was Napoleon?"</span>)
</code></pre>
<p>Finally, we use the invoke method to send a question to the Gemini model. In this example, we ask the question, <strong>“Who was Napoleon?”</strong>. The model processes the query and provides a response, which would typically include information about Napoleon’s identity, historical significance, and key accomplishments.</p>
<h3 id="heading-chains"><strong>Chains</strong></h3>
<p>LangChain defines Chains as “sequences of calls”. To understand how chains work, we need to know what LCEL is.</p>
<p>LCEL stands for LangChain Expression Language, which is a declarative way to easily compose chains together – that’s it. LCEL just helps us combine multiple chains into longer chains.</p>
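<p>To make the idea of composing steps more concrete, here is a minimal plain-Python sketch of pipe-style composition. It does not use LangChain: the Step class and the three toy steps are hypothetical stand-ins, and real LCEL runnables do much more than this.</p>
<pre><code class="lang-python">class Step:
    """Hypothetical stand-in for a chain step: wraps a function of one argument."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # step_a | step_b returns a new Step that runs step_a, then step_b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Three toy steps: build a prompt, fake a model call, and clean up the output
make_prompt = Step(lambda topic: f"Tell me about {topic}.")
fake_model = Step(lambda prompt: f"  [model answer to: {prompt}]  ")
strip_output = Step(lambda text: text.strip())

chain = make_prompt | fake_model | strip_output
print(chain.invoke("Napoleon"))
</code></pre>
<p>Each step only needs to accept the previous step's output, which is what makes a chain a “sequence of calls”.</p>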
<p>LangChain supports two types of chains:</p>
<ol>
<li><p>LCEL Chains: In this case, LangChain offers a higher-level constructor method. But all that is being done under the hood is constructing a chain with LCEL.  </p>
<p> For example, <a target="_blank" href="https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html#langchain.chains.combine_documents.stuff.create_stuff_documents_chain">create_stuff_documents_chain</a> is an LCEL Chain that takes a list of documents and formats them all into a prompt, then passes that prompt to an LLM. It passes ALL documents, so you should make sure it fits within the context window of the LLM you are using.</p>
</li>
<li><p>Legacy Chains: Legacy Chains are constructed by subclassing from a legacy <em>Chain</em> class. These chains do not use LCEL under the hood but are standalone classes.  </p>
<p> For example, <a target="_blank" href="https://api.python.langchain.com/en/latest/chains/langchain.chains.api.base.APIChain.html#langchain.chains.api.base.APIChain">APIChain</a>: this chain uses an LLM to convert a query into an API request, then executes that request, gets back a response, and then passes that response to an LLM to respond.</p>
</li>
</ol>
<p>Legacy Chains were standard practices before LCEL. Once all the legacy chains get an LCEL alternative, they will become obsolete and unsupported.</p>
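<p>The “stuff” strategy mentioned above (format every document into one prompt) can be sketched in plain Python. This is not LangChain's implementation: stuff_documents and its prompt wording are hypothetical, and the sketch only illustrates why all documents must fit into the context window together.</p>
<pre><code class="lang-python">def stuff_documents(documents, question):
    """Hypothetical helper: put ALL documents into a single prompt string."""
    context = "\n\n".join(documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Napoleon was born in Corsica in 1769.",
    "He became Emperor of the French in 1804.",
]
prompt = stuff_documents(docs, "When was Napoleon born?")
print(prompt)
# The whole prompt is sent in one call, so its size grows with every document
print(len(prompt), "characters would be sent to the model in one call")
</code></pre>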
<h3 id="heading-langchain-quickstart"><strong>LangChain Quickstart</strong></h3>
<pre><code class="lang-python">!pip install -U langchain-google-genai

%env GOOGLE_API_KEY=your-api-key

<span class="hljs-keyword">from</span> langchain_google_genai <span class="hljs-keyword">import</span> ChatGoogleGenerativeAI
</code></pre>
<h4 id="heading-1-using-langchain-with-googles-gemini-pro-model"><strong>1. Using LangChain with Google's Gemini Pro Model</strong></h4>
<p>This code demonstrates how to integrate Google’s Gemini Pro model with LangChain for natural language processing tasks.</p>
<pre><code class="lang-python">pip install -U langchain-google-genai
</code></pre>
<p>First, install the langchain-google-genai package, which allows you to interact with Google’s Generative AI models via LangChain. The -U flag ensures you get the latest version.</p>
<h4 id="heading-2-setting-up-your-api-key"><strong>2. Setting Up Your API Key</strong></h4>
<pre><code class="lang-python">%env GOOGLE_API_KEY=your-api-key
</code></pre>
<p>You need to authenticate your API requests. Set your Google API key as an environment variable so that it stays out of your source code. Leave out the quotation marks: the %env magic stores everything after the = sign literally, so quotes would become part of the key.</p>
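<p>The %env magic only works in notebooks. As a rough sketch of the same step in a plain Python script, you could read the variable with the standard library instead. The placeholder value below is an assumption for illustration only and should never be a real key committed to source control.</p>
<pre><code class="lang-python">import os

# Placeholder only: in practice, export the real key in your shell or load it from a secrets store
os.environ.setdefault("GOOGLE_API_KEY", "your-api-key")

# Fail early with a clear message if the variable is missing or empty
api_key = os.environ.get("GOOGLE_API_KEY", "")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY is not set")
print("Key loaded, length:", len(api_key))
</code></pre>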
<h4 id="heading-3-accessing-the-gemini-pro-model"><strong>3. Accessing the Gemini Pro Model</strong></h4>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_google_genai <span class="hljs-keyword">import</span> ChatGoogleGenerativeAI
</code></pre>
<p>The ChatGoogleGenerativeAI class is imported from the langchain-google-genai package. In the next step, we instantiate it, specifying <strong>Gemini Pro</strong>—a powerful version of Google’s generative models known for producing high-quality language outputs.</p>
<h4 id="heading-4-querying-the-model"><strong>4. Querying the Model</strong></h4>
<pre><code class="lang-python">llm = ChatGoogleGenerativeAI(model=<span class="hljs-string">"gemini-pro"</span>)
llm.invoke(<span class="hljs-string">"Who was Alexander the Great?"</span>)
</code></pre>
<p>Finally, you invoke the model by passing a query. In this example, the query is asking for information about <strong>Alexander the Great</strong>. The model will return a detailed response, such as his historical background and significance.</p>
<h2 id="heading-introduction-to-llamaindex"><strong>Introduction to LlamaIndex</strong></h2>
<p>LlamaIndex, previously known as GPT Index, is a data framework tailored for large language model (LLM) applications. Its core purpose is to ingest, organize, and access private or domain-specific data, offering a suite of tools that simplify the integration of such data into LLMs.</p>
<p>Simply put, LLMs are very capable models, but they don't work as well on smaller, specialized sets of data they weren't trained on. LlamaIndex helps us integrate our own data into LLMs to serve specific needs.</p>
<p>LlamaIndex works using several components together. Let's take a look at them one by one. </p>
<h3 id="heading-data-connectors"><strong>Data Connectors</strong></h3>
<p>LlamaIndex supports ingesting data from multiple sources, such as APIs, PDFs, and SQL databases. These connectors streamline the process of integrating external data into LLM-based applications.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> VectorStoreIndex

<span class="hljs-keyword">from</span> llama_index.readers.google <span class="hljs-keyword">import</span> GoogleDocsReader

gdoc_ids = [<span class="hljs-string">"your-google_doc-id"</span>]
loader = GoogleDocsReader()

documents = loader.load_data(document_ids=gdoc_ids)
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query(<span class="hljs-string">"Where did the author go to school?"</span>)
print(response)
</code></pre>
<p>This code uses LlamaIndex to load and query Google Docs. It imports necessary classes, specifies Google Doc IDs, and loads the document content using GoogleDocsReader. The content is then indexed as vectors with VectorStoreIndex, allowing for efficient querying. Finally, it creates a query engine to retrieve answers from the indexed documents based on natural language questions, such as "Where did the author go to school?"</p>
<h3 id="heading-data-indexing"><strong>Data Indexing</strong></h3>
<p>The framework organizes ingested data into intermediate formats designed to optimize how LLMs access and process information, ensuring both efficiency and performance.</p>
<p>Indexes are built from documents. They are used to build Query Engines and Chat Engines, which enable question &amp; answer and chat over the data. In LlamaIndex, indexes store data in <strong>Node</strong> objects. According to the docs:</p>
<blockquote>
<p>“A Node corresponds to a chunk of text from a Document. LlamaIndex takes in Document objects and internally parses/chunks them into Node objects.” (<a target="_blank" href="https://docs.llamaindex.ai/en/stable/module_guides/indexing/index_guide/">Source</a>)</p>
</blockquote>
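<p>To give a feel for what chunking a Document into Nodes means, here is a simplified plain-Python sketch. It is not LlamaIndex's parser: split_into_chunks, the word-based chunk size, and the dictionary used as a “node” are hypothetical choices for illustration, and LlamaIndex's own node parsers are more sophisticated.</p>
<pre><code class="lang-python">def split_into_chunks(text, chunk_size=8):
    """Hypothetical helper: split text into consecutive chunks of chunk_size words."""
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


document = (
    "Matilda is a brilliant young girl who loves reading. "
    "Her parents ignore her talents, but her teacher Miss Honey "
    "recognises them and helps her."
)

# Each "node" keeps an id and the chunk of text it covers
nodes = [
    {"node_id": i, "text": chunk}
    for i, chunk in enumerate(split_into_chunks(document))
]
for node in nodes:
    print(node["node_id"], "-", node["text"])
</code></pre>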
<h3 id="heading-engines"><strong>Engines</strong></h3>
<p>LlamaIndex includes various engines for interacting with the data via natural language. These include engines for querying knowledge, facilitating conversational interactions, and data agents that enhance LLM-powered workflows.</p>
<h3 id="heading-advantages-of-llamaindex"><strong>Advantages of LlamaIndex</strong></h3>
<ul>
<li><p>Makes it easy to bring in data from different sources, such as APIs, PDFs, and databases like SQL/NoSQL, to be used in applications powered by large language models (LLMs).</p>
</li>
<li><p>Lets you store and organize private data, making it ready for different uses, while smoothly working with vector stores and databases.</p>
</li>
<li><p>Comes with a built-in query feature that allows you to get smart, data-driven answers based on your input.</p>
</li>
</ul>
<h3 id="heading-llamaindex-quickstart"><strong>LlamaIndex Quickstart</strong></h3>
<p>In this section, you’ll learn how to use <strong>LlamaIndex</strong> to create a queryable index from a collection of documents and interact with OpenAI’s API for querying purposes. This is the code to do this:</p>
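<pre><code class="lang-python">pip install llama-index
%env OPENAI_API_KEY=your-api-key

<span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader(<span class="hljs-string">"data"</span>).load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query(<span class="hljs-string">"Who is the protagonist in the story?"</span>)
print(response)
</code></pre>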
<p>Now let’s break it down step by step:</p>
<h4 id="heading-1-install-the-llamaindex-package">1. <strong>Install the LlamaIndex Package</strong></h4>
<pre><code class="lang-python">pip install llama-index
</code></pre>
<p>You start by installing the llama-index package, which provides tools for building vector-based document indices that allow for natural language queries.</p>
<h4 id="heading-2-set-the-openai-api-key"><strong>2. Set the OpenAI API Key</strong></h4>
<pre><code class="lang-python">%env OPENAI_API_KEY=your-api-key
</code></pre>
<p>Here, the OpenAI API key is set as an environment variable to authenticate and allow communication with OpenAI’s API. Replace your-api-key with your actual API key, and leave out the quotation marks: the %env magic stores everything after the = sign literally, so quotes would become part of the key.</p>
<h4 id="heading-3-importing-necessary-components"><strong>3. Importing Necessary Components</strong></h4>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> VectorStoreIndex, SimpleDirectoryReader
</code></pre>
<p>The VectorStoreIndex class is used to create an index that will store vectors representing the document contents, while the SimpleDirectoryReader class is used to read documents from a specified directory.</p>
<h4 id="heading-4-loading-documents"><strong>4. Loading Documents</strong></h4>
<pre><code class="lang-python">documents = SimpleDirectoryReader(<span class="hljs-string">"data"</span>).load_data()
</code></pre>
<p>The SimpleDirectoryReader loads documents from the directory named "data". The load_data() method reads the contents and processes them so they can be used to create the index.</p>
<h4 id="heading-5-creating-the-vector-store-index"><strong>5. Creating the Vector Store Index</strong></h4>
<pre><code class="lang-python">index = VectorStoreIndex.from_documents(documents)
</code></pre>
<p>A VectorStoreIndex is created from the documents. This index converts the text into vector embeddings that capture the semantic meaning of the text, making it easier for AI models to interpret and query.</p>
<h4 id="heading-6-building-the-query-engine"><strong>6. Building the Query Engine</strong></h4>
<pre><code class="lang-python">query_engine = index.as_query_engine()
</code></pre>
<p>The query engine is created by converting the vector store index into a format that can be queried. This engine is the component that allows you to run natural language queries against the document index.</p>
<h4 id="heading-7-querying-the-engine"><strong>7. Querying the Engine</strong></h4>
<pre><code class="lang-python">response = query_engine.query(<span class="hljs-string">"Who is the protagonist in the story?"</span>)
</code></pre>
<p>Here, a query is made to the index asking for the protagonist of the story. The query engine processes the request and uses the document embeddings to retrieve the most relevant information from the indexed documents.</p>
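<p>The retrieval idea behind this step can be illustrated without any external service. The sketch below uses a toy bag-of-words “embedding” and cosine similarity. Real embedding models produce dense vectors learned from data, and the embed and cosine functions here are hypothetical helpers, not LlamaIndex APIs.</p>
<pre><code class="lang-python">import math
from collections import Counter


def embed(text):
    """Toy embedding: a word-count vector (real models return dense float vectors)."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


chunks = [
    "Matilda is the protagonist of the story.",
    "Miss Honey is a kind teacher at the school.",
    "The library is open every afternoon.",
]
query = "Who is the protagonist in the story?"

query_vector = embed(query)
# Pick the chunk whose vector is closest to the query vector
best_chunk = max(chunks, key=lambda chunk: cosine(embed(chunk), query_vector))
print(best_chunk)
</code></pre>
<p>In a real pipeline, the retrieved chunk would then be passed to the LLM as context for the final answer.</p>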
<h4 id="heading-8-displaying-the-response"><strong>8. Displaying the Response</strong></h4>
<p>Finally, the response from the query engine, which contains the answer to the query, is printed by calling print(response).</p>
<p>Make sure your directory structure looks like this:</p>
<table><tbody><tr><td><p>|----- main.py<br>|----- data<br>&nbsp; &nbsp; &nbsp; &nbsp; |----- Matilda.txt</p></td></tr></tbody></table>

<h2 id="heading-nvidia-nim"><strong>NVIDIA NIM</strong></h2>
<p>NVIDIA has recently launched its own set of tools for developing LLM applications, called NIM. NIM stands for <strong>“NVIDIA Inference Microservice”</strong>.</p>
<p>NVIDIA NIM is a collection of simple tools (microservices) that help quickly set up and run AI models on the cloud, in data centres, or on workstations.</p>
<p>NIMs are organized by model type. For instance, NVIDIA NIM for large language models (LLMs) makes it easy for businesses to use advanced language models for tasks like understanding and processing natural language.</p>
<h3 id="heading-how-nims-work"><strong>How NIMs Work</strong></h3>
<p>When you first set up a NIM, it checks your hardware and finds the best version of the model from its library to match your system.</p>
<p>If you have certain NVIDIA GPUs (listed in the Support Matrix), NIM will download and use an optimized version of the model with the TRT-LLM library for fast performance. For other NVIDIA GPUs, it will download a non-optimized model and run it using the vLLM library.  </p>
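<p>The selection rule described above can be summarized as a small decision function. This is only a sketch of the logic in the text, not NIM's actual code: choose_backend and the GPU names in OPTIMIZED_GPUS are hypothetical placeholders, so check NVIDIA's Support Matrix for the real list.</p>
<pre><code class="lang-python"># Hypothetical stand-in for the Support Matrix; these names are placeholders, not real entries
OPTIMIZED_GPUS = {"example-gpu-a", "example-gpu-b"}


def choose_backend(gpu_name):
    """Return the inference backend the text describes for a given GPU."""
    if gpu_name in OPTIMIZED_GPUS:
        return "TRT-LLM (optimized engine)"
    return "vLLM (non-optimized model)"


for gpu in ["example-gpu-a", "example-gpu-z"]:
    print(gpu, "uses", choose_backend(gpu))
</code></pre>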
<p>So the main idea is to provide optimized AI models for faster local development and a cloud environment to host it for production.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcFaoW3vnIQUTqz9mpYGkO7r2JqD2P7ZMg_W4GE0a9KL_Dfm7j9fBXYlWCMJsuAPJufoxQ9xmwxb6ori54o9SGR0IkTxr5SZluSNu4LILK_6WGkImb7_EwHcwTalFxaaZmFtd4Qe-5n7MDF8N8tLL2D0a52?key=Cq76nL_EGCQTxNOs8pe9wg" alt="Flow of starting a REST API server using Nvidia NIM." width="600" height="400" loading="lazy"></p>
<h3 id="heading-features-of-nim"><strong>Features of NIM</strong></h3>
<p>NIM simplifies the process of running AI models by handling technical details like execution engines and runtime operations for you. It also aims to pick the fastest available backend for your hardware, whether that is TRT-LLM, vLLM, or another engine.</p>
<p>NIM offers the following high-performance features:</p>
<ul>
<li><p><strong>Scalable Deployment:</strong> It performs well and can easily grow from handling a few users to millions without issues.</p>
</li>
<li><p><strong>Advanced Language Model Support:</strong> NIM comes with pre-optimized engines for many of the latest language model architectures.</p>
</li>
<li><p><strong>Flexible Integration:</strong> Adding NIM to your existing apps and workflows is easy. Developers can use an OpenAI API-compatible system with extra NVIDIA features for more capabilities.</p>
</li>
<li><p><strong>Enterprise-Grade Security:</strong> NIM prioritizes security by using safetensors, continuously monitoring for vulnerabilities (CVEs), and regularly applying security updates.</p>
</li>
</ul>
<h3 id="heading-nim-quickstart"><strong>NIM Quickstart</strong></h3>
<h4 id="heading-1-generate-an-ngc-api-key">1. Generate an NGC API key</h4>
<p>An NGC API key is required to access NGC resources and a key can be generated here: <a target="_blank" href="https://org.ngc.nvidia.com/setup/personal-keys">https://org.ngc.nvidia.com/setup/personal-keys</a>.</p>
<h4 id="heading-2-export-the-api-key">2. Export the API key</h4>
<pre><code class="lang-bash">export NGC_API_KEY=&lt;value&gt;
</code></pre>
<h4 id="heading-3-docker-login-to-ngc">3. Docker login to NGC</h4>
<p>To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry with the following command:</p>
<pre><code class="lang-bash">echo <span class="hljs-string">"$NGC_API_KEY"</span> | docker login nvcr.io --username <span class="hljs-string">'$oauthtoken'</span> --password-stdin
</code></pre>
<h4 id="heading-4-list-available-nims">4. List available NIMs</h4>
<pre><code class="lang-bash">ngc registry image list --format_type csv nvcr.io/nim/*
</code></pre>
<h4 id="heading-5-launch-nim">5. Launch NIM</h4>
<p>The following command launches a Docker container for the llama3-8b-instruct model. To launch a container for a different NIM, replace the values of Repository and Latest_Tag with values from the previous image list command and change the value of CONTAINER_NAME to something appropriate.</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Choose a container name for bookkeeping</span>
export CONTAINER_NAME=Llama3-<span class="hljs-number">8</span>B-Instruct

<span class="hljs-comment"># The container name from the previous ngc registry image list command</span>
Repository=nim/meta/llama3-<span class="hljs-number">8</span>b-instruct
Latest_Tag=<span class="hljs-number">1.1</span>.<span class="hljs-number">2</span>

<span class="hljs-comment"># Choose a LLM NIM Image from NGC</span>
export IMG_NAME=<span class="hljs-string">"nvcr.io/${Repository}:${Latest_Tag}"</span>

<span class="hljs-comment"># Choose a path on your system to cache the downloaded models</span>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p <span class="hljs-string">"$LOCAL_NIM_CACHE"</span>

<span class="hljs-comment"># Start the LLM NIM</span>
docker <span class="hljs-keyword">run</span><span class="bash"> -it --rm --name=<span class="hljs-variable">$CONTAINER_NAME</span> \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY=<span class="hljs-variable">$NGC_API_KEY</span> \
  -v <span class="hljs-string">"<span class="hljs-variable">$LOCAL_NIM_CACHE</span>:/opt/nim/.cache"</span> \
  -u $(id -u) \
  -p 8000:8000 \
  <span class="hljs-variable">$IMG_NAME</span></span>
</code></pre>
<h4 id="heading-6-use-case-openai-completion-request">6. Use case: OpenAI Completion Request</h4>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI

client = OpenAI(base_url=<span class="hljs-string">"http://0.0.0.0:8000/v1"</span>, api_key=<span class="hljs-string">"not-used"</span>)
prompt = <span class="hljs-string">"Once upon a time"</span>
response = client.completions.create(
    model=<span class="hljs-string">"meta/llama3-8b-instruct"</span>,
    prompt=prompt,
    max_tokens=<span class="hljs-number">16</span>,
    stream=<span class="hljs-literal">False</span>
)
completion = response.choices[<span class="hljs-number">0</span>].text
print(completion)
</code></pre>
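<p>Because the endpoint is OpenAI API-compatible, the same request can also be expressed as plain HTTP. The sketch below only builds the request with Python's standard library and does not send it. It assumes the container from step 5 is listening on port 8000 and that the completions route is /v1/completions, as in the OpenAI API.</p>
<pre><code class="lang-python">import json
import urllib.request

# Same parameters as the OpenAI client call above
payload = {
    "model": "meta/llama3-8b-instruct",
    "prompt": "Once upon a time",
    "max_tokens": 16,
    "stream": False,
}

request = urllib.request.Request(
    "http://0.0.0.0:8000/v1/completions",  # assumed route, mirroring the OpenAI API
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))

# To actually send it once the NIM container is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["text"])
</code></pre>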
<h2 id="heading-which-tool-to-use"><strong>Which Tool to Use?</strong></h2>
<p>So you may be wondering: which of these should you use for your specific use case? Well, the answer to this depends on what you’re working on.</p>
<p>LangChain is an excellent choice if you're looking for a versatile framework to integrate multiple tools or build intelligent agents that can handle several tasks simultaneously.</p>
<p>But if your main focus is smart search and data retrieval, LlamaIndex is the better option, as it specializes in indexing and retrieving information for LLMs, making it ideal for deep data exploration. While LangChain can manage indexing and retrieval, LlamaIndex is optimized for these tasks and offers easier data ingestion with its plugins and connectors.</p>
<p>On the other hand, if you're aiming for high-performance model deployment, NVIDIA NIM is a great solution. NIM abstracts the technical details, offers fast inference with tools like TRT-LLM and vLLM, and provides scalable deployment with enterprise-grade security.</p>
<p>So, for apps needing indexing and retrieval, LlamaIndex is recommended. For deploying LLMs at scale, NIM is a powerful choice. Otherwise, LangChain alone is sufficient for working with LLMs.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>In this article, we compared three powerful tools for working with large language models: LangChain, LlamaIndex, and NVIDIA NIM. We explored each tool’s unique strengths, such as LangChain's versatility for integrating multiple components, LlamaIndex's focus on efficient data indexing and retrieval, and NVIDIA NIM's high-performance model deployment capabilities.</p>
<p>We discussed key features like scalability, ease of integration, and optimized performance, showing how these tools address different needs within the AI ecosystem.</p>
<p>While each tool has its challenges, such as handling complex infrastructures or optimizing for specific tasks, LangChain, LlamaIndex, and NVIDIA NIM offer valuable solutions for building and scaling AI-powered applications.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a RAG Pipeline with LlamaIndex ]]>
                </title>
                <description>
                    <![CDATA[ Large Language Models are everywhere these days – think ChatGPT – but they have their fair share of challenges. One of the biggest challenges faced by LLMs is hallucination. This occurs when the model generates text that is factually incorrect or mis... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-rag-pipeline-with-llamaindex/</link>
                <guid isPermaLink="false">66d1c98990f244bf8b6cb9d3</guid>
                
                    <category>
                        <![CDATA[ RAG  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ LlamaIndex ]]>
                    </category>
                
                    <category>
                        <![CDATA[ generative ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ IBM WatsonX ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Open Source ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ large language models ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Bhavishya Pandit ]]>
                </dc:creator>
                <pubDate>Fri, 30 Aug 2024 13:30:49 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1725024307257/62401eea-25ab-4f00-93d7-76d7c49cf330.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Large Language Models are everywhere these days – think ChatGPT – but they have their fair share of challenges.</p>
<p>One of the biggest challenges faced by LLMs is hallucination. This occurs when the model generates text that is factually incorrect or misleading, often based on patterns it has learned from its training data. So how can Retrieval-Augmented Generation, or RAG, help mitigate this issue?</p>
<p>By retrieving relevant information from a broader knowledge base, RAG helps ground the LLM's responses in real-world facts. This significantly reduces the likelihood of hallucinations and improves the overall accuracy and reliability of the generated content.</p>
<h2 id="heading-table-of-contents">Table of Contents:</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-retrieval-augmented-generation-rag">What is Retrieval Augmented Generation (RAG)?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-components-of-a-rag-pipeline">Understanding the Components of a RAG Pipeline</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-lets-get-started">Let's Get Started!</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-fine-tune-the-pipeline">How to Fine-Tune the Pipeline</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-real-world-applications-of-rag">Real-World Applications of RAG</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-rag-best-practices-and-considerations">RAG Best Practices and Considerations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-what-is-retrieval-augmented-generation-rag">What is Retrieval Augmented Generation (RAG)?</h2>
<p>RAG is a technique that combines information retrieval with language generation. Think of it as a two-step process:</p>
<ol>
<li><p><strong>Retrieval:</strong> The model first retrieves relevant information from a large corpus of documents based on the user's query.</p>
</li>
<li><p><strong>Generation:</strong> Using this retrieved information, the model then generates a comprehensive and informative response.</p>
</li>
</ol>
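<p>The two steps can be sketched in plain Python. This is a toy illustration, not a LlamaIndex pipeline: retrieve ranks passages by shared words, and generate is a hypothetical stand-in that only formats the prompt a real LLM would receive.</p>
<pre><code class="lang-python">def tokenize(text):
    """Lowercase a string and split it into a set of words, ignoring basic punctuation."""
    cleaned = text.lower().replace("?", "").replace(".", "").replace(",", "")
    return set(cleaned.split())


def retrieve(query, corpus, top_k=1):
    """Step 1: rank passages by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_words.intersection(tokenize(passage))),
        reverse=True,
    )
    return ranked[:top_k]


def generate(query, passages):
    """Step 2 (stub): build the grounded prompt a real LLM would be asked to complete."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."


corpus = [
    "RAG combines information retrieval with language generation.",
    "Vector databases store high-dimensional vectors.",
    "Embedding models convert text into numerical representations.",
]
question = "What does RAG combine?"
passages = retrieve(question, corpus)
print(generate(question, passages))
</code></pre>
<p>Production systems replace the word-overlap ranking with embedding similarity and the stub with an actual model call, but the retrieve-then-generate shape stays the same.</p>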
<h3 id="heading-why-use-llamaindex-for-rag">Why use LlamaIndex for RAG?</h3>
<p>LlamaIndex is a powerful framework that simplifies the process of building RAG pipelines. It provides a flexible and efficient way to connect retrieval components (like vector databases and embedding models) with generation components (like LLMs).</p>
<p><strong>Some of the key benefits of using LlamaIndex include:</strong></p>
<ul>
<li><p><strong>Modularity:</strong> It allows you to easily customize and experiment with different components.</p>
</li>
<li><p><strong>Scalability:</strong> It can handle large datasets and complex queries.</p>
</li>
<li><p><strong>Ease of use:</strong> It provides a high-level API that abstracts away much of the underlying complexity.</p>
</li>
</ul>
<h3 id="heading-what-youll-learn-here">What You'll Learn Here:</h3>
<p>In this article, we will delve deeper into the components of a RAG pipeline and explore how you can use LlamaIndex to build these systems.</p>
<p>We will cover topics such as vector databases, embedding models, language models, and the role of LlamaIndex in connecting these components.</p>
<h2 id="heading-understanding-the-components-of-a-rag-pipeline">Understanding the Components of a RAG Pipeline</h2>
<p>Here's a diagram that'll help familiarize you with the basics of RAG architecture:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724944925051/e525c6cb-6a99-4eec-8b47-3dc827ddff25.png" alt="RAG Architecture showing the flow from the user query through to the response" class="image--center mx-auto" width="1920" height="1080" loading="lazy"></p>
<p>This diagram is inspired by <a target="_blank" href="https://www.fivetran.com/blog/assembling-a-rag-architecture-using-fivetran">this article</a>. Let's go through the key pieces.</p>
<h3 id="heading-components-of-rag">Components of RAG</h3>
<p><strong>Retrieval Component:</strong></p>
<ul>
<li><p><strong>Vector Databases:</strong> These databases are optimized for storing and searching high-dimensional vectors. They are crucial for efficiently finding relevant information from a vast corpus of documents.</p>
</li>
<li><p><strong>Embedding Models:</strong> These models convert text into numerical representations or embeddings. These embeddings capture the semantic meaning of the text, allowing for efficient comparison and retrieval in vector databases.</p>
</li>
</ul>
<p>A vector is an ordered list of numbers that can be pictured as an arrow with both magnitude (size) and direction. In the context of RAG, embeddings are high-dimensional vectors that capture the semantic meaning of text. Texts with similar meanings end up close together in this vector space, which is what makes efficient comparison and retrieval possible.</p>
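<p>To make "close together" concrete, here is a minimal sketch of cosine similarity, a common way to compare embeddings. The three-dimensional vectors below are made up for illustration. Real embedding models produce hundreds of dimensions.</p>

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" (assumption: illustrative values only)
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
airplane = [0.0, 0.2, 0.9]

# Related concepts should score higher than unrelated ones
print(cosine_similarity(cat, kitten))
print(cosine_similarity(cat, airplane))
```

<p>A score near 1 means the vectors point in almost the same direction, while a score near 0 means they have little in common.</p>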
<p><strong>Generation Component:</strong></p>
<ul>
<li><strong>Language Models:</strong> These models are trained on massive amounts of text data, enabling them to generate human-quality text. They are capable of understanding and responding to prompts in a coherent and informative manner.</li>
</ul>
<h3 id="heading-the-rag-flow">The RAG Flow</h3>
<ol>
<li><p><strong>Query Submission:</strong> A user submits a query or question.</p>
</li>
<li><p><strong>Embedding Creation:</strong> The query is converted into an embedding using the same embedding model used for the corpus.</p>
</li>
<li><p><strong>Retrieval:</strong> The embedding is searched against the vector database to find the most relevant documents.</p>
</li>
<li><p><strong>Contextualization:</strong> The retrieved documents are combined with the original query to form a context.</p>
</li>
<li><p><strong>Generation:</strong> The language model generates a response based on the provided context.</p>
</li>
</ol>
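<p>The five steps above can be sketched in plain Python. This is only a toy illustration: the <code>embed</code> function is a word-count stand-in for a real embedding model, the corpus is invented, and the final generation step is left as a prompt that you would send to an LLM.</p>

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented corpus, embedded once up front (this plays the role of the vector database)
corpus = [
    "Deep learning uses neural networks with many layers.",
    "A vector database stores embeddings for similarity search.",
    "Python is a popular programming language.",
]
index = [(doc, embed(doc)) for doc in corpus]

def retrieve(query, top_k=1):
    """Steps 2 and 3: embed the query the same way, then rank the documents."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda pair: similarity(query_vector, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def build_prompt(query, docs):
    """Step 4: combine the retrieved text with the original query."""
    context = "\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

query = "What is deep learning?"    # Step 1: the user's query
docs = retrieve(query)
prompt = build_prompt(query, docs)  # Step 5 would pass this prompt to an LLM
print(prompt)
```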
<h3 id="heading-lamaindex">LlamaIndex</h3>
<p>LlamaIndex plays a crucial role in connecting the retrieval and generation components. It builds an index over your documents, uses that index to find the content most relevant to each query, and passes that content to the LLM. A well-managed index keeps the retrieval process fast and accurate.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>We will be using Python and <a target="_blank" href="https://www.ibm.com/products/watsonx-ai">IBM watsonx</a> via LlamaIndex in this article. You should have the following on your system before getting started:</p>
<ul>
<li><p>Python 3.9+</p>
</li>
<li><p><a target="_blank" href="https://dataplatform.cloud.ibm.com/docs/content/wsj/admin/admin-apikeys.html?context=wx">IBM watsonx project and API key</a></p>
</li>
<li><p>Curiosity to learn</p>
</li>
</ul>
<h2 id="heading-lets-get-started">Let's Get Started!</h2>
<p>In this article, we will be using LlamaIndex to make a simple RAG Pipeline.</p>
<p>Let's create a virtual environment for Python by running the following command in your terminal: <code>python -m venv venv</code>. This creates a virtual environment named <code>venv</code> for your project. Windows users can activate it with <code>.\venv\Scripts\activate</code>, and macOS or Linux users can activate it with <code>source venv/bin/activate</code>.</p>
<p>Now let's install the packages:</p>
<pre><code class="lang-python">pip install wikipedia llama-index-llms-ibm llama-index-embeddings-huggingface
</code></pre>
<p>Once these packages are installed, you will need watsonx.ai's API key as well. This in turn will help you use LLMs via LlamaIndex.</p>
<p>To learn how to get your watsonx.ai API key, click <a target="_blank" href="https://cloud.ibm.com/docs/account?topic=account-userapikey&amp;interface=ui">here</a>. You need both the project ID and the API key to work on the "Generation" aspect of RAG, because they let you make LLM calls through watsonx.ai.</p>
<p>Before we get to the LLM, let's fetch the Wikipedia article that will serve as our knowledge source:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> wikipedia

<span class="hljs-comment"># Search for a specific page</span>
page = wikipedia.page(<span class="hljs-string">"Artificial Intelligence"</span>)

<span class="hljs-comment"># Access the content</span>
print(page.content)
</code></pre>
<p>Now let's save the page content to a text document. We are doing it so that we can access it later. You can do this using the below code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os

<span class="hljs-comment"># Create the 'Document' directory if it doesn't exist</span>
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> os.path.exists(<span class="hljs-string">'Document'</span>):
    os.mkdir(<span class="hljs-string">'Document'</span>)

<span class="hljs-comment"># Open the file 'AI.txt' in write mode with UTF-8 encoding</span>
<span class="hljs-keyword">with</span> open(<span class="hljs-string">'Document/AI.txt'</span>, <span class="hljs-string">'w'</span>, encoding=<span class="hljs-string">'utf-8'</span>) <span class="hljs-keyword">as</span> f:
    <span class="hljs-comment"># Write the content of the 'page' object to the file</span>
    f.write(page.content)
</code></pre>
<p>Now we'll be using watsonx.ai via LlamaIndex. It will help us generate responses based on the user's query.</p>
<p>Note: Make sure to replace the parameters <code>WATSONX_APIKEY</code> and <code>project_id</code> with your values in the below code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> llama_index.llms.ibm <span class="hljs-keyword">import</span> WatsonxLLM
<span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> SimpleDirectoryReader, Document


<span class="hljs-comment"># Define a function to generate responses using the WatsonxLLM instance</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_response</span>(<span class="hljs-params">prompt</span>):</span>
    <span class="hljs-string">"""
    Generates a response to the given prompt using the WatsonxLLM instance.

    Args:
        prompt (str): The prompt to provide to the large language model.

    Returns:
        CompletionResponse: The response object from the WatsonxLLM; the generated text is available as `.text`.
    """</span>

    response = watsonx_llm.complete(prompt)
    <span class="hljs-keyword">return</span> response

<span class="hljs-comment"># Set the WATSONX_APIKEY environment variable (replace with your actual key)</span>
os.environ[<span class="hljs-string">"WATSONX_APIKEY"</span>] = <span class="hljs-string">'YOUR_WATSONX_APIKEY'</span>  <span class="hljs-comment"># Replace with your API key</span>

<span class="hljs-comment"># Define model parameters (adjust as needed)</span>
temperature = <span class="hljs-number">0</span>
max_new_tokens = <span class="hljs-number">1500</span>
additional_params = {
    <span class="hljs-string">"decoding_method"</span>: <span class="hljs-string">"sample"</span>,
    <span class="hljs-string">"min_new_tokens"</span>: <span class="hljs-number">1</span>,
    <span class="hljs-string">"top_k"</span>: <span class="hljs-number">50</span>,
    <span class="hljs-string">"top_p"</span>: <span class="hljs-number">1</span>,
}

<span class="hljs-comment"># Create a WatsonxLLM instance with the specified model, URL, project ID, and parameters</span>
watsonx_llm = WatsonxLLM(
    model_id=<span class="hljs-string">"meta-llama/llama-3-1-70b-instruct"</span>,
    url=<span class="hljs-string">"https://us-south.ml.cloud.ibm.com"</span>,
    project_id=<span class="hljs-string">"YOUR_PROJECT_ID"</span>,
    temperature=temperature,
    max_new_tokens=max_new_tokens,
    additional_params=additional_params,
)

<span class="hljs-comment"># Load documents from the specified directory</span>
documents = SimpleDirectoryReader(
    input_files=[<span class="hljs-string">"Document/AI.txt"</span>]
).load_data()

<span class="hljs-comment"># Combine the text content of all documents into a single Document object</span>
combined_documents = Document(text=<span class="hljs-string">"\n\n"</span>.join([doc.text <span class="hljs-keyword">for</span> doc <span class="hljs-keyword">in</span> documents]))

<span class="hljs-comment"># Print the combined document</span>
print(combined_documents)
</code></pre>
<p>Here's a breakdown of the parameters:</p>
<ul>
<li><p><strong>temperature = 0:</strong> This setting makes the model generate the most likely text sequence, leading to a more deterministic and predictable output. It's like telling the model to stick to the most common words and phrases.</p>
</li>
<li><p><strong>max_new_tokens = 1500:</strong> This limits the generated text to a maximum of 1500 new tokens (words or parts of words).</p>
</li>
<li><p><strong>additional_params:</strong></p>
<ul>
<li><p><strong>decoding_method = "sample":</strong> This means the model will generate text randomly based on the probability distribution of each token.</p>
</li>
<li><p><strong>min_new_tokens = 1:</strong> Ensures that at least one new token is generated, so the model never returns an empty response.</p>
</li>
<li><p><strong>top_k = 50:</strong> This limits the model's choices to the 50 most likely tokens at each step, making the output more focused and less random.</p>
</li>
<li><p><strong>top_p = 1:</strong> This sets the nucleus sampling threshold. The model samples from the smallest set of most likely tokens whose cumulative probability reaches top_p. A value of 1 means no tokens are filtered out by this setting.</p>
</li>
</ul>
</li>
</ul>
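<p>To see how <code>top_k</code> and <code>top_p</code> narrow down the candidate tokens, here is a minimal sketch on a made-up next-token distribution. The helper functions are illustrative only and are not part of watsonx.ai or LlamaIndex.</p>

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept

# Made-up probabilities for the next token (assumption: illustrative values only)
next_token_probs = {"cat": 0.5, "dog": 0.25, "car": 0.125, "sky": 0.125}

print(top_k_filter(next_token_probs, k=2))     # only the two most likely tokens remain
print(top_p_filter(next_token_probs, p=0.75))  # tokens are kept until 75% of the probability is covered
print(top_p_filter(next_token_probs, p=1))     # nothing is filtered out
```

<p>After filtering, a sampler would renormalize the remaining probabilities and draw the next token from them.</p>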
<p>You can tweak these parameters for experimentation and see how they affect your response. Now we'll be building and loading a vector store index from the given document. But first, let's understand what it is.</p>
<h3 id="heading-understanding-vector-store-indexes">Understanding Vector Store Indexes</h3>
<p>A vector store index is a specialized data structure designed to efficiently store and retrieve high-dimensional vectors. In the context of LlamaIndex, these vectors represent the semantic embeddings of documents.</p>
<p><strong>Key characteristics of vector store indexes:</strong></p>
<ul>
<li><p><strong>High-dimensional vectors:</strong> Each document is represented as a high-dimensional vector, capturing its semantic meaning.</p>
</li>
<li><p><strong>Efficient retrieval:</strong> Vector store indexes are optimized for fast similarity search, allowing you to quickly find documents that are semantically similar to a given query.</p>
</li>
<li><p><strong>Scalability:</strong> They can handle large datasets and scale efficiently as the number of documents grows.</p>
</li>
</ul>
<p><strong>How LlamaIndex uses vector store indexes:</strong></p>
<ol>
<li><p><strong>Document Embedding:</strong> Documents are first split into chunks and converted into high-dimensional vectors using an embedding model (in this article, <code>BAAI/bge-small-en-v1.5</code>).</p>
</li>
<li><p><strong>Index Creation:</strong> The embeddings are stored in a vector store index.</p>
</li>
<li><p><strong>Query Processing:</strong> When a user submits a query, it is also converted into a vector. The vector store index is then used to find the most similar documents based on their embeddings.</p>
</li>
<li><p><strong>Response Generation:</strong> The retrieved documents are used to generate a relevant response.</p>
</li>
</ol>
<p>In the code below, you'll come across the word "chunk". <strong>A chunk</strong> is a smaller, manageable unit of text extracted from a larger document. It's typically a paragraph or a few sentences long. Chunks make the retrieval and processing of information more efficient, especially when dealing with large documents.</p>
<p>By breaking down documents into chunks, RAG systems can focus on the most relevant parts and generate more accurate and concise responses.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> llama_index.core.node_parser <span class="hljs-keyword">import</span> SentenceSplitter
<span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> VectorStoreIndex, load_index_from_storage
<span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> Settings
<span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> StorageContext

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_build_index</span>(<span class="hljs-params">documents, embed_model=<span class="hljs-string">"local:BAAI/bge-small-en-v1.5"</span>, save_dir=<span class="hljs-string">"./vector_store/index"</span></span>):</span>
    <span class="hljs-string">"""
    Builds or loads a vector store index from the given documents.

    Args:
        documents (list[Document]): A list of Document objects.
        embed_model (str, optional): The embedding model to use. Defaults to "local:BAAI/bge-small-en-v1.5".
        save_dir (str, optional): The directory to save or load the index from. Defaults to "./vector_store/index".

    Returns:
        VectorStoreIndex: The built or loaded index.
    """</span>

    <span class="hljs-comment"># Set index settings</span>
    Settings.llm = watsonx_llm
    Settings.embed_model = embed_model
    Settings.node_parser = SentenceSplitter(chunk_size=<span class="hljs-number">1000</span>, chunk_overlap=<span class="hljs-number">200</span>)
    Settings.num_output = <span class="hljs-number">512</span>
    Settings.context_window = <span class="hljs-number">3900</span>

    <span class="hljs-comment"># Check if the save directory exists</span>
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> os.path.exists(save_dir):
        <span class="hljs-comment"># Build the index; the global Settings configured above are applied automatically</span>
        index = VectorStoreIndex.from_documents(documents)
        index.storage_context.persist(persist_dir=save_dir)
    <span class="hljs-keyword">else</span>:
        <span class="hljs-comment"># Load the existing index from disk</span>
        index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir)
        )
    <span class="hljs-keyword">return</span> index

<span class="hljs-comment"># Get the Vector Index</span>
vector_index = get_build_index(documents=documents, embed_model=<span class="hljs-string">"local:BAAI/bge-small-en-v1.5"</span>, save_dir=<span class="hljs-string">"./vector_store/index"</span>)
</code></pre>
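<p>To build some intuition for what <code>chunk_size</code> and <code>chunk_overlap</code> control, here is a simplified sketch that splits text by words. Treat it as an assumption-laden toy: the <code>chunk_words</code> helper is hypothetical, and <code>SentenceSplitter</code> itself measures size in tokens and tries to respect sentence boundaries.</p>

```python
def chunk_words(text, chunk_size, chunk_overlap):
    """Split text into chunks of `chunk_size` words, each sharing
    `chunk_overlap` words with the previous chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks

text = "one two three four five six seven eight nine ten"
for chunk in chunk_words(text, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

<p>The overlap repeats the tail of one chunk at the start of the next, so a sentence that straddles a boundary is less likely to lose its context.</p>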
<p>This is the last part of the RAG pipeline: we create a query engine with metadata replacement and sentence transformer reranking. But what is a re-ranker?</p>
<p><strong>A re-ranker</strong> is a component that reorders the retrieved documents based on their relevance to the query. It uses additional information, such as semantic similarity or context-specific factors, to refine the initial ranking provided by the retrieval system. This helps ensure that the most relevant documents are presented to the user, leading to more accurate and informative responses.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> llama_index.core.postprocessor <span class="hljs-keyword">import</span> MetadataReplacementPostProcessor, SentenceTransformerRerank

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_query_engine</span>(<span class="hljs-params">sentence_index, similarity_top_k=<span class="hljs-number">6</span>, rerank_top_n=<span class="hljs-number">2</span></span>):</span>
    <span class="hljs-string">"""
    Creates a query engine with metadata replacement and sentence transformer reranking.

    Args:
        sentence_index (VectorStoreIndex): The sentence index to use.
        similarity_top_k (int, optional): The number of similar nodes to retrieve. Defaults to 6.
        rerank_top_n (int, optional): The number of nodes to keep after reranking. Defaults to 2.

    Returns:
        QueryEngine: The query engine.
    """</span>

    postproc = MetadataReplacementPostProcessor(target_metadata_key=<span class="hljs-string">"window"</span>)
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model=<span class="hljs-string">"BAAI/bge-reranker-base"</span>
    )
    engine = sentence_index.as_query_engine(
        similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]
    )
    <span class="hljs-keyword">return</span> engine

<span class="hljs-comment"># Create a query engine with the specified parameters</span>
query_engine = get_query_engine(sentence_index=vector_index, similarity_top_k=<span class="hljs-number">8</span>, rerank_top_n=<span class="hljs-number">5</span>)

<span class="hljs-comment"># Query the engine with a question</span>
query = <span class="hljs-string">'What is Deep learning?'</span>
response = query_engine.query(query)
prompt = <span class="hljs-string">f'''Generate a detailed response for the query asked based only on the context fetched:
            Query: <span class="hljs-subst">{query}</span>
            Context: <span class="hljs-subst">{response}</span>

            Instructions:
            1. Show query and your generated response based on context.
            2. Your response should be detailed and should cover every aspect of the context.
            3. Be crisp and concise.
            4. Don't include anything else in your response - no header/footer/code etc
            '''</span>
response = generate_response(prompt)
print(response.text)

<span class="hljs-string">'''
OUTPUT - 
Query: What is Deep learning? 

Deep learning is a subset of artificial intelligence that utilizes multiple layers of neurons between the network's inputs and outputs to progressively extract higher-level features from raw input data. 
This technique allows for improved performance in various subfields of AI, such as computer vision, speech recognition, natural language processing, and image classification. 
The multiple layers in deep learning networks are able to identify complex concepts and patterns, including edges, faces, digits, and letters.
The reason behind deep learning's success is not attributed to a recent theoretical breakthrough, but rather the significant increase in computer power, particularly the shift to using graphics processing units (GPUs), which provided a hundred-fold increase in speed. 
Additionally, the availability of vast amounts of training data, including large curated datasets, has also contributed to the success of deep learning.
Overall, deep learning's ability to analyze and extract insights from raw data has led to its widespread application in various fields, and its performance continues to improve with advancements in technology and data availability. '''</span>
</code></pre>
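<p>If the reranking step still feels abstract, the sketch below shows the general idea on toy data: take the candidates returned by the first retrieval pass, rescore each one against the query, and keep only the best few. The <code>word_overlap_score</code> function is a hypothetical stand-in for a real reranking model such as <code>BAAI/bge-reranker-base</code>.</p>

```python
def rerank(query, candidates, score_fn, top_n):
    """Rescore the candidates against the query and keep the top_n best."""
    rescored = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return rescored[:top_n]

def word_overlap_score(query, text):
    """Hypothetical stand-in for a reranking model: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words.intersection(text_words)) / len(query_words)

# Pretend these came back from the vector index, in its original order
candidates = [
    "machine learning is a field of ai",
    "deep learning uses many neural network layers",
    "what deep learning is and how it works",
]
query = "what is deep learning"

reranked = rerank(query, candidates, word_overlap_score, top_n=2)
print(reranked)
```

<p>This mirrors the roles of <code>similarity_top_k</code> (how many candidates to fetch) and <code>rerank_top_n</code> (how many to keep) in the query engine.</p>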
<h2 id="heading-how-to-fine-tune-the-pipeline">How to Fine-Tune the Pipeline</h2>
<p>Once you've built a basic RAG pipeline, the next step is to fine-tune it for optimal performance. This involves iteratively adjusting various components and parameters to improve the quality of the generated responses.</p>
<h3 id="heading-how-to-evaluate-the-pipelines-performance">How to Evaluate the Pipeline's Performance</h3>
<p>To assess the pipeline's effectiveness, you can use <strong>metrics</strong> like:</p>
<ul>
<li><p><strong>Accuracy:</strong> How often does the pipeline generate correct and relevant responses?</p>
</li>
<li><p><strong>Relevance:</strong> How well do the retrieved documents match the query?</p>
</li>
<li><p><strong>Coherence:</strong> Is the generated text well-structured and easy to understand?</p>
</li>
<li><p><strong>Factuality:</strong> Are the generated responses accurate and consistent with known facts?</p>
</li>
</ul>
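<p>As one way to put a number on relevance, you can check whether the retriever returns the chunk you expect for a small set of test queries. The sketch below computes two common retrieval metrics, hit rate and mean reciprocal rank, on hypothetical data. The query and chunk IDs are invented for illustration.</p>

```python
def hit_rate(results, expected):
    """Fraction of queries whose expected chunk appears anywhere in the retrieved list."""
    hits = sum(1 for query, docs in results.items() if expected[query] in docs)
    return hits / len(results)

def mean_reciprocal_rank(results, expected):
    """Average of 1/rank of the expected chunk (0 when it was not retrieved)."""
    total = 0.0
    for query, docs in results.items():
        if expected[query] in docs:
            total += 1 / (docs.index(expected[query]) + 1)
    return total / len(results)

# Hypothetical evaluation set: the chunk each query should retrieve
expected = {"q1": "chunk_a", "q2": "chunk_b", "q3": "chunk_c", "q4": "chunk_d"}

# Hypothetical retriever output: ranked chunk IDs per query
results = {
    "q1": ["chunk_a", "chunk_x"],
    "q2": ["chunk_y", "chunk_b"],
    "q3": ["chunk_x", "chunk_y"],
    "q4": ["chunk_d", "chunk_c"],
}

print(hit_rate(results, expected))
print(mean_reciprocal_rank(results, expected))
```

<p>Tracking numbers like these before and after each change tells you whether a tweak actually helped retrieval.</p>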
<h3 id="heading-iterate-on-the-index-structure-embedding-model-and-language-model">Iterate on the Index Structure, Embedding Model, and Language Model</h3>
<p>You can experiment with different <strong>index structures</strong> (for example, a flat index or a hierarchical index) to find the one that best suits your data and query patterns. Consider trying <strong>different embedding models</strong> to capture different semantic nuances. <strong>Fine-tuning the language model</strong> can also improve its ability to generate high-quality responses.</p>
<h3 id="heading-experiment-with-different-hyperparameters">Experiment with Different Hyperparameters</h3>
<p><strong>Hyperparameters</strong> are settings that control the behaviour of the pipeline components. By experimenting with different values, you can optimize the pipeline's performance. Some examples of hyperparameters include:</p>
<ul>
<li><p><strong>Embedding dimension:</strong> The size of the embedding vectors</p>
</li>
<li><p><strong>Index size:</strong> The maximum number of documents to store in the index</p>
</li>
<li><p><strong>Retrieval threshold:</strong> The minimum similarity score for a document to be considered relevant</p>
</li>
</ul>
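<p>As an example of how one of these settings changes the results, the sketch below applies different retrieval thresholds to a made-up list of scored chunks. The <code>apply_similarity_cutoff</code> helper and the scores are hypothetical.</p>

```python
def apply_similarity_cutoff(scored_docs, cutoff):
    """Keep only (document, score) pairs whose score is at least `cutoff`."""
    return [(doc, score) for doc, score in scored_docs if score >= cutoff]

# Made-up similarity scores from a retriever (assumption: illustrative values only)
scored_docs = [("chunk_a", 0.82), ("chunk_b", 0.64), ("chunk_c", 0.31)]

for cutoff in (0.3, 0.6, 0.8):
    kept = apply_similarity_cutoff(scored_docs, cutoff)
    print(cutoff, [doc for doc, _ in kept])
```

<p>A higher threshold passes fewer but more relevant chunks to the LLM, while a lower one gives it more context at the risk of adding noise.</p>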
<h2 id="heading-real-world-applications-of-rag">Real-World Applications of RAG</h2>
<p>RAG pipelines have a wide range of applications, including:</p>
<ul>
<li><p><strong>Customer support chatbots:</strong> Providing informative and helpful responses to customer inquiries</p>
</li>
<li><p><strong>Knowledge base search:</strong> Efficiently retrieving relevant information from large document collections</p>
</li>
<li><p><strong>Summarization of large documents:</strong> Condensing lengthy documents into concise summaries</p>
</li>
<li><p><strong>Question answering systems:</strong> Answering complex questions based on a given corpus of knowledge</p>
</li>
</ul>
<h2 id="heading-rag-best-practices-and-considerations">RAG Best Practices and Considerations</h2>
<p>To build effective RAG pipelines, consider these best practices:</p>
<ul>
<li><p><strong>Data quality and preprocessing:</strong> Ensure your data is clean, consistent, and relevant to your use case. Preprocess the data to remove noise and improve its quality.</p>
</li>
<li><p><strong>Embedding model selection:</strong> Choose an embedding model that is appropriate for your specific domain and task. Consider factors like accuracy, computational efficiency, and interpretability.</p>
</li>
<li><p><strong>Index optimization:</strong> Optimize the index structure and parameters to improve retrieval efficiency and accuracy.</p>
</li>
<li><p><strong>Ethical considerations and biases:</strong> Be aware of potential biases in your data and models. Take steps to mitigate bias and ensure fairness in your RAG pipeline.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>RAG pipelines offer a powerful approach to leveraging large language models for a variety of tasks. By carefully selecting and fine-tuning the components of a RAG pipeline, you can build systems that provide informative, accurate, and relevant responses.</p>
<p><strong>Key points to remember:</strong></p>
<ul>
<li><p>RAG combines information retrieval and language generation.</p>
</li>
<li><p>LlamaIndex simplifies the process of building RAG pipelines.</p>
</li>
<li><p>Fine-tuning is essential for optimizing pipeline performance.</p>
</li>
<li><p>RAG has a wide range of real-world applications.</p>
</li>
<li><p>Ethical considerations are crucial in building responsible RAG systems.</p>
</li>
</ul>
<p>As RAG technology continues to evolve, we can expect to see even more innovative and powerful applications in the future. Till then, let's wait for the future to unfold!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ A Beginner's Guide to LLMs – What's a Large-Language Model and How Does it Work? ]]>
                </title>
                <description>
                    <![CDATA[ ChatGPT was released in November 2022. Since then, we’ve witnessed rapid advancements in the field of AI and technology. But did you know that the journey of AI chatbots began way back in 1966 with ELIZA? ELIZA was not as sophisticated as today’s mod... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/a-beginners-guide-to-large-language-models/</link>
                <guid isPermaLink="false">66be59dfcea37428c836a987</guid>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Beginner Developers ]]>
                    </category>
                
                    <category>
                        <![CDATA[ beginner ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Open Source ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Bhavishya Pandit ]]>
                </dc:creator>
                <pubDate>Thu, 15 Aug 2024 19:41:19 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1723750839199/0dc3a4ff-3e4e-4055-b3c1-955474946b0f.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>ChatGPT was released in November 2022. Since then, we’ve witnessed rapid advancements in the field of AI and technology.</p>
<p>But did you know that the journey of AI chatbots began way back in 1966 with ELIZA? ELIZA was not as sophisticated as today’s models like GPT, but it marked the beginning of the exciting path that led us to where we are now.</p>
<p>Language is the essence of human interaction, and in the digital age, teaching machines to understand and generate language has become a cornerstone of artificial intelligence.</p>
<p>The models we interact with today, such as GPT, Llama 3, Gemini, and Claude, are known as Large Language Models (LLMs). This is because they are trained on vast datasets of text, enabling them to perform a wide range of language-related tasks.</p>
<p>But what exactly are LLMs, and why is there so much hype surrounding them?</p>
<p>In this article, you'll learn what LLMs are, how they are trained, where they are used, and what the hype is all about.</p>
<h2 id="heading-what-are-llms"><strong>What Are LLMs?</strong></h2>
<p>Large Language Models are AI models trained on vast amounts of text data to understand, generate, and manipulate human language. They are based on deep learning architectures like transformers, which allow them to process and predict text in a way that mimics human understanding.</p>
<p>In simpler terms, an LLM is a computer program that has been trained on many examples to differentiate between an apple and a Boeing 787 – and to be able to describe each of them.</p>
<p>Before they're ready for use and can answer your questions, LLMs are trained on massive datasets. Realistically, a program cannot conclude anything from a single sentence. But after analyzing, say, trillions of sentences, it's able to build a logic to complete sentences or even generate its own.</p>
<h3 id="heading-how-to-train-an-llm">How to Train an LLM</h3>
<p>Here’s how the training process works:</p>
<ol>
<li><p><strong>Data Collection:</strong> The first step involves gathering millions (or even billions) of text documents from diverse sources, including books, websites, research papers, and social media. This extensive dataset serves as the foundation for the model’s learning process.</p>
</li>
<li><p><strong>Learning Patterns:</strong> The model analyzes the collected data to identify and learn patterns in the text. These patterns include grammar rules, word associations, contextual relationships, and even some level of common sense. By processing this data, the model begins to understand how language works.</p>
</li>
<li><p><strong>Fine-Tuning:</strong> After the initial training, the model is fine-tuned for specific tasks. This involves adjusting the model’s parameters to optimize its performance for tasks such as translation, summarization, sentiment analysis, or question-answering.</p>
</li>
<li><p><strong>Evaluation and Testing:</strong> Once trained, the model is rigorously tested against a series of benchmarks to evaluate its accuracy, efficiency, and reliability. This step ensures that the model performs well in real-world applications.</p>
</li>
</ol>
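<p>To get a feel for what "learning patterns" means, here is a deliberately tiny sketch: a bigram model that counts which word tends to follow which, then uses those counts to predict the next word. Real LLMs use transformer networks with billions of parameters rather than simple counts, so treat this only as an analogy. The training sentences are invented.</p>

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

# Invented training data (assumption: illustrative sentences only)
training_text = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]

model = train_bigram_model(training_text)
print(predict_next(model, "the"))
print(predict_next(model, "sat"))
```

<p>With only three sentences the model is easily fooled, which is why real training relies on enormous datasets.</p>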
<p>After the training process is completed, the models are heavily tested on a series of benchmarks for accuracy, efficiency, security, and so on.</p>
<h2 id="heading-applications-of-llms"><strong>Applications of LLMs</strong></h2>
<p>LLMs have a wide range of applications, from content generation to prediction and a lot more.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723742816242/6e40d678-96ed-4f51-aa35-61c565548a32.png" alt="Applications of LLMs in different domains like Healthcare, Education, Customer Support, and so on." class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-content-creation"><strong>Content Creation:</strong></h3>
<ul>
<li><p><strong>Writing Assistance:</strong> Tools like Grammarly utilize LLMs to provide real-time suggestions for improving grammar, style, and clarity in writing. Whether you’re drafting an email or writing a novel, LLMs can help you polish your text.</p>
</li>
<li><p><strong>Automated Storytelling:</strong> AI models can now generate creative content, from short stories to full-length novels. These models can emulate the style of famous authors or even create entirely new literary styles.</p>
</li>
</ul>
<h3 id="heading-customer-service"><strong>Customer Service:</strong></h3>
<ul>
<li><p><strong>Chatbots:</strong> Many companies deploy AI-powered chatbots that can understand and respond to customer inquiries in real time. These chatbots can handle a wide range of tasks, from answering frequently asked questions to processing orders.</p>
</li>
<li><p><strong>Personal Assistants:</strong> Virtual assistants like Siri and Alexa use LLMs to interpret and respond to voice commands, providing users with information, reminders, and entertainment on demand.</p>
</li>
</ul>
<h3 id="heading-healthcare"><strong>Healthcare:</strong></h3>
<ul>
<li><p><strong>Medical Record Summarization:</strong> LLMs can assist healthcare professionals by summarizing patient records, making it easier to review critical information and make informed decisions.</p>
</li>
<li><p><strong>Diagnostic Assistance:</strong> AI models can analyze patient data and medical literature to assist doctors in diagnosing diseases and recommending treatments.</p>
</li>
</ul>
<h3 id="heading-research-and-education"><strong>Research and Education:</strong></h3>
<ul>
<li><p><strong>Literature Review:</strong> LLMs can sift through vast amounts of research papers to provide concise summaries, identify trends, and suggest new research directions.</p>
</li>
<li><p><strong>Educational Tools:</strong> AI-powered tutors can offer personalized learning experiences by adapting to a student’s progress and needs. These tools can provide instant feedback and tailored study plans.</p>
</li>
</ul>
<h3 id="heading-entertainment"><strong>Entertainment:</strong></h3>
<ul>
<li><p><strong>Game Development:</strong> LLMs are used to create more dynamic and responsive characters in video games. These AI-driven characters can engage with players more realistically and interactively.</p>
</li>
<li><p><strong>Music and Art Generation:</strong> AI models are now capable of composing music, generating artwork, and even writing scripts for movies, pushing the boundaries of creative expression.</p>
</li>
</ul>
<h2 id="heading-challenges-with-llms"><strong>Challenges with LLMs</strong></h2>
<p>While LLMs are powerful, they are not without their challenges. ChatGPT alone has over 150 million monthly users, which gives a sense of how large the impact of AI already is. But new technologies bring challenges of their own.</p>
<h3 id="heading-bias-and-fairness"><strong>Bias and Fairness:</strong></h3>
<ul>
<li>LLMs learn from the data they are trained on, which can include biases present in society. This can lead to biased or unfair outcomes in their predictions or responses. Addressing this requires careful dataset curation and algorithm adjustments to minimize bias.</li>
</ul>
<h3 id="heading-data-privacy"><strong>Data Privacy:</strong></h3>
<ul>
<li>LLMs may inadvertently learn and retain sensitive information from the data they are trained on, raising privacy concerns. There’s ongoing research on how to make LLMs more privacy-preserving.</li>
</ul>
<h3 id="heading-resource-intensive"><strong>Resource Intensive:</strong></h3>
<ul>
<li>Training LLMs requires immense computational power and large datasets, which can be costly and environmentally taxing. Efforts are being made to create more efficient models that require less energy and data.</li>
</ul>
<h3 id="heading-interpretability"><strong>Interpretability:</strong></h3>
<ul>
<li>LLMs are often seen as "black boxes," meaning it’s challenging to understand exactly how they arrive at certain conclusions. Developing methods to make AI more interpretable and explainable is an ongoing research area.</li>
</ul>
<h2 id="heading-coding-with-llms-a-replicate-example"><strong>Coding with LLMs: A Replicate Example</strong></h2>
<p>For those of you who like to get your hands dirty with code, here’s a quick example of how to use an LLM with the Replicate library.</p>
<p><strong>Replicate</strong> is a platform for running machine learning models in the cloud, and its <code>replicate</code> Python package is the client for it. The package gives you a simple interface to the large collection of pre-trained models hosted on the Replicate platform.</p>
<p>With Replicate, you can easily:</p>
<ul>
<li><p>Run models directly from your Python code or Jupyter notebooks.</p>
</li>
<li><p>Access various model types, including image generation, text generation, and more.</p>
</li>
<li><p>Leverage powerful cloud infrastructure for efficient model execution.</p>
</li>
<li><p>Integrate AI capabilities into your applications without the complexities of model training and deployment.</p>
</li>
</ul>
<p>Here’s a simple code snippet to generate text using Meta's llama3-70b-instruct model. <strong>Llama 3</strong> is one of the latest open-source large language models developed by Meta. It's designed to be highly capable, versatile, and accessible, allowing users to experiment, innovate, and scale their AI applications.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> replicate  <span class="hljs-comment"># pip install replicate</span>

<span class="hljs-comment"># Get your token from https://replicate.com/account/api-tokens</span>
<span class="hljs-comment"># and replace the "TOKEN" placeholder with it.</span>
os.environ[<span class="hljs-string">"REPLICATE_API_TOKEN"</span>] = <span class="hljs-string">"TOKEN"</span>
api = replicate.Client(api_token=os.environ[<span class="hljs-string">"REPLICATE_API_TOKEN"</span>])

<span class="hljs-comment"># Run the Llama 3 model on Replicate</span>
output = api.run(
    <span class="hljs-string">"meta/meta-llama-3-70b-instruct"</span>,
    input={<span class="hljs-string">"prompt"</span>: <span class="hljs-string">"Hey how are you?"</span>},
)

<span class="hljs-comment"># The response arrives in chunks; print them as one continuous string</span>
<span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> output:
    print(item, end=<span class="hljs-string">""</span>)
</code></pre>
<p>Explanation:</p>
<ul>
<li><p>We first store the Replicate API token in an environment variable using the <code>os</code> package, then pass it to <code>replicate.Client</code>.</p>
</li>
<li><p>Then we use the Llama3 70b-instruct model to give a response based on our prompt. You can customize the output by changing the prompt.</p>
</li>
</ul>
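<p>If you would rather not paste the token into your source file, one option is to read it from an environment variable you set beforehand. The sketch below is one possible arrangement, not the only way to do it: the helper names <code>require_token</code>, <code>collect_text</code>, and <code>ask_llama</code> are hypothetical, and it assumes the model streams its reply as pieces of text, as in the snippet above.</p>
<pre><code class="lang-python">import os


def require_token(env=None):
    """Return the Replicate token from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set REPLICATE_API_TOKEN before running this script.")
    return token


def collect_text(chunks):
    """Join the pieces of text yielded by the model into a single string."""
    return "".join(str(chunk) for chunk in chunks)


def ask_llama(prompt):
    """Send one prompt to Llama 3 on Replicate and return the full reply."""
    # Imported here so the helpers above can be used without the package installed.
    import replicate  # pip install replicate

    client = replicate.Client(api_token=require_token())
    output = client.run(
        "meta/meta-llama-3-70b-instruct",
        input={"prompt": prompt},
    )
    return collect_text(output)
</code></pre>
<p>With the token exported in your shell, calling <code>ask_llama("Hey how are you?")</code> should return the whole response as one string, which is easier to store or post-process than printing it chunk by chunk.</p>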
<p>And what is a prompt? <strong>A prompt is essentially a text-based instruction or query given to an AI model.</strong> It gives the model a starting point or direction, whether you want it to generate text, translate between languages, write creative content, or answer a question.</p>
<p>For example:</p>
<ul>
<li><p><strong>"Write a poem about a robot exploring the ocean."</strong></p>
</li>
<li><p><strong>"Translate 'Hello, how are you?' into Spanish."</strong></p>
</li>
<li><p><strong>"Explain quantum computing in simple terms."</strong></p>
</li>
</ul>
<p>These are all prompts that guide the AI to produce a specific output.</p>
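<p>When you build a tool around an LLM, prompts like these are often assembled from templates rather than typed by hand. Here is a minimal sketch of that idea using plain Python string formatting. The <code>PROMPT_TEMPLATES</code> dictionary and <code>build_prompt</code> function are hypothetical names, and the template wording is illustrative rather than tuned for any model.</p>
<pre><code class="lang-python"># Hypothetical templates based on the example prompts above.
PROMPT_TEMPLATES = {
    "poem": "Write a poem about {topic}.",
    "translate": "Translate '{text}' into {language}.",
    "explain": "Explain {topic} in simple terms.",
}


def build_prompt(task, **fields):
    """Fill in the template for the given task with the supplied fields."""
    if task not in PROMPT_TEMPLATES:
        raise KeyError("unknown task: " + task)
    return PROMPT_TEMPLATES[task].format(**fields)


prompt = build_prompt("translate", text="Hello, how are you?", language="Spanish")
print(prompt)
</code></pre>
<p>The resulting string can be passed as the <code>prompt</code> value in the model input, so changing the task or its fields changes what the model is asked to do without touching the rest of the code.</p>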
<p>Using Meta's <strong>llama-3-70b-instruct</strong>, you can build tools for many of the applications mentioned in this article. Tweak the prompts to fit your use case and you will be ready to go! ⚡️</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>In this article, we explored the world of Large Language Models, providing a high-level understanding of how they work and their training process. We delved into the core concepts of LLMs, including data collection, pattern learning, and fine-tuning, and discussed the extensive applications of LLMs across various industries.</p>
<p>While LLMs offer immense potential, they also come with challenges such as bias, privacy concerns, resource demands, and interpretability. Addressing these challenges is crucial as AI continues to evolve and integrate more deeply into our lives.</p>
<p>We also provided a glimpse into how you can start working with LLMs using the Replicate library, showing that even complex models like Llama3 70b-instruct can be accessible to developers with the right tools.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
