<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Elabonga Atuo - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Elabonga Atuo - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 24 May 2026 22:24:00 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/Ellabee/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Local RAG App with Ollama and ChromaDB in the R Programming Language ]]>
                </title>
                <description>
                    <![CDATA[ A Large Language Model (LLM) is a type of machine learning model that is trained to understand and generate human-like text. These models are trained on vast datasets to capture the nuances of human language, enabling them to generate coherent and co... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-a-local-rag-app-with-ollama-and-chromadb-in-r/</link>
                <guid isPermaLink="false">67fd5ac89a2c2895da61d799</guid>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ chromadb ]]>
                    </category>
                
                    <category>
                        <![CDATA[ R Language ]]>
                    </category>
                
                    <category>
                        <![CDATA[ RAG  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Elabonga Atuo ]]>
                </dc:creator>
                <pubDate>Mon, 14 Apr 2025 18:58:16 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1744638731389/83993a5e-7a4d-4615-a8c5-582008115fc4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>A Large Language Model (LLM) is a type of machine learning model that is trained to understand and generate human-like text. These models are trained on vast datasets to capture the nuances of human language, enabling them to generate coherent and contextually relevant responses.</p>
<p>You can enhance the performance of an LLM by providing context — structured or unstructured data, such as documents, articles, or knowledge bases — tailored to the domain or information you want the model to specialize in. Using techniques like prompt engineering and context injection, you can build an intelligent chatbot capable of navigating extensive datasets, retrieving relevant information, and delivering responses.</p>
<p>Whether it's storing recipes, code documentation, research articles, or answering domain-specific queries, an LLM-based chatbot can adapt to your needs with customization and privacy. You can deploy it locally to create a highly specialized conversational assistant that respects your data.</p>
<p>In this article, you will learn how to build a local Retrieval-Augmented Generation (RAG) application using Ollama and ChromaDB in R. By the end, you'll have a custom conversational assistant with a Shiny interface that efficiently retrieves information while maintaining privacy and customization.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-rag">What is RAG?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-overview">Project Overview</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ollama-installation">Ollama Installation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-data-collection-and-cleaning">Data Collection and Cleaning</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-chunks">How to Create Chunks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-generate-sentence-embeddings">How to Generate Sentence Embeddings</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-the-vector-database-for-embedding-storage">How to Set Up the Vector Database for Embedding Storage</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-write-the-user-input-query-embedding-function">How to Write the User Input Query Embedding Function</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tool-calling">Tool Calling</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-initialize-the-chat-system-design-prompts-and-integrate-tools">How to Initialize the Chat System, Design Prompts, and Integrate Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-interact-with-your-chatbot-using-a-shiny-app">How to Interact with Your Chatbot Using a Shiny App</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-complete-code">Complete Code</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-is-rag">What is RAG?</h2>
<p>Retrieval-Augmented Generation (RAG) is a method that integrates retrieval systems with generative AI, enabling chatbots to access recent and specific information from external sources.</p>
<p>By using a retrieval pipeline, the chatbot can fetch up-to-date, relevant data and combine it with the generative model’s language capabilities, producing responses that are both accurate and contextually enriched. This makes RAG particularly useful for applications requiring fact-based, real-time knowledge delivery.</p>
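<p>The retrieve-then-generate idea can be sketched in a few lines of base R. This is a toy illustration only: it scores documents by shared words instead of embeddings, and the two documents, the <code>tokenize()</code> helper, and the prompt layout are made up for the example rather than taken from the project.</p>
<pre><code class="lang-r"># Toy knowledge base (made-up recipes, for illustration only)
documents &lt;- c(
  "Pancakes: flour, eggs, milk. Cook on a griddle.",
  "Guacamole: avocado, lime, salt. Mash together."
)
user_question &lt;- "What can I make with avocado and lime?"

# Hypothetical helper: lowercase, strip punctuation, split into unique words
tokenize &lt;- function(x) {
  cleaned &lt;- gsub("[^a-z ]", " ", tolower(x))
  unique(strsplit(trimws(cleaned), " +")[[1]])
}

# Retrieval step: count the words each document shares with the question
overlap &lt;- sapply(documents, function(d) {
  length(intersect(tokenize(d), tokenize(user_question)))
})
best &lt;- which.max(overlap)

# Augmentation step: place the retrieved text in front of the question
prompt &lt;- paste0("Context:\n", documents[best], "\n\nQuestion: ", user_question)
cat(prompt)
</code></pre>
<p>A real RAG pipeline replaces the word-overlap score with a similarity search over embeddings and sends the assembled prompt to an LLM.</p>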
<h2 id="heading-project-overview">Project Overview</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744367291671/3e7989f8-0cd9-4857-ba48-23a352d9ae8d.png" alt="Setting up a local RAG chatbot from data gathering, cleaning, chunking, embedding, vector database storage, system prompting and interactive chatbot using Shiny" class="image--center mx-auto" width="1318" height="1101" loading="lazy"></p>
<h2 id="heading-project-setup">Project Setup</h2>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>Before you begin, ensure you have installed the latest version of the items listed here:</p>
<ol>
<li><p><a target="_blank" href="https://posit.co/download/rstudio-desktop/"><strong>RStudio</strong></a><strong>: The IDE</strong> <em>–</em> RStudio is the primary workspace where you'll write and test your R code. Its user-friendly interface, debugging tools, and integrated environment make it ideal for data analysis and chatbot development.</p>
</li>
<li><p><a target="_blank" href="https://cran.rstudio.com/"><strong>R</strong></a><strong>: The Programming Language</strong> <em>–</em> R is the backbone of your project. You'll use it to handle data manipulation, apply statistical models, and integrate your recipe chatbot components seamlessly.</p>
</li>
<li><p><a target="_blank" href="https://www.python.org/downloads/"><strong>Python</strong></a> – Some libraries, like the embedding library you'll use for text vectorization, are built on Python. It’s vital to have Python installed to enable these functionalities alongside your R code.</p>
</li>
<li><p><a target="_blank" href="https://www.java.com/en/download/"><strong>Java</strong></a> – Java serves as a foundational element for certain embedding libraries. It ensures efficient processing and compatibility for the text embedding tasks your chatbot relies on.</p>
</li>
<li><p><a target="_blank" href="https://www.docker.com/products/docker-desktop/"><strong>Docker Desktop</strong></a> – Docker Desktop allows you to run ChromaDB, the vector database, locally on your machine. This enables fast and reliable storage of embeddings, ensuring your chatbot retrieves relevant information quickly.</p>
</li>
<li><p><a target="_blank" href="https://ollama.com/"><strong>Ollama</strong></a> – Ollama brings powerful Large Language Models (LLMs) directly to your local computer, removing the need for cloud resources. It lets you access multiple models, customize outputs, and integrate them into your chatbot effortlessly.</p>
</li>
</ol>
<h2 id="heading-ollama-installation">Ollama Installation</h2>
<p>Ollama is an open-source tool you can use to run and manage LLMs on your computer. Once installed, you can access various LLMs as per your needs. You will be using the <code>llama3.2:3b-instruct-q4_K_M</code> model, a quantized build of Llama 3.2, to build this chatbot.</p>
<p>A quantized model is a version of a machine learning model that has been optimized to use less memory and computational power by reducing the precision of the numbers it uses. This enables you to use an LLM locally, especially when you don’t have access to a GPU (Graphics Processing Unit – a specialized processor that performs complex computations in parallel).</p>
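<p>To make "reducing the precision" concrete, here is a minimal sketch in base R that maps a handful of made-up weights onto the 16 values that 4 bits can represent and then restores them. It is a simplified linear scheme chosen for illustration, not the actual algorithm behind the Q4_K_M format.</p>
<pre><code class="lang-r"># Made-up model weights (illustrative values only)
weights &lt;- c(-0.82, -0.11, 0.03, 0.47, 0.95)

# 4 bits can encode 2^4 = 16 distinct levels
n_levels &lt;- 2^4
lo &lt;- min(weights)
hi &lt;- max(weights)
step &lt;- (hi - lo) / (n_levels - 1)

# Quantize: store each weight as an integer code between 0 and 15
codes &lt;- round((weights - lo) / step)

# Dequantize: recover approximate weights from the codes
restored &lt;- lo + codes * step

# The rounding error is at most half a step
max_error &lt;- max(abs(weights - restored))
print(codes)
print(max_error)
</code></pre>
<p>Each weight now needs only 4 bits instead of 16 or 32, at the cost of a small rounding error.</p>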
<p>To start, you can download and install the Ollama software <a target="_blank" href="https://ollama.com/download">here</a>.</p>
<p>Then you can confirm installation by running this command:</p>
<pre><code class="lang-bash">ollama --version
</code></pre>
<p>Run the following command to start Ollama:</p>
<pre><code class="lang-bash">ollama serve
</code></pre>
<p>Next, run the following command to pull the Q4_K_M quantization of llama3.2:3b-instruct:</p>
<pre><code class="lang-bash">ollama pull llama3.2:3b-instruct-q4_K_M
</code></pre>
<p>Then confirm that the model was downloaded with this:</p>
<pre><code class="lang-bash">ollama list
</code></pre>
<p>If the download was successful, a list containing the model’s name, ID, and size will be returned, like so:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744288047721/f6349ca4-fe86-4851-beaf-2f04fe2a4d80.png" alt="Confirm Ollama Installation" class="image--center mx-auto" width="1455" height="256" loading="lazy"></p>
<p>Now you can chat with the model:</p>
<pre><code class="lang-bash">ollama run llama3.2:3b-instruct-q4_K_M
</code></pre>
<p>If successful, you should receive a prompt that you can test by asking a question and getting an answer. For example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744288433940/d831d256-0f6c-49c0-b647-bce1c1976584.png" alt="Ollama llama3.2:3b-instruct-q4_K_M chat console" class="image--center mx-auto" width="1612" height="559" loading="lazy"></p>
<p>Then you can exit the console by typing <code>/bye</code> or pressing Ctrl + D.</p>
<h2 id="heading-data-collection-and-cleaning">Data Collection and Cleaning</h2>
<p>The chatbot you are building will be a cooking assistant that suggests recipes given your available ingredients, what you want to eat, and how much food a recipe yields.</p>
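<p>You first have to get the data that the chatbot will draw on. A RAG app does not retrain the model. Instead, it retrieves this data at query time and passes it to the model as context. You will be using a <a target="_blank" href="https://www.kaggle.com/datasets/paultimothymooney/recipenlg">dataset</a> that contains recipes from Kaggle.</p>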
<p>To start, load the necessary libraries:</p>
<pre><code class="lang-r"><span class="hljs-comment"># loading required libraries</span>
<span class="hljs-keyword">library</span>(xml2) <span class="hljs-comment">#read, parse, and manipulate XML,HTML documents</span>
<span class="hljs-keyword">library</span>(jsonlite) <span class="hljs-comment">#manipulate JSON objects</span>

<span class="hljs-keyword">library</span>(RKaggle) <span class="hljs-comment"># download datasets from Kaggle </span>
<span class="hljs-keyword">library</span>(dplyr)   <span class="hljs-comment"># data manipulation</span>
</code></pre>
<p>Then download and save the recipe dataset:</p>
<pre><code class="lang-r"><span class="hljs-comment"># Download and read the "recipe" dataset from Kaggle</span>
recipes_list &lt;- RKaggle::get_dataset(<span class="hljs-string">"thedevastator/better-recipes-for-a-better-life"</span>)
</code></pre>
<p>Inspect the returned object and extract its first element like this:</p>
<pre><code class="lang-r"><span class="hljs-comment"># inspect the dataset</span>
class(recipes_list)
str(recipes_list)
head(recipes_list)
<span class="hljs-comment"># extract the first tibble</span>
recipes_df &lt;- recipes_list[[<span class="hljs-number">1</span>]]
</code></pre>
<p>A quick inspection of the <code>recipes_list</code> object shows that it is a list containing two tibbles. You will be using only the first element for this project. A tibble is a type of data structure used for storing and manipulating data. It’s similar to a traditional dataframe, but it’s designed to enforce stricter rules and perform fewer automatic actions compared to traditional dataframes.</p>
<p>We’ll use a regular dataframe in this project because more people are likely familiar with it. It can also efficiently handle row indexing, which is crucial for accessing and manipulating specific rows in our recipe dataset.</p>
<p>In the code block below, you’ll convert the tibble to a dataframe and then drop the first column, which is the index column. Then you’ll inspect the newly converted dataframe and drop unnecessary columns.</p>
<p>Unnecessary columns are best removed to streamline the dataset and focus on relevant features. In this project, we’ll drop certain columns that aren’t particularly useful as context for the chatbot. This keeps each recipe's text focused on meaningful data, which helps retrieval return relevant results.</p>
<pre><code class="lang-r"><span class="hljs-comment"># convert to dataframe and drop the first column</span>
recipes_df &lt;- as.data.frame(recipes_df[, -<span class="hljs-number">1</span>])
<span class="hljs-comment"># inspect the converted dataframe</span>
head(recipes_df)
class(recipes_df)
colnames(recipes_df)
<span class="hljs-comment"># drop unnecessary columns</span>
cleaned_recipes_df &lt;- subset(recipes_df, select = -c(yield,rating,url,cuisine_path,nutrition,timing,img_src))
</code></pre>
<p>Now you need to identify rows with NA (missing) values, which you can do like this:</p>
<pre><code class="lang-r"><span class="hljs-comment"># Identify rows and columns with NA values</span>
which(is.na(cleaned_recipes_df), arr.ind = <span class="hljs-literal">TRUE</span>)

<span class="hljs-comment"># a quick inspection reveals columns [2:4] have missing values</span>
subset_column_names &lt;- colnames(cleaned_recipes_df)[<span class="hljs-number">2</span>:<span class="hljs-number">4</span>]
subset_column_names
</code></pre>
<p>It is important to handle NA values to ensure that your data is complete, to prevent errors, and to preserve context.</p>
<p>Now, replace the NA values and confirm that there are no missing values:</p>
<pre><code class="lang-r"><span class="hljs-comment"># Replace NA values dynamically based on conditions</span>
cols_to_modify &lt;- c(<span class="hljs-string">"prep_time"</span>, <span class="hljs-string">"cook_time"</span>, <span class="hljs-string">"total_time"</span>)
cleaned_recipes_df[cols_to_modify] &lt;- lapply(
  cleaned_recipes_df[cols_to_modify],
  <span class="hljs-keyword">function</span>(x, df) {
    <span class="hljs-comment"># Replace NA in prep_time and cook_time where both are NA</span>
    replace(x, is.na(df$prep_time) &amp; is.na(df$cook_time), <span class="hljs-string">"unknown"</span>)
  },
  df = cleaned_recipes_df  <span class="hljs-comment"># Pass the whole dataframe for conditions</span>
)
cleaned_recipes_df &lt;- cleaned_recipes_df %&gt;%
  mutate(
    prep_time = case_when(
      <span class="hljs-comment"># If cooktime is present but preptime is NA, replace with "no preparation required"</span>
      !is.na(cook_time) &amp; is.na(prep_time) ~ <span class="hljs-string">"no preparation required"</span>,
      <span class="hljs-comment"># Otherwise, retain original value</span>
      <span class="hljs-literal">TRUE</span> ~ as.character(prep_time)
    ),
    cook_time = case_when(
      <span class="hljs-comment"># If prep_time is present but cook_time is NA, replace with "no cooking required"</span>
      !is.na(prep_time) &amp; is.na(cook_time) ~ <span class="hljs-string">"no cooking required"</span>,
      <span class="hljs-comment"># Otherwise, retain original value</span>
      <span class="hljs-literal">TRUE</span> ~ as.character(cook_time)
    )
  )
<span class="hljs-comment"># confirm there are no missing values</span>
any(is.na(cleaned_recipes_df))

<span class="hljs-comment"># confirm the replacing NA logic works by inspecting specific rows</span>
cleaned_recipes_df[<span class="hljs-number">1081</span>,]
cleaned_recipes_df[<span class="hljs-number">1</span>,]
cleaned_recipes_df[<span class="hljs-number">405</span>,]
</code></pre>
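<p>If the replacement rules are hard to follow on the full dataset, it may help to try them on a tiny made-up dataframe first. The sketch below uses base R indexing instead of <code>lapply()</code> and <code>case_when()</code>, and the <code>toy</code> dataframe and its values are invented for the example.</p>
<pre><code class="lang-r"># Three made-up rows covering each missing-value pattern
toy &lt;- data.frame(
  prep_time = c(NA, "10 mins", NA),
  cook_time = c(NA, NA, "30 mins"),
  stringsAsFactors = FALSE
)

# Work out the three cases before changing anything
both_missing &lt;- is.na(toy$prep_time) &amp; is.na(toy$cook_time)
prep_missing &lt;- is.na(toy$prep_time) &amp; !is.na(toy$cook_time)
cook_missing &lt;- !is.na(toy$prep_time) &amp; is.na(toy$cook_time)

# Apply the same rules as the tutorial code
toy$prep_time[both_missing] &lt;- "unknown"
toy$cook_time[both_missing] &lt;- "unknown"
toy$prep_time[prep_missing] &lt;- "no preparation required"
toy$cook_time[cook_missing] &lt;- "no cooking required"

print(toy)
any(is.na(toy))
</code></pre>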
<p>For this tutorial, we’ll subset the dataframe to the first 250 rows for demo purposes. This saves on time when it comes to generating embeddings.</p>
<pre><code class="lang-r"><span class="hljs-comment"># recommended for demo/learning purposes</span>
cleaned_recipes_df &lt;- head(cleaned_recipes_df,<span class="hljs-number">250</span>)
</code></pre>
<h2 id="heading-how-to-create-chunks">How to Create Chunks</h2>
<p>To understand why chunking is important before embedding, you need to understand what an embedding is.</p>
<p>An embedding is a vector representation of a word or a sentence. Machines don’t understand human text – they understand numbers. LLMs work by transforming human text to numerical representations in order to give answers. The process of generating embeddings requires a lot of computation, and breaking the data into smaller pieces before embedding keeps each embedding call manageable.</p>
<p>So now we’re going to split the dataframe into smaller chunks of a specified size to enable efficient batch processing and iteration.</p>
<pre><code class="lang-r"><span class="hljs-comment"># Define the size of each chunk (number of rows per chunk)</span>
chunk_size &lt;- <span class="hljs-number">1</span>

<span class="hljs-comment"># Get the total number of rows in the dataframe</span>
n &lt;- nrow(cleaned_recipes_df)

<span class="hljs-comment"># Create a vector of group numbers for chunking</span>
<span class="hljs-comment"># Each group number repeats for 'chunk_size' rows</span>
<span class="hljs-comment"># Ensure the vector matches the total number of rows</span>
r &lt;- rep(<span class="hljs-number">1</span>:ceiling(n/chunk_size), each = chunk_size)[<span class="hljs-number">1</span>:n]

<span class="hljs-comment"># Split the dataframe into smaller chunks (subsets) based on the group numbers</span>
chunks &lt;- split(cleaned_recipes_df, r)
</code></pre>
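<p>With <code>chunk_size</code> set to 1, every recipe becomes its own chunk. To see how the grouping vector behaves for larger chunks, here is the same technique on a made-up five-row dataframe with two rows per chunk. The <code>toy_</code> names are used only in this example.</p>
<pre><code class="lang-r"># Made-up dataframe with five rows
toy_df &lt;- data.frame(id = 1:5, name = c("a", "b", "c", "d", "e"))
toy_chunk_size &lt;- 2
toy_n &lt;- nrow(toy_df)

# Group numbers: each repeats toy_chunk_size times, trimmed to toy_n entries
toy_groups &lt;- rep(1:ceiling(toy_n / toy_chunk_size), each = toy_chunk_size)[1:toy_n]
print(toy_groups)   # 1 1 2 2 3

# Split into chunks; the last chunk holds the leftover row
toy_chunks &lt;- split(toy_df, toy_groups)
sapply(toy_chunks, nrow)
</code></pre>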
<h2 id="heading-how-to-generate-sentence-embeddings">How to Generate Sentence Embeddings</h2>
<p>As previously mentioned, embeddings are vector representations of words or sentences. Embeddings can be generated from both words and sentences. How you choose to generate embeddings depends on your intended application of the LLM.</p>
<p>Word embeddings are numerical representations of individual words in a continuous vector space. They capture semantic relationships between words, allowing similar words to have vectors close to each other.</p>
<p>Word embeddings can be used in search engines as they support word-level queries by matching embeddings to retrieve relevant documents. They can also be used in text classification to classify documents, emails, or tweets based on word-level features (for example, detecting spam emails or sentiment analysis).</p>
<p>Sentence embeddings are numerical representations of entire sentences in a vector space, designed to capture the overall meaning and context of the sentence. They are used in settings where sentences provide better context like question answering systems where user queries are matched to relevant sentences or documents for more precise retrieval.</p>
<p>For our recipe chatbot, sentence embedding is the best choice.</p>
<p>First, create an empty dataframe that has three columns.</p>
<pre><code class="lang-r"><span class="hljs-comment">#empty dataframe</span>
recipe_sentence_embeddings &lt;-  data.frame(
  recipe = character(),
  recipe_vec_embeddings = I(list()),
  recipe_id = character()
)
</code></pre>
<p>The first column will hold the actual recipe in text form, the <code>recipe_vec_embeddings</code> column will hold the generated sentence embeddings, and the <code>recipe_id</code> holds a unique id for each recipe. This will help in indexing and retrieval from the vector database.</p>
<p>Next, it’s helpful to define a progress bar, which you can do like this:</p>
<pre><code class="lang-r"><span class="hljs-comment"># create a progress bar</span>
pb &lt;- txtProgressBar(min = <span class="hljs-number">1</span>, max = length(chunks), style = <span class="hljs-number">3</span>)
</code></pre>
<p>Embedding can take a while, so it’s important to keep track of the progress of the process.</p>
<p>Now it’s time to generate embeddings and populate the dataframe.</p>
<p>Write a for loop that runs once for each chunk.</p>
<pre><code class="lang-r"><span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {}
</code></pre>
<p>The recipe field is the text of the chunk currently being processed, and the unique recipe id is generated by pasting the text “recipe” together with the index of the chunk.</p>
<pre><code class="lang-r"><span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {
    recipe &lt;- as.character(chunks[i])
    recipe_id &lt;- paste0(<span class="hljs-string">"recipe"</span>,i)
}
</code></pre>
<p>The <code>textEmbed()</code> function from the <code>text</code> library generates either sentence or word embeddings. It takes in a character variable or a dataframe and produces a tibble of embeddings. Read the installation instructions on the <a target="_blank" href="https://www.r-text.org/">text</a> website to make sure the library runs smoothly.</p>
<p>The <code>batch_size</code> defines how many rows are embedded at a time from the input. Setting <code>keep_token_embeddings</code> to <code>FALSE</code> discards the embeddings for individual tokens after processing, and setting <code>aggregation_from_layers_to_tokens</code> to <code>"concatenate"</code> combines embeddings from the specified layers (here, layers 10 and 11) to create detailed embeddings for each token. A token is the smallest unit of text that a model can process.</p>
<pre><code class="lang-r"><span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {
    recipe &lt;- as.character(chunks[i])
    recipe_id &lt;- paste0(<span class="hljs-string">"recipe"</span>,i)
    recipe_embeddings &lt;- textEmbed(as.character(recipe),
                                layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                keep_token_embeddings = <span class="hljs-literal">FALSE</span>,
                                batch_size = <span class="hljs-number">1</span>
  )
}
</code></pre>
<p>To get sentence embeddings, set the <code>aggregation_from_tokens_to_texts</code> parameter to <code>"mean"</code>.</p>
<pre><code class="lang-r">aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>
</code></pre>
<p>The "mean" operation averages the embeddings of all tokens in a sentence to generate a single vector that represents the entire sentence. This sentence-level embedding captures the overall meaning and semantics of the text, regardless of its token length.</p>
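<p>As a rough illustration of that averaging step, the sketch below takes three made-up token vectors with three dimensions each and collapses them into one sentence vector. Real embeddings have hundreds of dimensions, and the numbers here are invented.</p>
<pre><code class="lang-r"># One row per token, one column per embedding dimension (made-up values)
token_embeddings &lt;- rbind(
  c(0.2, 0.4, 0.6),
  c(0.4, 0.0, 0.2),
  c(0.6, 0.8, 0.1)
)

# Average each dimension across tokens to get a single sentence vector
sentence_embedding &lt;- colMeans(token_embeddings)
print(sentence_embedding)   # approximately 0.4 0.4 0.3
</code></pre>
<p>The result has the same length no matter how many tokens the sentence contains, which is what lets sentences of different lengths be compared.</p>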
<pre><code class="lang-r"><span class="hljs-comment"># convert tibble to vector</span>
  recipe_vec_embeddings &lt;- unlist(recipe_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
  recipe_vec_embeddings &lt;- list(recipe_vec_embeddings)
</code></pre>
<p>The embedding function returns a tibble object. To obtain a plain vector embedding, first unlist the tibble (dropping the names), then wrap the resulting numeric vector in a list so that it can be stored as a single entry in the <code>recipe_vec_embeddings</code> column.</p>
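<p>The two conversion steps can be tried on a stand-in object. The one-row dataframe below is a made-up substitute for the output of the embedding function, with only three dimensions, so treat it as a sketch of the shape change rather than of the real output.</p>
<pre><code class="lang-r"># Stand-in for a one-row embedding result (made-up values)
toy_embedding &lt;- data.frame(Dim1 = 0.12, Dim2 = -0.45, Dim3 = 0.88)

# Step 1: flatten to an unnamed numeric vector
flat &lt;- unlist(toy_embedding, use.names = FALSE)

# Step 2: wrap in a list so the whole vector counts as one element
wrapped &lt;- list(flat)

length(flat)      # 3 numbers
length(wrapped)   # 1 list element holding those 3 numbers
</code></pre>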
<pre><code class="lang-r">  <span class="hljs-comment"># Append the current chunk's data to the dataframe</span>
  recipe_sentence_embeddings &lt;- recipe_sentence_embeddings %&gt;%
    add_row(
      recipe = recipe,
      recipe_vec_embeddings = recipe_vec_embeddings,
      recipe_id = recipe_id
    )
</code></pre>
<p>Finally, update the empty dataframe after each iteration with the newly generated data.</p>
<pre><code class="lang-r">  <span class="hljs-comment"># track embedding progress</span>
  setTxtProgressBar(pb, i)
</code></pre>
<p>In order to keep track of the embedding progress, you can use the earlier defined progress bar inside the loop. It will update at the end of every iteration.</p>
<p><strong>Complete Code Block:</strong></p>
<pre><code class="lang-r"><span class="hljs-comment"># load required library</span>
<span class="hljs-keyword">library</span>(text)
<span class="hljs-comment"># # ensure to read loading instructions here for smooth running of the 'text' library</span>
<span class="hljs-comment"># # https://www.r-text.org/</span>
<span class="hljs-comment"># embedding data</span>
<span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {
  recipe &lt;- as.character(chunks[i])
  recipe_id &lt;- paste0(<span class="hljs-string">"recipe"</span>,i)
  recipe_embeddings &lt;- textEmbed(as.character(recipe),
                                layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                keep_token_embeddings = <span class="hljs-literal">FALSE</span>,
                                batch_size = <span class="hljs-number">1</span>
  )

  <span class="hljs-comment"># convert tibble to vector</span>
  recipe_vec_embeddings &lt;- unlist(recipe_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
  recipe_vec_embeddings &lt;- list(recipe_vec_embeddings)

  <span class="hljs-comment"># Append the current chunk's data to the dataframe</span>
  recipe_sentence_embeddings &lt;- recipe_sentence_embeddings %&gt;%
    add_row(
      recipe = recipe,
      recipe_vec_embeddings = recipe_vec_embeddings,
      recipe_id = recipe_id
    )

  <span class="hljs-comment"># track embedding progress</span>
  setTxtProgressBar(pb, i)

}
</code></pre>
<h2 id="heading-how-to-set-up-the-vector-database-for-embedding-storage">How to Set Up the Vector Database for Embedding Storage</h2>
<p>A vector database is a special type of database that stores embeddings and allows you to query and retrieve relevant information. There are numerous vector databases available, but for this project, you will use ChromaDB, an open-source option that integrates with the R environment through the <code>rchroma</code> library.</p>
<p>ChromaDB runs locally in a Docker container. Just make sure you have Docker installed and running on your device.</p>
<p>Then load the rchroma library and run your ChromaDB instance:</p>
<pre><code class="lang-r"><span class="hljs-comment"># load rchroma library</span>
<span class="hljs-keyword">library</span>(rchroma)
<span class="hljs-comment"># run ChromaDB instance.</span>
chroma_docker_run()
</code></pre>
<p>If it was successful, you should see this in the console:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744383249217/bd8fb67c-0731-46f9-8a13-0747b4789714.png" alt="Confirm ChromaDB is running locally" class="image--center mx-auto" width="598" height="121" loading="lazy"></p>
<p>Next, connect to a local ChromaDB instance and check the connection:</p>
<pre><code class="lang-r"><span class="hljs-comment"># Connect to a local ChromaDB instance</span>
client &lt;- chroma_connect()

<span class="hljs-comment"># Check the connection</span>
heartbeat(client)
version(client)
</code></pre>
<p>Now you’ll need to create a collection and confirm that it was created. Collections in ChromaDB function similarly to tables in conventional databases.</p>
<pre><code class="lang-r"><span class="hljs-comment"># Create a new collection</span>
create_collection(client, <span class="hljs-string">"recipes_collection"</span>)

<span class="hljs-comment"># List all collections</span>
list_collections(client)
</code></pre>
<p>Now, add embeddings to the collection. To add embeddings to the <code>recipes_collection</code>, use the <code>add_documents</code> function.</p>
<pre><code class="lang-r"><span class="hljs-comment"># Add documents to the collection</span>
add_documents(
  client,
  <span class="hljs-string">"recipes_collection"</span>,
  documents = recipe_sentence_embeddings$recipe,
  ids = recipe_sentence_embeddings$recipe_id,
  embeddings = recipe_sentence_embeddings$recipe_vec_embeddings
)
</code></pre>
<p>The <code>add_documents()</code> function is used to add recipe data to the <code>recipes_collection</code>. Here's a breakdown of its arguments and how the corresponding data is accessed:</p>
<ol>
<li><p><code>documents</code>: This argument represents the recipe text. It is sourced from the <code>recipe</code> column of the <code>recipe_sentence_embeddings</code> dataframe.</p>
</li>
<li><p><code>ids</code>: This is the unique identifier for each recipe. It is extracted from the <code>recipe_id</code> column of the same dataframe.</p>
</li>
<li><p><code>embeddings</code>: This contains the sentence embeddings, which were previously generated for each recipe. These embeddings are accessed from the <code>recipe_vec_embeddings</code> column of the dataframe.</p>
</li>
</ol>
<p>All three arguments—<code>documents</code>, <code>ids</code>, and <code>embeddings</code>—are obtained by subsetting their respective columns from the <code>recipe_sentence_embeddings</code> dataframe.</p>
<h2 id="heading-how-to-write-the-user-input-query-embedding-function">How to Write the User Input Query Embedding Function</h2>
<p>In order to retrieve information from a vector database, you must first embed your query text. The database compares your query's embedding with its stored embeddings to find and retrieve the most relevant document.</p>
<p>It's important to ensure that the dimensionality (the number of values in each vector) of your query embedding matches that of the database embeddings. You achieve this by embedding the query with the same model and the same settings that you used for the stored documents.</p>
<p>Matching embeddings involves calculating the similarity (for example, cosine similarity) between the query and stored embeddings, identifying the closest match for effective retrieval.</p>
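<p>For intuition, cosine similarity can be computed by hand in base R. The <code>cosine_similarity()</code> helper and the three short vectors below are made up for this example, and the metric the database actually applies depends on how the collection is configured.</p>
<pre><code class="lang-r"># Hypothetical helper: cosine of the angle between two numeric vectors
cosine_similarity &lt;- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up three-dimensional "embeddings"
query_vec &lt;- c(1, 0, 1)
doc_a &lt;- c(2, 0, 2)   # points in the same direction as the query
doc_b &lt;- c(0, 1, 0)   # perpendicular to the query

sim_a &lt;- cosine_similarity(query_vec, doc_a)   # close to 1: very similar
sim_b &lt;- cosine_similarity(query_vec, doc_b)   # 0: unrelated

# The document with the higher score is the better match
c(doc_a = sim_a, doc_b = sim_b)
</code></pre>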
<p>Let’s write a function that embeds a query and then uses the generated embedding to retrieve similar documents. Wrapping it in a function makes it reusable.</p>
<pre><code class="lang-r">  <span class="hljs-comment">#sentence embeddings function and query</span>
  question &lt;- <span class="hljs-keyword">function</span>(sentence){
    sentence_embeddings &lt;- textEmbed(sentence,
                                     layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                     aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                     aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                     keep_token_embeddings = <span class="hljs-literal">FALSE</span>
    )

    <span class="hljs-comment"># convert tibble to vector</span>
    sentence_vec_embeddings &lt;- unlist(sentence_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
    sentence_vec_embeddings &lt;- list(sentence_vec_embeddings)

    <span class="hljs-comment"># Query similar documents using embeddings</span>
    results &lt;- query(
      client,
      <span class="hljs-string">"recipes_collection"</span>,
      query_embeddings = sentence_vec_embeddings ,
      n_results = <span class="hljs-number">2</span>
    )
    results

  }
</code></pre>
<p>This chunk of code embeds text with <code>textEmbed()</code> in the same way as before. The new part is the <code>query()</code> function, which searches the vector database, specifically <code>recipes_collection</code>, and returns the two documents that most closely match the user’s query.</p>
<p>Our function thus takes a sentence as its argument, embeds it, and then queries the database for the two best-matching documents.</p>
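<p>The "convert tibble to vector" step is easier to follow on a small stand-in. The sketch below uses a one-row data frame with three hypothetical dimensions in place of real <code>textEmbed()</code> output, so the names and values are assumptions made purely for illustration.</p>
<pre><code class="lang-r"># Stand-in for a one-row embedding table (three hypothetical dimensions)
fake_embedding &lt;- data.frame(Dim1 = 0.12, Dim2 = -0.40, Dim3 = 0.88)

# Flatten to a plain numeric vector, dropping the column names
vec &lt;- unlist(fake_embedding, use.names = FALSE)

# Wrap the vector in a list so that one query corresponds to one list element
query_embeddings &lt;- list(vec)

# Inspect the resulting structure
str(query_embeddings)
</code></pre>
<p>Wrapping the vector in a list keeps the whole embedding together as a single query rather than treating each number as a separate item.</p>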
<h2 id="heading-tool-calling">Tool Calling</h2>
<p>To interact with Ollama in R, you will utilize the <code>ellmer</code> library. This library streamlines the use of large language models (LLMs) by offering an interface that enables seamless access to and interaction with a variety of LLM providers.</p>
<p>To make the LLM more useful, we need to give it context. You can do this with tool calling, which allows an LLM to access external resources in order to extend its functionality.</p>
<p>For this project, we are implementing <a target="_blank" href="https://www.freecodecamp.org/news/learn-rag-fundamentals-and-advanced-techniques/">Retrieval-Augmented Generation (RAG)</a>, which combines retrieving relevant information from a vector database and generating responses using an LLM. This approach improves the chatbot's ability to provide accurate and contextually relevant answers.</p>
<p>Now, use the <code>tool()</code> function from the <code>ellmer</code> library to wrap <code>question()</code> as a tool that the LLM can call to obtain context.</p>
<pre><code class="lang-r"><span class="hljs-comment"># load ellmer library</span>
<span class="hljs-keyword">library</span>(ellmer)

<span class="hljs-comment"># function that links to llm to provide context</span>
  tool_context  &lt;- tool(
    question,
    <span class="hljs-string">"obtains the right context for a given question"</span>,
    sentence = type_string()

  )
</code></pre>
<p>The first argument to <code>tool()</code> is the <code>question()</code> function, which returns the relevant documents. The LLM uses these documents as context when answering questions.</p>
<p>The text, "obtains the right context for a given question", is a description of what the tool will be doing.</p>
<p>Finally, <code>sentence = type_string()</code> declares that the <code>sentence</code> argument of the <code>question()</code> function is a string.</p>
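<p>Conceptually, when the model decides to use a tool, it supplies named arguments and the wrapped R function is called with them. The base R sketch below is a simplified illustration of that idea only. It is not how <code>ellmer</code> is implemented, and <code>question_stub()</code> and <code>tool_args</code> are hypothetical names that stand in for the real function and the model's request.</p>
<pre><code class="lang-r"># Stand-in for question(): returns a fixed string instead of querying ChromaDB
question_stub &lt;- function(sentence) {
  paste("context for:", sentence)
}

# Hypothetical arguments a model might supply when it calls the tool
tool_args &lt;- list(sentence = "What can I cook with rice and beans?")

# Call the function with those named arguments
tool_result &lt;- do.call(question_stub, tool_args)
tool_result
</code></pre>
<p>The argument name in the list has to match the function's parameter name, which is why the tool definition names <code>sentence</code> explicitly.</p>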
<h2 id="heading-how-to-initialize-the-chat-system-design-prompts-and-integrate-tools">How to Initialize the Chat System, Design Prompts, and Integrate Tools</h2>
<p>Next, you’ll set up a conversational AI system by defining its role and functionality. Using system prompt design, you will shape the assistant’s behavior, tone, and focus as a culinary assistant. You’ll also integrate external tools to extend the chatbot’s capabilities by registering tools. Let’s dive in.</p>
<p>First, you need to initialize a Chat Object:</p>
<pre><code class="lang-r"><span class="hljs-comment">#  Initialize the chat system with prompt instructions.</span>
  chat &lt;- chat_ollama(system_prompt = <span class="hljs-string">"You are a knowledgeable culinary assistant specializing in recipe recommendations. 
                      You provide tailored meal suggestions based on the user's available ingredients and the desired amount of food or servings.
                      Ensure the recipes align closely with the user's inputs and yield the expected quantity."</span>,
                      model = <span class="hljs-string">"llama3.2:3b-instruct-q4_K_M"</span>)
</code></pre>
<p>You can do that using the <code>chat_ollama()</code> function. This sets up a conversational agent with the specified system prompt and model.</p>
<p>The system prompt defines the conversational behavior, tone, and focus of the LLM while the model argument specifies the language model (<code>llama3.2:3b-instruct-q4_K_M</code>) that the chat system will use to generate responses.</p>
<p>Next, you need to register a tool.</p>
<pre><code class="lang-r"> <span class="hljs-comment">#register tool</span>
  chat$register_tool(tool_context)
</code></pre>
<p>We need to tell our chat object about the <code>tool_context</code> tool. Do this by registering it with the <code>register_tool()</code> method.</p>
<h2 id="heading-how-to-interact-with-your-chatbot-using-a-shiny-app"><strong>How to Interact with Your Chatbot Using a Shiny App</strong></h2>
<p>To interact with the chatbot you’ve just created, we’ll use <strong>Shiny</strong>, a framework for building interactive web applications in R. Shiny provides a user-friendly graphical interface that allows seamless interaction with the chatbot.</p>
<p>For this purpose, we’ll use the <strong>shinychat</strong> library, which simplifies the process of building a chat interface within a Shiny app. This involves defining two key components:</p>
<ol>
<li><p><strong>User Interface (UI)</strong>:</p>
<ul>
<li><p>Responsible for the visual layout and what the user sees.</p>
</li>
<li><p>In this case, <code>chat_ui("chat")</code> is used to create the interactive chat interface.</p>
</li>
</ul>
</li>
<li><p><strong>Server Function</strong>:</p>
<ul>
<li><p>Handles the functionality and logic of the application.</p>
</li>
<li><p>It connects the chatbot to external tools and manages processes like embedding queries, retrieving relevant responses, and handling user inputs.</p>
</li>
</ul>
</li>
</ol>
<pre><code class="lang-r"><span class="hljs-comment"># load the required libraries (shiny provides observeEvent() and shinyApp())</span>
<span class="hljs-keyword">library</span>(shiny)
<span class="hljs-keyword">library</span>(shinychat)

<span class="hljs-comment"># wrap the chat code in a Shiny App</span>
ui &lt;- bslib::page_fluid(
  chat_ui(<span class="hljs-string">"chat"</span>)
)

server &lt;- <span class="hljs-keyword">function</span>(input, output, session) {
  <span class="hljs-comment"># Connect to a local ChromaDB instance running on docker with embeddings loaded</span>
  client &lt;- chroma_connect()

  <span class="hljs-comment">#sentence embeddings function and query</span>
  question &lt;- <span class="hljs-keyword">function</span>(sentence){
    sentence_embeddings &lt;- textEmbed(sentence,
                                     layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                     aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                     aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                     keep_token_embeddings = <span class="hljs-literal">FALSE</span>
    )

    <span class="hljs-comment"># convert tibble to vector</span>
    sentence_vec_embeddings &lt;- unlist(sentence_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
    sentence_vec_embeddings &lt;- list(sentence_vec_embeddings)

    <span class="hljs-comment"># Query similar documents using embeddings</span>
    results &lt;- query(
      client,
      <span class="hljs-string">"recipes_collection"</span>,
      query_embeddings = sentence_vec_embeddings ,
      n_results = <span class="hljs-number">2</span>
    )
    results

  }


  <span class="hljs-comment"># function that provides context</span>
  tool_context  &lt;- tool(
    question,
    <span class="hljs-string">"obtains the right context for a given question"</span>,
    sentence = type_string()

  )

  <span class="hljs-comment">#  Initialize the chat system with the system prompt</span>
  chat &lt;- chat_ollama(system_prompt = <span class="hljs-string">"You are a knowledgeable culinary assistant specializing in recipe recommendations. 
                      You provide tailored meal suggestions based on the user's available ingredients and the desired amount of food or servings.
                      Ensure the recipes align closely with the user's inputs and yield the expected quantity."</span>,
                      model = <span class="hljs-string">"llama3.2:3b-instruct-q4_K_M"</span>)
  <span class="hljs-comment">#register tool</span>
  chat$register_tool(tool_context)

  observeEvent(input$chat_user_input, {
    stream &lt;- chat$stream_async(input$chat_user_input)
    chat_append(<span class="hljs-string">"chat"</span>, stream)
  })
}

shinyApp(ui, server)
</code></pre>
<p>Alright, let’s understand how this is working:</p>
<ol>
<li><p><strong>User input monitoring with</strong> <code>observeEvent()</code>: The <code>observeEvent()</code> block monitors user inputs from the chat interface (<code>input$chat_user_input</code>). When a user sends a message, the chatbot processes it, retrieves relevant context using the embeddings, and streams the response dynamically to the chat interface.</p>
</li>
<li><p><strong>Tool calling for context</strong>: The chatbot employs tool calling to interact with external resources (like the vector database) and enhance its functionality. In this project, Retrieval-Augmented Generation (RAG) ensures the chatbot provides accurate and context-rich responses by integrating retrieval and generation seamlessly.</p>
</li>
</ol>
<p>This approach brings the chatbot to life, enabling users to interact with it dynamically through a responsive Shiny app.</p>
<h2 id="heading-complete-code">Complete Code</h2>
<p>The R scripts have been split in two, with <code>data.R</code> containing code that handles data gathering and cleaning, text chunking, sentence embeddings generation, creating a vector database, and loading documents to it.</p>
<p>The <code>chat.R</code> script contains code that handles user input querying, context retrieval, chat initialization, system prompt design, tool integration, and a chat Shiny app.</p>
<p><strong>data.R</strong></p>
<pre><code class="lang-r"><span class="hljs-comment"># install and load required packages</span>
<span class="hljs-comment"># install devtools from CRAN</span>
install.packages(<span class="hljs-string">'devtools'</span>)
devtools::install_github(<span class="hljs-string">"benyamindsmith/RKaggle"</span>)

<span class="hljs-keyword">library</span>(text)
<span class="hljs-keyword">library</span>(rchroma)
<span class="hljs-keyword">library</span>(RKaggle)
<span class="hljs-keyword">library</span>(dplyr)

<span class="hljs-comment"># run ChromaDB instance.</span>
chroma_docker_run()

<span class="hljs-comment"># Connect to a local ChromaDB instance</span>
client &lt;- chroma_connect()

<span class="hljs-comment"># Check the connection</span>
heartbeat(client)
version(client)


<span class="hljs-comment"># Create a new collection</span>
create_collection(client, <span class="hljs-string">"recipes_collection"</span>)

<span class="hljs-comment"># List all collections</span>
list_collections(client)

<span class="hljs-comment"># Download and read the "recipe" dataset from Kaggle</span>
recipes_list &lt;- RKaggle::get_dataset(<span class="hljs-string">"thedevastator/better-recipes-for-a-better-life"</span>)

<span class="hljs-comment"># extract the first tibble</span>
recipes_df &lt;- recipes_list[[<span class="hljs-number">1</span>]]

<span class="hljs-comment"># convert to dataframe and drop the first column</span>
recipes_df &lt;- as.data.frame(recipes_df[, -<span class="hljs-number">1</span>])

<span class="hljs-comment"># drop unnecessary columns</span>
cleaned_recipes_df &lt;- subset(recipes_df, select = -c(yield,rating,url,cuisine_path,nutrition,timing,img_src))

<span class="hljs-comment">## Replace NA values dynamically based on conditions</span>
<span class="hljs-comment"># Replace NA when all columns have NA values</span>
cols_to_modify &lt;- c(<span class="hljs-string">"prep_time"</span>, <span class="hljs-string">"cook_time"</span>, <span class="hljs-string">"total_time"</span>)
cleaned_recipes_df[cols_to_modify] &lt;- lapply(
  cleaned_recipes_df[cols_to_modify],
  <span class="hljs-keyword">function</span>(x, df) {
    <span class="hljs-comment"># Replace NA in prep_time and cook_time where both are NA</span>
    replace(x, is.na(df$prep_time) &amp; is.na(df$cook_time), <span class="hljs-string">"unknown"</span>)
  },
  df = cleaned_recipes_df  
)

<span class="hljs-comment"># Replace NA when only one of the two columns has an NA value</span>
cleaned_recipes_df &lt;- cleaned_recipes_df %&gt;%
  mutate(
    prep_time = case_when(
      <span class="hljs-comment"># If cook_time is present but prep_time is NA, replace with "no preparation required"</span>
      !is.na(cook_time) &amp; is.na(prep_time) ~ <span class="hljs-string">"no preparation required"</span>,
      <span class="hljs-comment"># Otherwise, retain original value</span>
      <span class="hljs-literal">TRUE</span> ~ as.character(prep_time)
    ),
    cook_time = case_when(
      <span class="hljs-comment"># If prep_time is present but cook_time is NA, replace with "no cooking required"</span>
      !is.na(prep_time) &amp; is.na(cook_time) ~ <span class="hljs-string">"no cooking required"</span>,
      <span class="hljs-comment"># Otherwise, retain original value</span>
      <span class="hljs-literal">TRUE</span> ~ as.character(cook_time)
    )
  )

<span class="hljs-comment"># chunk the dataset</span>
chunk_size &lt;- <span class="hljs-number">1</span>
n &lt;- nrow(cleaned_recipes_df)
r &lt;- rep(<span class="hljs-number">1</span>:ceiling(n/chunk_size),each = chunk_size)[<span class="hljs-number">1</span>:n]
chunks &lt;- split(cleaned_recipes_df,r)

<span class="hljs-comment">#empty dataframe</span>
recipe_sentence_embeddings &lt;-  data.frame(
  recipe = character(),
  recipe_vec_embeddings = I(list()),
  recipe_id = character()
)

<span class="hljs-comment"># create a progress bar</span>
pb &lt;- txtProgressBar(min = <span class="hljs-number">1</span>, max = length(chunks), style = <span class="hljs-number">3</span>)

<span class="hljs-comment"># embedding data</span>
<span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {
  recipe &lt;- as.character(chunks[i])
  recipe_id &lt;- paste0(<span class="hljs-string">"recipe"</span>,i)
  recipe_embeddings &lt;- textEmbed(as.character(recipe),
                                layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                keep_token_embeddings = <span class="hljs-literal">FALSE</span>,
                                batch_size = <span class="hljs-number">1</span>
  )

  <span class="hljs-comment"># convert tibble to vector</span>
  recipe_vec_embeddings &lt;- unlist(recipe_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
  recipe_vec_embeddings &lt;- list(recipe_vec_embeddings)

  <span class="hljs-comment"># Append the current chunk's data to the dataframe</span>
  recipe_sentence_embeddings &lt;- recipe_sentence_embeddings %&gt;%
    add_row(
      recipe = recipe,
      recipe_vec_embeddings = recipe_vec_embeddings,
      recipe_id = recipe_id
    )

  <span class="hljs-comment"># track embedding progress</span>
  setTxtProgressBar(pb, i)

}

<span class="hljs-comment"># Add documents to the collection</span>
add_documents(
  client,
  <span class="hljs-string">"recipes_collection"</span>,
  documents = recipe_sentence_embeddings$recipe,
  ids = recipe_sentence_embeddings$recipe_id,
  embeddings = recipe_sentence_embeddings$recipe_vec_embeddings
)
</code></pre>
<p><strong>chat.R</strong></p>
<pre><code class="lang-r"><span class="hljs-comment"># Load required packages</span>
<span class="hljs-keyword">library</span>(ellmer)
<span class="hljs-keyword">library</span>(text)
<span class="hljs-keyword">library</span>(rchroma)
<span class="hljs-keyword">library</span>(shiny)
<span class="hljs-keyword">library</span>(shinychat)

ui &lt;- bslib::page_fluid(
  chat_ui(<span class="hljs-string">"chat"</span>)
)

server &lt;- <span class="hljs-keyword">function</span>(input, output, session) {
  <span class="hljs-comment"># Connect to a local ChromaDB instance running on docker with embeddings loaded </span>
  client &lt;- chroma_connect()

  <span class="hljs-comment"># sentence embeddings function and query</span>
  question &lt;- <span class="hljs-keyword">function</span>(sentence){
    sentence_embeddings &lt;- textEmbed(sentence,
                                     layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                     aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                     aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                     keep_token_embeddings = <span class="hljs-literal">FALSE</span>
    )

    <span class="hljs-comment"># convert tibble to vector</span>
    sentence_vec_embeddings &lt;- unlist(sentence_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
    sentence_vec_embeddings &lt;- list(sentence_vec_embeddings)

    <span class="hljs-comment"># Query similar documents</span>
    results &lt;- query(
      client,
      <span class="hljs-string">"recipes_collection"</span>,
      query_embeddings = sentence_vec_embeddings ,
      n_results = <span class="hljs-number">2</span>
    )
    results

  }


  <span class="hljs-comment"># function that provides context</span>
  tool_context  &lt;- tool(
    question,
    <span class="hljs-string">"obtains the right context for a given question"</span>,
    sentence = type_string()

  )

  <span class="hljs-comment">#  Initialize the chat system </span>
  chat &lt;- chat_ollama(system_prompt = <span class="hljs-string">"You are a knowledgeable culinary assistant specializing in recipe recommendations. 
                      You provide tailored meal suggestions based on the user's available ingredients and the desired amount of food or servings.
                      Ensure the recipes align closely with the user's inputs and yield the expected quantity."</span>,
                      model = <span class="hljs-string">"llama3.2:3b-instruct-q4_K_M"</span>)
  <span class="hljs-comment">#register tool</span>
  chat$register_tool(tool_context)

  observeEvent(input$chat_user_input, {
    stream &lt;- chat$stream_async(input$chat_user_input)
    chat_append(<span class="hljs-string">"chat"</span>, stream)
  })
}

shinyApp(ui, server)
</code></pre>
<p>You can find the complete code <a target="_blank" href="https://github.com/elabongaatuo/Recipe-Chatbot/">here</a>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building a local Retrieval-Augmented Generation (RAG) application using Ollama and ChromaDB in R offers a powerful way to create a specialized conversational assistant.</p>
<p>By leveraging the capabilities of large language models and vector databases, you can efficiently manage and retrieve relevant information from extensive datasets.</p>
<p>This approach not only enhances the performance of language models but also ensures customization and privacy by running the application locally.</p>
<p>Whether you're developing a cooking assistant or any other domain-specific chatbot, this method provides a robust framework for delivering intelligent and contextually aware responses.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744380659737/4e096d1c-87d6-4baa-bbf3-03657e05c182.gif" alt="Chatbot running on Shiny giving relevant recipe after user prompt" class="image--center mx-auto" width="800" height="903" loading="lazy"></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Web Scraping With RSelenium (Chrome Driver) and Rvest ]]>
                </title>
                <description>
                    <![CDATA[ Web scraping lets you automatically extract data from websites, so you can store it in a structured format for later use. In this article, you'll explore how to use popular R libraries for web scraping to extract data from a website. The target websi... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/web-scraping-with-rselenium-chrome-driver-and-rvest/</link>
                <guid isPermaLink="false">67d8272af45871e3e821d5fa</guid>
                
                    <category>
                        <![CDATA[ Rselenium ]]>
                    </category>
                
                    <category>
                        <![CDATA[ RVest ]]>
                    </category>
                
                    <category>
                        <![CDATA[ selenium ]]>
                    </category>
                
                    <category>
                        <![CDATA[ R Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ webscraping  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ chromedriver ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Elabonga Atuo ]]>
                </dc:creator>
                <pubDate>Mon, 17 Mar 2025 13:44:10 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742219025681/47c07711-cfa5-482f-a72b-d127bc5b63bc.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Web scraping lets you automatically extract data from websites, so you can store it in a structured format for later use.</p>
<p>In this article, you'll explore how to use popular R libraries for web scraping to extract data from a website. The target website displays different books across multiple pages, requiring navigation between them. You'll learn how to use RVest for data extraction and RSelenium to automate button clicks.</p>
<p>There are a couple of housekeeping rules when it comes to harvesting data on the internet:</p>
<ul>
<li><p><strong>Inspect the robots.txt file</strong>: Check the robots.txt file of a website to understand what data you are allowed to extract. You can find this file by appending “/robots.txt” to the website's home URL.</p>
</li>
<li><p><strong>Review terms and conditions</strong>: Before scraping, read the website's terms and conditions to understand the legal expectations regarding data extraction.</p>
</li>
<li><p><strong>Limit requests</strong>: Avoid overloading the server with requests by implementing rate limiting. The <a target="_blank" href="https://dmi3kno.github.io/polite/">polite</a> library in R can help manage request rates effectively.</p>
</li>
</ul>
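<p>As a rough illustration of the first and third rules, the base R sketch below builds the robots.txt address for a site and spaces out repeated calls with a fixed pause. The helper names <code>robots_url()</code> and <code>fetch_slowly()</code> and the two-second default are assumptions made for this example. The polite library mentioned above handles these concerns more thoroughly.</p>
<pre><code class="lang-r"># Build the robots.txt URL from a site's home URL
robots_url &lt;- function(home_url) {
  # Remove any trailing slashes before appending the file name
  paste0(sub("/+$", "", home_url), "/robots.txt")
}

robots_url("https://books.toscrape.com/")

# Apply a function to several URLs, pausing between calls
fetch_slowly &lt;- function(urls, fetch_fun, delay_seconds = 2) {
  results &lt;- vector("list", length(urls))
  for (i in seq_along(urls)) {
    results[[i]] &lt;- fetch_fun(urls[i])
    # No need to wait after the final request
    if (i &lt; length(urls)) Sys.sleep(delay_seconds)
  }
  results
}
</code></pre>
<p>You could pass a function such as <code>read_html()</code> as <code>fetch_fun</code> once the rvest package is loaded.</p>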
<p>Let’s dive in!</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-project-overview">Project Overview</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-understand-and-inspect-a-webpage">How to Understand and Inspect a Webpage</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-extract-data-using-rvest">How to Extract Data Using RVest</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-mimic-human-behaviour-using-rselenium">How to Mimic Human Behaviour Using RSelenium</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-combine-rselenium-amp-rvest-and-save-to-csv">How to Combine RSelenium &amp; RVest and Save to CSV</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bringing-it-all-together">Bringing it All Together</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-project-overview">Project Overview</h2>
<p>Here’s what we’re going to be building:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739891904874/e10f91f5-f5ba-4a9d-82d7-bd297b409b1b.gif" alt="e10f91f5-f5ba-4a9d-82d7-bd297b409b1b" class="image--center mx-auto" width="800" height="450" loading="lazy"></p>
<p>This approach to web scraping allows you to see the browser in action as it navigates and extracts data from the website. Unlike headless browsing, where everything runs in the background without a visible interface, this method provides a graphical UI, making it easier to monitor and debug the process.</p>
<p>To practice your data mining skills, you will be scraping data from a website built specifically for that: <a target="_blank" href="https://books.toscrape.com/">Books To Scrape</a>. You are going to be using a driver to drive a browser which will then open your target website. It’ll navigate from the first page, mimicking human behaviour (clicking the next button) while collecting data about the books, right to the last page.</p>
<h2 id="heading-project-setup">Project Setup</h2>
<h3 id="heading-prerequisites"><strong>Prerequisites:</strong></h3>
<p>To follow along with this tutorial, you will need:</p>
<ul>
<li><p>R programming knowledge</p>
</li>
<li><p>HTML knowledge</p>
</li>
<li><p>RStudio installed</p>
</li>
</ul>
<p>Note that I’m building this tutorial on a Windows machine.</p>
<h3 id="heading-setup-and-install-chrome-driver">Setup and Install Chrome Driver</h3>
<p>First, you’ll want to check to make sure you have Java installed on your computer by running this terminal command:</p>
<pre><code class="lang-bash">java -version
</code></pre>
<p>If it’s not present, download and install Java <a target="_blank" href="https://www.java.com/en/download/">here</a>.</p>
<p>Next, install the Chrome browser if you don’t already have it. Once it’s installed, check for your browser version in the settings section.</p>
<p>Then you can download the browser driver that corresponds to your browser version <a target="_blank" href="https://developer.chrome.com/docs/chromedriver/downloads/version-selection">here</a>. Check where other browser drivers are stored on your device by running this in the RStudio console:</p>
<pre><code class="lang-r"><span class="hljs-comment"># install and load wdman and binman packages</span>
install.packages(<span class="hljs-string">"wdman"</span>)
<span class="hljs-keyword">library</span>(wdman)

install.packages(<span class="hljs-string">"binman"</span>)
<span class="hljs-keyword">library</span>(binman)

<span class="hljs-comment"># check drivers already installed</span>
binman::list_versions(appname = <span class="hljs-string">"chromedriver"</span>)

<span class="hljs-comment"># check browser driver locations</span>
wdman::selenium(retcommand = <span class="hljs-literal">TRUE</span>, check = <span class="hljs-literal">FALSE</span>)
</code></pre>
<p>Extract the driver “.exe” and store it at the specified folder location. This is usually the following location:</p>
<pre><code class="lang-bash"><span class="hljs-string">"C:\Users\YourName\AppData\Local\binman\binman_chromedriver\win32\version\chromedriver.exe"</span>
</code></pre>
<p>Now, add the driver to your system PATH by specifying the folder path, excluding the executable's file name. Confirm the installation by running the following terminal command:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Chromedriver SYSTEMS PATH: "C:\Users\YourName\AppData\Local\binman\binman_chromedriver\win32\version\"</span>
<span class="hljs-comment"># check chromedriver installation</span>
chromedriver -version
</code></pre>
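<p>You can run a similar check from the R console. This small sketch uses base R's <code>Sys.which()</code>, which returns an empty string when an executable can't be found on the PATH. If you have only just edited the PATH, you may need to restart RStudio before the change is visible to R.</p>
<pre><code class="lang-r"># Look up the chromedriver executable on the system PATH
driver_path &lt;- Sys.which("chromedriver")

if (nzchar(driver_path)) {
  message("chromedriver found at: ", driver_path)
} else {
  message("chromedriver is not on the PATH yet")
}
</code></pre>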
<h2 id="heading-how-to-understand-and-inspect-a-webpage">How to Understand and Inspect a Webpage</h2>
<p>A webpage is a visual representation of an HTML document that is available on the internet and accessed through a web browser. The components of a webpage, called elements, are structured hierarchically in an HTML DOM (Document Object Model) tree. Each element can be located using specific paths called selectors or locators, which you can read more about <a target="_blank" href="https://testrigor.com/blog/css-selector-vs-xpath-your-pocket-cheat-sheet/">here</a>.</p>
<p>Developer Tools are a set of tools available in your browser. They’re helpful for inspecting and analyzing a webpage’s structure. The “Inspect” feature helps you examine the structure and styling of a specific element. You can access this feature by selecting the element you would like to inspect, right-clicking on it, and clicking “Inspect”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739974770342/59c960b1-2c88-4c1d-a23d-d9e9fee91dc5.gif" alt="Inspecting an element" class="image--center mx-auto" width="1366" height="728" loading="lazy"></p>
<h2 id="heading-how-to-extract-data-using-rvest">How to Extract Data Using RVest</h2>
<p>RVest is an R package that contains a set of functions that enable you to extract data from HTML and XML web pages.</p>
<p>We are interested in extracting the following information about books from every page on the website’s catalogue:</p>
<ul>
<li><p>Book Title</p>
</li>
<li><p>Book Rating</p>
</li>
<li><p>Book Price</p>
</li>
<li><p>Individual Book Link</p>
</li>
<li><p>Cover Image Link</p>
</li>
</ul>
<p>Let’s go through the steps for using RVest to extract this data.</p>
<h3 id="heading-step-1-load-the-webpage"><strong>Step 1: Load the webpage</strong></h3>
<p>To load the first page of your target website and parse the HTML document using the RVest package in R, follow these steps:</p>
<ol>
<li><p><strong>Install and load the RVest package</strong>: If you haven't already installed the RVest package, you can do so by running the following command in R:</p>
<pre><code class="lang-r"> install.packages(<span class="hljs-string">"rvest"</span>)
</code></pre>
<p> Then, load the package:</p>
<pre><code class="lang-r"> <span class="hljs-keyword">library</span>(rvest)
</code></pre>
</li>
<li><p><strong>Load the webpage and parse the HTML</strong>: Use the <code>read_html()</code> function from the RVest package to fetch and parse the HTML content of the webpage. Here's an example of how to do this:</p>
<pre><code class="lang-r"> <span class="hljs-comment"># Specify the URL of the target website</span>
 url &lt;- <span class="hljs-string">"https://books.toscrape.com/"</span>

 <span class="hljs-comment"># Fetch and parse the HTML content</span>
 webpage &lt;- read_html(url)
</code></pre>
</li>
</ol>
<p>This code will download the HTML content of the specified webpage and convert it into an XML document, making it easier to structure and organize the data for further processing or storage.</p>
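<p>If you'd like to try parsing without touching the network, <code>read_html()</code> also accepts a literal HTML string. The snippet below is a small sketch with made-up markup, and the variable name <code>snippet</code> is only for illustration.</p>
<pre><code class="lang-r">library(rvest)

# Parse a literal HTML string instead of a URL (made-up content)
snippet &lt;- read_html("&lt;html&gt;&lt;body&gt;&lt;h1&gt;Books&lt;/h1&gt;&lt;p&gt;1000 results&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;")

# Check what kind of object the parser returned
class(snippet)
</code></pre>
<p>The object returned here is the same kind of parsed document that you get from a URL, so the extraction functions used in the next steps work on both.</p>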
<h3 id="heading-step-2-identify-the-target-elements"><strong>Step 2: Identify the target elements</strong></h3>
<p>The target elements are the HTML elements that contain the specific data you intend to extract.</p>
<p>A quick inspection of the webpage using developer tools shows that each book’s information is contained in an <code>article</code> tag and forms part of an ordered list. It’s important to specify the <code>&lt;ol&gt;</code> tag in the path, as there are other lists in the tree.</p>
<p>The pipe <code>%&gt;%</code> operator facilitates chaining operations, making it easier to extract elements step by step. <code>html_element()</code> returns the first matching element while <code>html_elements()</code> returns all the elements that match the defined path.</p>
<pre><code class="lang-r"><span class="hljs-comment"># define the path from which other details will be extracted</span>
book &lt;- webpage %&gt;% html_element(<span class="hljs-string">"ol"</span>)  %&gt;% html_elements(<span class="hljs-string">"li"</span>) %&gt;% html_element(<span class="hljs-string">"article"</span>)

<span class="hljs-comment"># extracting details using css locators.</span>
<span class="hljs-comment"># title</span>
title &lt;- book %&gt;% 
  html_element(<span class="hljs-string">"h3 a"</span>) %&gt;% 
  html_attr(<span class="hljs-string">"title"</span>)

<span class="hljs-comment"># rating</span>
rating &lt;- book %&gt;% 
  html_element(<span class="hljs-string">"p"</span>) %&gt;% 
  html_attr(<span class="hljs-string">"class"</span>)

<span class="hljs-comment"># price</span>
price &lt;- book %&gt;% 
  html_element(<span class="hljs-string">".product_price p"</span>) %&gt;% 
  html_text2()

<span class="hljs-comment">#link to book page</span>
book_link &lt;- book %&gt;% 
  html_element(<span class="hljs-string">"h3 a"</span>) %&gt;% 
  html_attr(<span class="hljs-string">"href"</span>)

<span class="hljs-comment"># cover page image link</span>
cover_page_link &lt;- book %&gt;% 
  html_element(<span class="hljs-string">".image_container a img"</span>) %&gt;% 
  html_attr(<span class="hljs-string">"src"</span>)

<span class="hljs-comment"># inspect right format by selecting the first element of each detail</span>
title[[<span class="hljs-number">1</span>]]
rating[[<span class="hljs-number">1</span>]]
price[[<span class="hljs-number">1</span>]]
book_link[[<span class="hljs-number">1</span>]]
cover_page_link[[<span class="hljs-number">1</span>]]
</code></pre>
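<p>To see the difference between <code>html_element()</code> and <code>html_elements()</code> without hitting the live site, here is a minimal sketch. The two-item list is a hypothetical stand-in for the real catalogue markup, built with rvest's <code>minimal_html()</code> helper.</p>
<pre><code class="lang-r">library(rvest)

# Hypothetical two-item list standing in for the real catalogue page
doc &lt;- minimal_html('
  &lt;ol&gt;
    &lt;li&gt;&lt;article&gt;&lt;h3&gt;&lt;a title="Book A"&gt;Book A&lt;/a&gt;&lt;/h3&gt;&lt;/article&gt;&lt;/li&gt;
    &lt;li&gt;&lt;article&gt;&lt;h3&gt;&lt;a title="Book B"&gt;Book B&lt;/a&gt;&lt;/h3&gt;&lt;/article&gt;&lt;/li&gt;
  &lt;/ol&gt;')

first_link &lt;- doc %&gt;% html_element("h3 a")    # a single node: the first match
all_links  &lt;- doc %&gt;% html_elements("h3 a")   # a node set: every match

# html_attr() returns one value per node it is given
first_title &lt;- first_link %&gt;% html_attr("title")
all_titles  &lt;- all_links %&gt;% html_attr("title")
</code></pre>
<p>In this sketch, <code>first_title</code> holds a single title while <code>all_titles</code> holds one title per book, which is why the tutorial selects every <code>li</code> with <code>html_elements()</code> before drilling into each one.</p>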
<h3 id="heading-step-3-clean-the-rating-data"><strong>Step 3: Clean the “rating” data</strong></h3>
<p>To clean the "star-rating" data, you can use the <code>stringr</code> package in R to remove the unnecessary text and trim any whitespace. Here's how you can do it:</p>
<pre><code class="lang-r"><span class="hljs-keyword">library</span>(stringr)

<span class="hljs-comment"># Example of extracted rating data</span>
rating_data &lt;- <span class="hljs-string">"star-rating Three"</span>

<span class="hljs-comment"># Remove "star-rating " and trim whitespace</span>
cleaned_rating &lt;- str_trim(str_replace(rating_data, <span class="hljs-string">"star-rating "</span>, <span class="hljs-string">""</span>))

<span class="hljs-comment"># Output the cleaned rating</span>
cleaned_rating
</code></pre>
<p>This code will output "Three", effectively removing the "star-rating" prefix and any leading or trailing whitespace.</p>
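<p>If you would rather analyse ratings as numbers, one possible follow-up is to map each word to its value with a named vector. This is a sketch, not part of the original script: the sample <code>rating</code> values and the <code>rating_lookup</code> vector are assumptions made for illustration.</p>
<pre><code class="lang-r">library(stringr)

# Hypothetical class attributes, as html_attr("class") would return them
rating &lt;- c("star-rating Three", "star-rating One", "star-rating Five")

# Strip the prefix, then trim the leftover whitespace
rating_word &lt;- str_trim(str_remove(rating, "star-rating"))

# Look up each word in a named vector to get its numeric value
rating_lookup &lt;- c(One = 1, Two = 2, Three = 3, Four = 4, Five = 5)
rating_number &lt;- unname(rating_lookup[rating_word])
</code></pre>
<p>Because <code>str_remove()</code> and the lookup are both vectorised, the same three lines work on all the ratings of a page at once.</p>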
<h2 id="heading-how-to-mimic-human-behaviour-using-rselenium">How to Mimic Human Behaviour Using RSelenium</h2>
<h3 id="heading-how-selenium-works"><strong>How Selenium Works</strong></h3>
<p>Selenium is a tool that allows you to simulate user actions on a website, usually for testing purposes. RSelenium is an R library that allows you to access this functionality.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739961235501/f358a1e1-6a2f-45dd-a0b0-12925811cab1.png" alt="Diagram illustrating Selenium's architecture. It shows a client with a Selenium script communicating with a server's browser driver using JSON Wire Protocol over HTTP. The server then sends a HTTP request to a browser" class="image--center mx-auto" width="1490" height="573" loading="lazy"></p>
<p>We need a script, a browser, and browser driver to mimic user behaviour. The code you write that contains the instructions detailing the actions you would like to automate is the script. The browser driver acts as a bridge between your script and the browser and performs your desired actions by translating the script into actions.</p>
<p>The script, when run, is the client which requests and receives info from the browser driver’s server.</p>
<p>When you run a script, the script is converted to JSON format data which is then transferred to the browser driver via the JSON Wire Protocol. A protocol is simply a set of rules that define how data should be managed and handled during transfer across devices.</p>
<p>The driver receives and validates the received data. If successful, it communicates the actions defined in the script to the browser. If it’s unsuccessful, an error is sent to the client.</p>
<p>On browser initialization, the driver performs the actions step by step. This carries on to completion or until an error is encountered (missing elements, server errors, and so on). The bidirectional communication between the driver and browser is via HTTP. Finally, the results are sent back to the client and the browser is shut down.</p>
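<p>To make the JSON step more concrete, here is a rough sketch of what a "navigate" command could look like on the wire. RSelenium builds and sends this for you, so you never write it yourself. The session id below is a made-up placeholder, and the <code>jsonlite</code> package is used only to show the payload.</p>
<pre><code class="lang-r">library(jsonlite)

# Assumption: a hypothetical session id; the driver assigns the real one
session_id &lt;- "abc123"

# Under the JSON Wire Protocol, opening a page is a POST to /session/{id}/url
endpoint &lt;- paste0("/session/", session_id, "/url")

# The body of that request is a small JSON object holding the target URL
payload &lt;- toJSON(list(url = "https://books.toscrape.com/"), auto_unbox = TRUE)

endpoint
payload
</code></pre>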
<h3 id="heading-automating-page-navigation-and-data-collection-with-rselenium">Automating Page Navigation and Data Collection with RSelenium</h3>
<pre><code class="lang-r"><span class="hljs-comment"># install and load RSelenium</span>
install.packages(<span class="hljs-string">"RSelenium"</span>)
<span class="hljs-keyword">library</span>(RSelenium)

<span class="hljs-comment"># initialize and run the chrome driver</span>
rD &lt;- rsDriver(browser = <span class="hljs-string">"chrome"</span>, port = <span class="hljs-number">4567L</span>)

<span class="hljs-comment"># extract and assign the client</span>
remDr &lt;- rD[[<span class="hljs-string">"client"</span>]]
</code></pre>
<p>Running <code>rsDriver()</code> starts a Selenium server that launches ChromeDriver. Extract and assign the <code>rD[["client"]]</code> to a variable. This variable allows you to control and interact with the browser.</p>
<p>Sometimes, starting the driver may fail due to reasons such as permission restrictions, missing dependencies, or incorrect setup. If that happens, you can manually launch ChromeDriver by adding the following block of code right after loading the libraries at the top of the script. It is important to ensure the port numbers match.</p>
<pre><code class="lang-r">cDrv &lt;- chrome(verbose = <span class="hljs-literal">FALSE</span>, check = <span class="hljs-literal">FALSE</span>, port = <span class="hljs-number">4567L</span>)
cDrv$process
</code></pre>
<p>Now, navigate to the target webpage:</p>
<pre><code class="lang-r"><span class="hljs-comment"># naivigate to the target site</span>
remDr$navigate(<span class="hljs-string">"https://books.toscrape.com/"</span>)

<span class="hljs-comment">#maximize Chrome Window Size</span>
remDr$maxWindowSize()
</code></pre>
<p>And scroll to the bottom of the page:</p>
<pre><code class="lang-r"><span class="hljs-comment"># scroll to the bottom of the page</span>
webElem &lt;- remDr$findElement(<span class="hljs-string">"css"</span>, <span class="hljs-string">"body"</span>)
webElem$sendKeysToElement(list(key = <span class="hljs-string">"end"</span>))
</code></pre>
<p>The above code locates the body element and simulates pressing the End key, which scrolls to the bottom of the page.</p>
<p>Now, click Next to navigate to the next page:</p>
<pre><code class="lang-r"><span class="hljs-comment"># locate next button and click next</span>
nextPage &lt;-  remDr$findElement(using = <span class="hljs-string">"css selector"</span>,
                               value = <span class="hljs-string">".next &gt; a"</span>)
nextPage$clickElement()
</code></pre>
<p>Find the element that contains the link to the next page and click on it to redirect you.</p>
<p>Now we’re going to write a while loop that navigates through all the pages, up to page 50, and then closes the browser once it’s done.</p>
<p>A while loop executes a piece of code as long as a specific condition is met. Once the condition is not met, the loop exits.</p>
<pre><code class="lang-r"><span class="hljs-keyword">while</span>(condition is <span class="hljs-literal">TRUE</span>){
    <span class="hljs-comment">#DO SOMETHING</span>
}
</code></pre>
<p>Write a loop that ensures the next page button is clicked as long as the element containing the link to the next page is visible in the HTML DOM.</p>
<p>First, locate the next button element. Its presence in the open webpage makes sure that the loop runs.</p>
<p>The last page does not have a next button, so the loop will exit when it reaches that page (and Selenium will throw an error due to the missing element).</p>
<pre><code class="lang-r">nextPage &lt;- remDr$findElement(using = <span class="hljs-string">"css selector"</span>, value = <span class="hljs-string">".next &gt; a"</span>)
</code></pre>
<p>Wrap the nextPage element search in a <code>tryCatch()</code> block. This prevents the script from crashing if the 'Next' button is missing. If an error occurs, <code>tryCatch()</code> returns <code>NULL</code>, signaling that there are no more pages to navigate.</p>
<p>An <code>if</code> block then checks for a <code>NULL</code> value. If encountered, a message is displayed to inform the client that no 'Next' button was found, and the <code>break</code> statement exits the loop.</p>
<p>Finally, close the browser once the driver navigates to the last page (page 50 in the catalogue) to free up system resources using <code>remDr$close()</code>.</p>
<pre><code class="lang-r">
<span class="hljs-keyword">while</span> (<span class="hljs-literal">TRUE</span>) {  
  <span class="hljs-comment"># Try to find and click "Next" button</span>
  nextPage &lt;- <span class="hljs-keyword">tryCatch</span>({
    remDr$findElement(using = <span class="hljs-string">"css selector"</span>, value = <span class="hljs-string">".next &gt; a"</span>)
  }, error = <span class="hljs-keyword">function</span>(e) {
    <span class="hljs-keyword">return</span>(<span class="hljs-literal">NULL</span>)  <span class="hljs-comment"># No more pages</span>
  })

  <span class="hljs-keyword">if</span> (is.null(nextPage)) {
    message(<span class="hljs-string">"No 'Next' button found. Exiting loop."</span>)
    <span class="hljs-keyword">break</span>
  }

  nextPage$clickElement()
  Sys.sleep(<span class="hljs-number">3</span>)  <span class="hljs-comment"># Allow next page to load</span>

}
print(<span class="hljs-string">"finished scraping"</span>)
remDr$close()
</code></pre>
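<p>You can try the <code>tryCatch()</code> and <code>break</code> pattern without a browser. In the sketch below, <code>find_next_button()</code> is a hypothetical stand-in for <code>remDr$findElement()</code>: it raises an error once the last of three pretend pages is reached, just as Selenium does when the 'Next' button is missing.</p>
<pre><code class="lang-r"># Hypothetical stand-in for remDr$findElement(): errors on the last page
total_pages &lt;- 3
current_page &lt;- 1
find_next_button &lt;- function() {
  if (current_page &gt;= total_pages) stop("no such element")
  "next-button"
}

pages_visited &lt;- 1
while (TRUE) {
  # A failed search becomes NULL instead of crashing the script
  next_button &lt;- tryCatch(find_next_button(), error = function(e) NULL)

  if (is.null(next_button)) break

  current_page &lt;- current_page + 1    # stands in for clickElement()
  pages_visited &lt;- pages_visited + 1
}
</code></pre>
<p>The loop stops on its own at the third page, so the script never needs to know the number of pages in advance.</p>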
<h2 id="heading-how-to-combine-rselenium-amp-rvest-and-save-to-csv">How to Combine RSelenium &amp; RVest and Save to CSV</h2>
<p>Now that we’ve extracted data from specific HTML elements using RVest and automated user actions using RSelenium, let’s combine the two to scrape data from all the pages in the website.</p>
<h3 id="heading-create-a-scrape-books-function"><strong>Create a scrape books function</strong></h3>
<p>You will be saving the scraped books information in a CSV file. First, create an empty dataframe to hold the scraped data:</p>
<pre><code class="lang-r"><span class="hljs-comment"># install and load dplyr for dataframe manipulation</span>
install.packages(<span class="hljs-string">"dplyr"</span>)
<span class="hljs-keyword">library</span>(dplyr)

<span class="hljs-comment"># create a dataframe to hold book information</span>
Books &lt;-  data.frame()
</code></pre>
<h3 id="heading-retrieve-and-parse-the-webpage">Retrieve and parse the webpage</h3>
<p>For RVest to work with RSelenium, you first have to retrieve the HTML source of the page currently loaded in the Selenium-controlled browser. <code>remDr$getPageSource()</code> returns a list, so use <code>remDr$getPageSource()[[1]]</code> to extract the HTML content as a string.</p>
<pre><code class="lang-r">page &lt;- remDr$getPageSource()[[<span class="hljs-number">1</span>]]
</code></pre>
<p>Convert the HTML content to XML using <code>read_html()</code> like this:</p>
<pre><code class="lang-r"> <span class="hljs-comment"># define the path from which other details will be extracted</span>
    book &lt;- read_html(page)  %&gt;% html_element(<span class="hljs-string">"ol"</span>)  %&gt;% html_elements(<span class="hljs-string">"li"</span>) %&gt;% html_element(<span class="hljs-string">"article"</span>)
</code></pre>
<p>Extract each book’s details using CSS selectors with <code>rvest</code> functions. <code>html_attr()</code> and <code>html_text2()</code> already return character vectors, so piping <code>as.character()</code> at the very end of each extracted detail is a defensive step: it guarantees that every column is stored as character strings, preventing unexpected data type issues when working with the data.</p>
<pre><code class="lang-r">    <span class="hljs-comment"># title</span>
    title &lt;- book %&gt;% 
      html_element(<span class="hljs-string">"h3 a"</span>) %&gt;% 
      html_attr(<span class="hljs-string">"title"</span>) %&gt;% 
      as.character()
</code></pre>
<p>Wrap the block of code used to extract details from HTML elements in a function and return a dataframe whose column values are the book details. This makes the code reusable and modular.</p>
<pre><code class="lang-r">
scrape_books &lt;- <span class="hljs-keyword">function</span>() {
    page &lt;- remDr$getPageSource()[[<span class="hljs-number">1</span>]]

    <span class="hljs-comment"># define the path from which other details will be extracted</span>
    book &lt;- read_html(page)  %&gt;% html_element(<span class="hljs-string">"ol"</span>)  %&gt;% html_elements(<span class="hljs-string">"li"</span>) %&gt;% html_element(<span class="hljs-string">"article"</span>)

    <span class="hljs-comment"># extracting details using css locators.</span>
    <span class="hljs-comment"># title</span>
    title &lt;- book %&gt;% 
      html_element(<span class="hljs-string">"h3 a"</span>) %&gt;% 
      html_attr(<span class="hljs-string">"title"</span>) %&gt;% 
      as.character() 

    <span class="hljs-comment"># rating</span>
    rating &lt;- book %&gt;% 
      html_element(<span class="hljs-string">"p"</span>) %&gt;% 
      html_attr(<span class="hljs-string">"class"</span>) %&gt;% 
      as.character() 

    cleaned_rating &lt;- str_trim(gsub(<span class="hljs-string">"star-rating"</span>, <span class="hljs-string">""</span>, rating))

    <span class="hljs-comment"># price</span>
    price &lt;- book %&gt;% 
      html_element(<span class="hljs-string">".product_price p"</span>) %&gt;% 
      html_text2() %&gt;% 
      as.character() 

    <span class="hljs-comment">#link to book page</span>
    book_link &lt;- book %&gt;% 
      html_element(<span class="hljs-string">"h3 a"</span>) %&gt;% 
      html_attr(<span class="hljs-string">"href"</span>) %&gt;% 
      as.character() 

    <span class="hljs-comment"># image link</span>
    cover_page_link &lt;- book %&gt;% 
      html_element(<span class="hljs-string">".image_container a img"</span>) %&gt;% 
      html_attr(<span class="hljs-string">"src"</span>) %&gt;% 
      as.character() 

    <span class="hljs-keyword">return</span>(data.frame(title,cleaned_rating,price,book_link,cover_page_link, stringsAsFactors = <span class="hljs-literal">FALSE</span>))
}
</code></pre>
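<p>The function stores prices as text, currency symbol included. If you plan to sort or average them later, a possible extra step is to convert them to numbers. The sample values below are assumptions standing in for what <code>html_text2()</code> returns.</p>
<pre><code class="lang-r"># Hypothetical prices, as html_text2() would return them
price &lt;- c("£51.77", "£53.74", "£50.10")

# Drop everything except digits and the decimal point, then convert
price_number &lt;- as.numeric(gsub("[^0-9.]", "", price))
</code></pre>
<p>Keeping the original <code>price</code> column alongside the numeric one makes it easy to check the conversion against the website.</p>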
<h3 id="heading-write-to-csv"><strong>Write to CSV</strong></h3>
<p>Save the dataframe to a CSV file named “books.csv”:</p>
<pre><code class="lang-r">write.csv(Books, file = <span class="hljs-string">"./books.csv"</span>, fileEncoding = <span class="hljs-string">"UTF-8"</span>)
</code></pre>
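<p>By default, <code>write.csv()</code> also writes the row numbers as an unnamed first column. If you don't want that column, one option is <code>row.names = FALSE</code>. The sketch below uses a hypothetical one-row dataframe and a temporary file so that it runs on its own.</p>
<pre><code class="lang-r"># Hypothetical one-row stand-in for the scraped Books dataframe
Books &lt;- data.frame(title = "A Light in the Attic", cleaned_rating = "Three",
                    stringsAsFactors = FALSE)

# Write to a temporary file; row.names = FALSE leaves out the row-number column
csv_path &lt;- tempfile(fileext = ".csv")
write.csv(Books, file = csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back to confirm the columns survived the round trip
books_check &lt;- read.csv(csv_path, stringsAsFactors = FALSE)
names(books_check)
</code></pre>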
<h2 id="heading-bringing-it-all-together">Bringing it All Together</h2>
<p>Let’s review what we’ve done so far: First, the script to scrape book data begins by loading the browser, maximizing the window size, and navigating to the Books To Scrape Page.</p>
<p>Then we created an empty dataframe to hold the scraped data. We then scraped the data from the first page, saved it to the dataframe, and located the ‘Next‘ button in order to navigate to the next page – from which we scraped data and stored it.</p>
<p>The process of scraping, adding to the dataframe, and clicking the next page button continues until the ‘Next’ button is no longer available in the HTML DOM.</p>
<p>Once the last page has been reached, the code exits the loop and saves the data to CSV. Finally, it closes the driver to free up system resources.</p>
<pre><code class="lang-r"><span class="hljs-comment"># load libraries</span>
<span class="hljs-keyword">library</span>(wdman)
<span class="hljs-keyword">library</span>(binman)
<span class="hljs-keyword">library</span>(rvest)
<span class="hljs-keyword">library</span>(stringr)
<span class="hljs-keyword">library</span>(RSelenium)
<span class="hljs-keyword">library</span>(dplyr)


cDrv &lt;- chrome(verbose = <span class="hljs-literal">FALSE</span>, check = <span class="hljs-literal">FALSE</span>, port = <span class="hljs-number">4450L</span>)
cDrv$process

rD &lt;- rsDriver(browser = <span class="hljs-string">"chrome"</span>, port = <span class="hljs-number">4450L</span>)
remDr &lt;- rD[[<span class="hljs-string">"client"</span>]]


remDr$navigate(<span class="hljs-string">"https://books.toscrape.com/"</span>)
remDr$maxWindowSize()

page &lt;- remDr$getPageSource()[[<span class="hljs-number">1</span>]]
webElem &lt;- remDr$findElement(<span class="hljs-string">"css"</span>, <span class="hljs-string">"body"</span>)
webElem$sendKeysToElement(list(key = <span class="hljs-string">"end"</span>))

nextPage &lt;-  remDr$findElement(using = <span class="hljs-string">"css selector"</span>,
                               value = <span class="hljs-string">".next &gt; a"</span>)
nextPage$clickElement()


<span class="hljs-comment"># create an empty dataframe to hold the scraped book data</span>
Books &lt;-  data.frame(title = character(), rating = character(), stringsAsFactors = <span class="hljs-literal">FALSE</span>)

scrape_books &lt;- <span class="hljs-keyword">function</span>() {
    page &lt;- remDr$getPageSource()[[<span class="hljs-number">1</span>]]

    <span class="hljs-comment"># define the path from which other details will be extracted</span>
    book &lt;- read_html(page)  %&gt;% html_element(<span class="hljs-string">"ol"</span>)  %&gt;% html_elements(<span class="hljs-string">"li"</span>) %&gt;% html_element(<span class="hljs-string">"article"</span>)

    <span class="hljs-comment"># extracting details using css locators.</span>
    <span class="hljs-comment"># title</span>
    title &lt;- book %&gt;% 
      html_element(<span class="hljs-string">"h3 a"</span>) %&gt;% 
      html_attr(<span class="hljs-string">"title"</span>) %&gt;% 
      as.character() 

    <span class="hljs-comment"># rating</span>
    rating &lt;- book %&gt;% 
      html_element(<span class="hljs-string">"p"</span>) %&gt;% 
      html_attr(<span class="hljs-string">"class"</span>) %&gt;% 
      as.character() 

    cleaned_rating &lt;- str_trim(gsub(<span class="hljs-string">"star-rating"</span>, <span class="hljs-string">""</span>, rating))

    <span class="hljs-comment"># price</span>
    price &lt;- book %&gt;% 
      html_element(<span class="hljs-string">".product_price p"</span>) %&gt;% 
      html_text2() %&gt;% 
      as.character() 

    <span class="hljs-comment">#link to book page</span>
    book_link &lt;- book %&gt;% 
      html_element(<span class="hljs-string">"h3 a"</span>) %&gt;% 
      html_attr(<span class="hljs-string">"href"</span>) %&gt;% 
      as.character() 

    <span class="hljs-comment"># image link</span>
    cover_page_link &lt;- book %&gt;% 
      html_element(<span class="hljs-string">".image_container a img"</span>) %&gt;% 
      html_attr(<span class="hljs-string">"src"</span>) %&gt;% 
      as.character() 

    <span class="hljs-keyword">return</span>(data.frame(title,cleaned_rating,price,book_link,cover_page_link, stringsAsFactors = <span class="hljs-literal">FALSE</span>))
}

<span class="hljs-comment"># return to the first page so that no page is skipped or scraped twice</span>
remDr$navigate(<span class="hljs-string">"https://books.toscrape.com/"</span>)

<span class="hljs-keyword">while</span> (<span class="hljs-literal">TRUE</span>) {
  <span class="hljs-comment"># scrape current page</span>
  Books &lt;- rbind(Books, scrape_books())

  <span class="hljs-comment"># find and click "next" button</span>
  nextPage &lt;- <span class="hljs-keyword">tryCatch</span>({
    remDr$findElement(using = <span class="hljs-string">"css selector"</span>, value = <span class="hljs-string">".next &gt; a"</span>)
  }, error = <span class="hljs-keyword">function</span>(e) {
    <span class="hljs-keyword">return</span>(<span class="hljs-literal">NULL</span>)  <span class="hljs-comment"># No more pages</span>
  })

  <span class="hljs-comment"># exit loop if "next" button is missing</span>
  <span class="hljs-keyword">if</span> (is.null(nextPage)) {
    message(<span class="hljs-string">"No 'Next' button found. Exiting loop."</span>)
    <span class="hljs-keyword">break</span>
  }

  nextPage$clickElement()
  <span class="hljs-comment"># Allow next page to load</span>
  Sys.sleep(<span class="hljs-number">3</span>)  

}

write.csv(Books, file = <span class="hljs-string">"./books.csv"</span>, fileEncoding = <span class="hljs-string">"UTF-8"</span>)
print(<span class="hljs-string">"finished scraping"</span>)
remDr$close()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1740129915080/2ee1344b-58a8-477b-a568-719ba4336c95.png" alt="2ee1344b-58a8-477b-a568-719ba4336c95" class="image--center mx-auto" width="390" height="993" loading="lazy"></p>
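<p>The script loads <code>dplyr</code>, so an alternative to calling <code>rbind()</code> on every iteration is to collect each page's dataframe in a list and combine them once with <code>bind_rows()</code>. This is only a sketch of that variation: <code>scrape_page()</code> is a hypothetical stand-in for <code>scrape_books()</code> that needs no browser.</p>
<pre><code class="lang-r">library(dplyr)

# Hypothetical stand-in for scrape_books(): returns one page's worth of rows
scrape_page &lt;- function(page_number) {
  data.frame(title = paste("Book", page_number), page = page_number,
             stringsAsFactors = FALSE)
}

# Collect one dataframe per page, then combine them in a single step
page_results &lt;- list()
for (page_number in 1:3) {
  page_results[[page_number]] &lt;- scrape_page(page_number)
}
all_books &lt;- bind_rows(page_results)
</code></pre>
<p>For a 50-page catalogue the difference is small, but the list approach avoids copying the growing dataframe on every page.</p>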
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you learned how to effectively combine RSelenium and RVest to scrape data from a website. By leveraging RSelenium, you can automate user interactions and navigate through web pages, while RVest allows you to extract specific data from HTML elements.</p>
<p>This approach provides a powerful and flexible method for web scraping, enabling you to handle dynamic content and mimic human behavior. By following the steps outlined here, you can successfully scrape data from multiple pages and save it to a CSV file for further analysis.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Weather App with R Shiny ]]>
                </title>
                <description>
                    <![CDATA[ In this tutorial, you’ll learn how to build a weather app in R. Really – a weather app, in R? Wait, hear me out. When you think of R, you probably imagine someone wearing chunky thick prescription glasses and devouring a book. You know, a statisticia... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-weather-app-with-r-shiny/</link>
                <guid isPermaLink="false">67570d22e8032cfb3def7b4e</guid>
                
                    <category>
                        <![CDATA[ rshiny ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Semantic UI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ R Language ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Elabonga Atuo ]]>
                </dc:creator>
                <pubDate>Mon, 09 Dec 2024 15:30:42 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1733174501446/a177379f-3c32-424a-9fbe-6608310f2ea6.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this tutorial, you’ll learn how to build a weather app in R. Really – a weather app, in R? Wait, hear me out.</p>
<p>When you think of R, you probably imagine someone wearing chunky thick prescription glasses and devouring a book. You know, a statistician dealing with complex models, an insane amount of mathematical equations, and copious amounts of data.</p>
<p>But R is far more than just a tool for statistics. It shines when you need to turn raw data into actionable insights and present those insights in a clear, engaging way.</p>
<p>With frameworks like Shiny, R takes this one step further, enabling you to create fully interactive web apps without having to worry about frontends, backends, or learning an entirely new programming language.</p>
<p>In this tutorial, you will create a simple weather app that fetches data from an API and displays the results in a good-looking app.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-project-overview">Project Overview</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-api-keys-storage-and-retrieval">API Keys: Storage and Retrieval</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-make-your-first-api-call">How to Make Your First API Call</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-the-shiny-app">How to Build the Shiny App</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-project-overview">Project Overview</h2>
<p>Here’s what we’re going to be building:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733341336823/dd605385-5531-43c5-924d-dde24b38846b.gif" alt="The R Shiny weather app demo" class="image--center mx-auto" width="1366" height="588" loading="lazy"></p>
<p>For the weather app to work, you will need to make two separate API calls. We’ll use the One Call API 3.0 to update weather data and the OpenWeather API for geocoding. You can get your API Key <a target="_blank" href="https://openweathermap.org/api">here</a>. Just keep in mind that if this is your first time signing up for an API key, activation may take up to 24 hours.</p>
<p>The weather app will take the location/city from user input. The input will then be geocoded by making the call to OpenWeather API. Then, from its response, the coordinates (latitude and longitude) will be extracted. The coordinates will be used as query arguments for the One Call API call to obtain the weather data in JSON format.</p>
<h3 id="heading-prerequisites">Prerequisites:</h3>
<p>To follow along with this tutorial, you will need:</p>
<ul>
<li><p>R programming knowledge</p>
</li>
<li><p>HTML and a bit of JavaScript knowledge</p>
</li>
<li><p>R Studio installed</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733172415724/c4f884f6-b583-4f13-b0f8-eb564ab6531f.png" alt="Weather Update API Flow" class="image--center mx-auto" width="622" height="458" loading="lazy"></p>
<h2 id="heading-project-setup">Project Setup</h2>
<p>Create a folder in your desired directory. Set and confirm the project folder as the working directory using the following command in the R console:</p>
<pre><code class="lang-r">setwd(<span class="hljs-string">"path/to/your/project/file"</span>)
getwd()
</code></pre>
<p>Create a project in the set path using the following command:</p>
<pre><code class="lang-r"><span class="hljs-comment">#create R project</span>
usethis::create_project(path = <span class="hljs-string">"."</span>, open = <span class="hljs-literal">FALSE</span>)
</code></pre>
<p>You should have a folder structure that looks like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733166096334/93e004da-4449-4cb4-8ddd-d3082e5687d8.png" alt="project folder structure" class="image--center mx-auto" width="181" height="104" loading="lazy"></p>
<p>Create an R file in the root directory and save it as <code>app.R</code>. All your R code will be contained here.</p>
<p>Install and load the following libraries that you are going to work with:</p>
<pre><code class="lang-r"><span class="hljs-keyword">library</span>(shiny)
<span class="hljs-keyword">library</span>(bslib)
<span class="hljs-keyword">library</span>(shinyjs)
<span class="hljs-keyword">library</span>(httr2)
<span class="hljs-keyword">library</span>(lubridate)
<span class="hljs-keyword">library</span>(shiny.semantic)
</code></pre>
<h2 id="heading-api-keys-storage-and-retrieval">API Keys: Storage and Retrieval</h2>
<p>Storing your credentials in a location separate from your scripts and global environment is a good practice. This ensures security, scalability, and flexibility, especially when working in shared or production environments. The <code>.Renviron</code> file best serves that purpose.</p>
<p>Open and edit your <code>.Renviron</code> file in the following way:</p>
<pre><code class="lang-r"><span class="hljs-comment">#open and edit .Renviron</span>
usethis::edit_r_environ(scope = <span class="hljs-string">"project"</span>)
</code></pre>
<p>The scope argument set to <code>project</code> sets up the <code>.Renviron</code> specifically to your project. In the newly opened file, add your API key as follows:</p>
<pre><code class="lang-r">OPENWEATHERAPIKEY=<span class="hljs-string">"yourapikey"</span>
</code></pre>
<h2 id="heading-how-to-make-your-first-api-call">How to Make Your First API Call</h2>
<p>You will be using the httr2 library (the successor to httr) to obtain data from the API. It grants you more control over how you make requests to the web.</p>
<h3 id="heading-make-the-api-key-accessible-in-the-script">Make the API Key accessible in the script</h3>
<p>First, you’ll need to securely access and store the API key in the script without hardcoding it. You can do that like this:</p>
<pre><code class="lang-r"><span class="hljs-comment">#access API keys in script</span>
readRenviron(<span class="hljs-string">".Renviron"</span>)
api_key = Sys.getenv(<span class="hljs-string">"OPENWEATHERAPIKEY"</span>)
</code></pre>
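<p><code>Sys.getenv()</code> returns an empty string when a variable is not set, so a typo in <code>.Renviron</code> only surfaces later as a failed API call. A small guard can catch it early. The helper name <code>get_api_key()</code> is an assumption made for this sketch and is not part of the tutorial's app.</p>
<pre><code class="lang-r"># Hypothetical helper: stop early if the environment variable is not set
get_api_key &lt;- function(name = "OPENWEATHERAPIKEY") {
  key &lt;- Sys.getenv(name)
  if (!nzchar(key)) {
    stop("Environment variable ", name, " is not set. Check your .Renviron file.")
  }
  key
}
</code></pre>
<p>You could then write <code>api_key &lt;- get_api_key()</code> in place of the plain <code>Sys.getenv()</code> call.</p>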
<h3 id="heading-define-the-geocoding-function">Define the Geocoding Function</h3>
<p>You will create a function that takes a location and an API key as inputs, sends a request to the OpenWeather geocoding API, and returns the coordinates of the specified location.</p>
<p>Start by creating a request. The pipe (<code>|&gt;</code>) operator facilitates the chaining of HTTP requests step by step in a clear and readable manner. The geocoding URL takes two parameters: location, denoted by <code>q</code>, and the API key, denoted by <code>appid</code>. The <code>req_url_query()</code> function appends these parameters to the query.</p>
<p>Chain the query to <code>req_perform()</code>, which sends the request and fetches the response, and then parse the response body as JSON with <code>resp_body_json()</code>, the second to last line.</p>
<pre><code class="lang-r"><span class="hljs-comment"># Geocoding URL</span>
geocoding_url &lt;- <span class="hljs-string">"https://api.openweathermap.org/data/2.5/weather"</span>
geocode &lt;- <span class="hljs-keyword">function</span>(location, api_key) {
  request(geocoding_url) |&gt; 
    req_url_query(`q` = location, `appid` = api_key) |&gt; 
    req_perform() |&gt; 
    resp_body_json() |&gt;
    coordinates()
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733342454801/feed01b2-a7a1-4c69-8297-2dcfdc8ec39f.png" alt="A sample response to the geocoding API" class="image--center mx-auto" width="1203" height="284" loading="lazy"></p>
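<p>To check what <code>req_url_query()</code> produces before spending an API call, you can build the request and inspect its URL without performing it. In this sketch, <code>"Nairobi"</code> and <code>"demo-key"</code> are placeholder values.</p>
<pre><code class="lang-r">library(httr2)

geocoding_url &lt;- "https://api.openweathermap.org/data/2.5/weather"

# Build the request but do not perform it; "demo-key" is a placeholder
req &lt;- request(geocoding_url) |&gt;
  req_url_query(q = "Nairobi", appid = "demo-key")

# The query parameters are appended to the base URL
req$url
</code></pre>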
<h3 id="heading-define-the-coordinate-extracting-function">Define the coordinate-extracting function</h3>
<p>The <code>coordinates()</code> function is a helper function that extracts the latitude and longitude values from the JSON response. A quick inspection of the JSON response reveals the coordinate's position. The JSON object is simply a long list of lists and you can access elements by subsetting it.</p>
<p>A blank data body would imply that the city/location is unavailable, and you’d get the message <em>"No such city exists!"</em>. If the JSON contains an element, the length would be more than 0 – it is a list after all.</p>
<pre><code class="lang-r">coordinates &lt;- <span class="hljs-keyword">function</span>(body) {
  <span class="hljs-keyword">if</span>(length(body) != <span class="hljs-number">0</span>) { 
    lat &lt;- body$coord$lat
    lng &lt;- body$coord$lon
    town &lt;- body$name
    c(lat, lng, town)
  } <span class="hljs-keyword">else</span> {
    <span class="hljs-string">"No such city exists!"</span>
  }
}
</code></pre>
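<p>To see how the subsetting behaves, here is a minimal sketch that runs the same extraction on a hand-written list. The <code>mock_body</code> list and its values are made up for illustration and only mimic the shape of the parsed response. Note that <code>c()</code> coerces a mix of numbers and text to a character vector, so the coordinates come back as strings and need <code>as.numeric()</code> if you want to do arithmetic with them.</p>
<pre><code class="lang-r"># Hypothetical parsed response: only the fields that coordinates() reads
mock_body &lt;- list(
  coord = list(lon = 36.8167, lat = -1.2833),
  name = "Nairobi"
)

# The same extraction that coordinates() performs
result &lt;- c(mock_body$coord$lat, mock_body$coord$lon, mock_body$name)

# Mixing numbers and text in c() yields a character vector
class(result)

# Convert the first two elements back to numbers when needed
numeric_coords &lt;- as.numeric(result[1:2])
</code></pre>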
<h3 id="heading-define-the-weather-update-function">Define the weather-update function</h3>
<p>You will create a function that sends a request to the OpenWeather API with specified query parameters, handles errors using a predefined function, and returns the parsed JSON response containing the weather data.</p>
<p>As implemented in the geocoding function, start by creating a request and adding the necessary query parameters using the <code>req_url_query()</code> function. The <code>openweather_json()</code> function accepts two main arguments:</p>
<ul>
<li><p><code>api_key</code>: A required argument used for authentication with the OpenWeather API. It is the first argument, so it can be matched by position.</p>
</li>
<li><p><code>...</code>: This represents optional keyword arguments that you can use to customize the query. You can pass as many additional parameters as needed, provided they are specified as named arguments.</p>
</li>
</ul>
<pre><code class="lang-r">openweather_json &lt;- <span class="hljs-keyword">function</span>(api_key, <span class="hljs-keyword">...</span>) { 
  request(current_weather_url) |&gt; 
    req_url_query(<span class="hljs-keyword">...</span>, `appid` = api_key, `units` = <span class="hljs-string">"metric"</span>) |&gt; 
    req_error(body = openweather_error_body) |&gt;
    req_perform() |&gt; 
    resp_body_json()
}
</code></pre>
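<p>The sketch below shows how the <code>...</code> arguments end up in the query string, without sending anything over the network. The URL <code>https://api.example.com/weather</code> and the helper name <code>build_weather_request()</code> are placeholders invented for this illustration, not part of the app.</p>
<pre><code class="lang-r">library(httr2)

# Placeholder endpoint, used only to inspect the generated URL
example_url &lt;- "https://api.example.com/weather"

# Hypothetical helper: builds the request but does not perform it
build_weather_request &lt;- function(api_key, ...) {
  request(example_url) |&gt;
    req_url_query(..., appid = api_key, units = "metric")
}

# Named arguments passed through ... become query parameters
req &lt;- build_weather_request("MY_KEY", lat = -1.28, lon = 36.82)

# Inspect the URL that would be requested
req$url
</code></pre>
<p>Because the request is never performed, this is a safe way to check that every parameter is named and spelled the way the API expects.</p>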
<h3 id="heading-error-handling-extracting-and-managing-status-codes">Error Handling: Extracting and Managing Status Codes</h3>
<p>You will create an error-handling function that extracts non-200 status codes from a response and defines how to manage them. The structure of this function depends on how the API reports errors and where the relevant information is stored.</p>
<h4 id="heading-define-the-weather-update-error-body">Define the weather-update error body</h4>
<p>The <code>req_error()</code> call in <code>openweather_json()</code> introduces a new concept: error handling. By default, httr2 turns HTTP error responses (4xx and 5xx status codes) into R errors, and reading the details the API sends back helps you decide what message to show the user and how to resolve the problem.</p>
<p>Create an error body: a function that receives the failed response and returns the text to include in the error message.</p>
<p>The function takes a response, parses its JSON body, and extracts the error description stored in the <code>$message</code> element. The underscore (<code>_</code>) is a placeholder for the parsed JSON object passed along by the pipe. Using the placeholder in an extraction like this requires R 4.3 or later.</p>
<pre><code class="lang-r">openweather_error_body &lt;- <span class="hljs-keyword">function</span>(resp) {
  resp |&gt; resp_body_json() |&gt; _$message 
}
</code></pre>
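<p>You can try the idea on a fabricated response instead of a live API call. This sketch assumes httr2 1.0.0 or later, which provides <code>response_json()</code> for building test responses. The body contents and the helper name <code>error_message()</code> are invented for illustration, and the helper uses plain <code>$</code> extraction so that it also runs on R versions older than 4.3.</p>
<pre><code class="lang-r">library(httr2)

# Fabricated error response; the fields are assumptions for illustration
mock_resp &lt;- response_json(
  status_code = 404,
  body = list(cod = "404", message = "city not found")
)

# Hypothetical equivalent of the error body, without the pipe placeholder
error_message &lt;- function(resp) {
  resp_body_json(resp)$message
}

msg &lt;- error_message(mock_resp)
msg
</code></pre>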
<h4 id="heading-define-the-geocode-error-body">Define the geocode error body</h4>
<p>This error body function will prove useful in the Shiny App. This is a simple walkthrough.</p>
<p>The <code>req_error()</code> function allows you to customize how response errors are handled. Its <code>is_error</code> argument determines whether a given response should be considered an error. By setting <code>is_error</code> to <code>\(resp) FALSE</code> (an anonymous function that always returns FALSE), all responses, regardless of the status code, are treated as successful. This prevents the app from exiting due to non-200 status codes.</p>
<p>With this setup, the request always returns a response object, which you can pipe into the <code>resp_status()</code> function to retrieve the exact status code.</p>
<pre><code class="lang-r">openstreetmap_error_body &lt;- <span class="hljs-keyword">function</span>(location, api_key) {
  resp &lt;- request(geocoding_url) |&gt; 
    req_url_query(`q` = location, `appid` = api_key) |&gt; 
    req_error(is_error = \(resp) <span class="hljs-literal">FALSE</span>) |&gt;
    req_perform() |&gt;  resp_status()
  resp
}
</code></pre>
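<p>As a rough illustration of what <code>resp_status()</code> hands back, the sketch below builds a fake 404 response with httr2's <code>response()</code> helper rather than calling the API. The response is fabricated for this example. The status comes back as a number, and R coerces it when you compare it with a string, so a check against <code>404</code> and a check against <code>"404"</code> both succeed.</p>
<pre><code class="lang-r">library(httr2)

# Fabricated response with an error status; no network call is made
mock_resp &lt;- response(status_code = 404)

status &lt;- resp_status(mock_resp)

# httr2 classifies 4xx and 5xx responses as errors by default
is_err &lt;- resp_is_error(mock_resp)

# Both comparisons hold because R coerces the number to a string
matches_number &lt;- status == 404
matches_string &lt;- status == "404"
</code></pre>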
<h2 id="heading-how-to-build-the-shiny-app">How to Build the Shiny App</h2>
<p>Now that you have nailed down how to obtain data from the API, it’s time to render the results in an interpretable and interactive format. For this, you will use Shiny. Shiny is a framework that allows you to create interactive web apps.</p>
<p>A Shiny App is made up of two components:</p>
<ul>
<li><p>The UI: what the user interacts with. It defines the layout and appearance of the app.</p>
</li>
<li><p>The server: contains the app’s logic and behaviour.</p>
</li>
</ul>
<h3 id="heading-building-the-shiny-ui">Building the Shiny UI</h3>
<p>Shiny UI provides a collection of elements that allow users to input data, make selections, and trigger events seamlessly.</p>
<p>You will include a <code>textInput</code> element that takes in the location and the weather data will be fetched and rendered upon submission. The <code>input_task_button</code> button prevents the user from clicking when an API call is in progress. The other elements are output elements where the weather data will be displayed and a mode-switching button.</p>
<h4 id="heading-styling-the-shiny-app">Styling the Shiny app</h4>
<p>You can use <code>shiny.semantic</code>, a library built on top of Fomantic-UI, to style your Shiny dashboard. Fomantic-UI is a front-end framework that provides a rich collection of pre-styled HTML components like buttons, modals, form inputs, and more. It simplifies UI design by allowing developers to create visually appealing and responsive interfaces without needing extensive custom CSS or HTML knowledge.</p>
<p>Fomantic-UI styling is applied by wrapping elements in their corresponding classes, which define their behavior and appearance.</p>
<p>A grid in Fomantic-UI is a flexible layout system used to organize content. It acts as a canvas that divides the layout into rows (horizontally aligned) and columns (vertically aligned). A root grid can contain up to 16 columns, making it ideal for creating structured and responsive designs.</p>
<p>To specify a column's width, you add the size, written out as a word from <code>one</code> to <code>sixteen</code>, followed by <code>wide column</code> (for example, <code>four wide column</code>). The widths of all columns in a row should add up to 16.</p>
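<p>As a small sketch of the class naming, the snippet below builds a two-column grid with plain <code>htmltools</code> tags and prints the resulting HTML. The text inside the columns is placeholder content, and the Fomantic-UI stylesheet is not loaded here, so this only shows the markup that the classes produce.</p>
<pre><code class="lang-r">library(htmltools)

# A grid whose two columns span 4 + 12 = 16 units
layout &lt;- div(class = "ui grid",
  div(class = "four wide column", "Sidebar"),
  div(class = "twelve wide column", "Main content")
)

# Render the tag object to an HTML string for inspection
html &lt;- as.character(layout)
cat(html)
</code></pre>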
<p>A segment groups related content, while a card displays detailed, content-rich items, such as a user's social media profile. Dividers are visual elements used to separate sections or content within a layout.</p>
<p>For the weather app, first create a div of class <code>grid</code> within which you’ll nest the various elements.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733137762676/12d5695c-2ed7-4606-8267-44243c2bee57.png" alt="semantic page layout demo" class="image--center mx-auto" width="885" height="568" loading="lazy"></p>
<h5 id="heading-search-bar-section"><strong>Search bar section</strong></h5>
<p>Divide the grid into sixteen columns and create a segment that groups elements in the search bar section. Add a theme toggle button, location input that takes in user input, a search button for submitting the location to the API, and a notification button, defining their width by the column size.</p>
<pre><code class="lang-r">div(class = <span class="hljs-string">"sixteen wide column"</span>,
          div(class = <span class="hljs-string">"ui segment"</span>,
              div(class = <span class="hljs-string">"ui grid"</span>,
                  div(class = <span class="hljs-string">"two wide column"</span>,
                      button(
                        class = <span class="hljs-string">"ui button icon basic"</span>,
                        input_id = <span class="hljs-string">"darkmode"</span>,
                        label = <span class="hljs-literal">NULL</span>,
                        icon = icon(<span class="hljs-string">"moon icon"</span>)
                      )
                  ),
                  div(class = <span class="hljs-string">"ten wide column"</span>,
                      textInput(
                        <span class="hljs-string">"location"</span>,
                        label = <span class="hljs-literal">NULL</span>,
                        placeholder = <span class="hljs-string">"Search for your preferred city"</span>
                      )
                  ),
                  div(class = <span class="hljs-string">"two wide column"</span>,
                      tags$div(
                        class = <span class="hljs-string">"ui button"</span>,
                        id = <span class="hljs-string">"my-custom-button"</span>,
                        input_task_button(<span class="hljs-string">"search"</span>, label = <span class="hljs-string">"Search"</span>, icon = icon(<span class="hljs-string">"search"</span>))
                      )
                  ),
                  div(class = <span class="hljs-string">"two wide column"</span>,
                      actionButton(<span class="hljs-string">"show_alert"</span>, label = icon(<span class="hljs-string">"bell"</span>), class = <span class="hljs-string">"bell-no-alert"</span>),
                      textOutput(<span class="hljs-string">"alert_message"</span>)
                  )
              )
          )
      )
</code></pre>
<h5 id="heading-location-and-current-weather-section"><strong>Location and current weather section</strong></h5>
<p>Divide the grid into sixteen columns and nest another grid within the partitions that will host two columns.</p>
<p>Within the grid, define two columns. The first column is for time, location, and date data, and the second column will hold current weather data.</p>
<p>Then create card elements to hold each weather parameter, its unit of measurement, and the corresponding icon.</p>
<pre><code class="lang-r">div(class = <span class="hljs-string">"sixteen wide column"</span>,
          div(class = <span class="hljs-string">"ui equal-height-grid grid"</span>,
              div(class = <span class="hljs-string">"left floated center aligned four wide column"</span>,
                  div(class = <span class="hljs-string">"ui raised equal-height-two-segment segment"</span>,
                      style = <span class="hljs-string">"flex: 1;"</span>,
                      div(class = <span class="hljs-string">"column center aligned"</span>,
                          div(class = <span class="hljs-string">"ui hidden section divider"</span>),
                          span(class = <span class="hljs-string">"ui large text"</span>, textOutput(<span class="hljs-string">"city"</span>)),
                          div(class = <span class="hljs-string">"ui hidden section divider"</span>),
                          span(class = <span class="hljs-string">"ui big text"</span>, textOutput(<span class="hljs-string">"currentTime"</span>)),
                          div(class = <span class="hljs-string">"ui hidden section divider"</span>),
                          span(class = <span class="hljs-string">"ui large text"</span>, textOutput(<span class="hljs-string">"currentDate"</span>)),
                          div(class = <span class="hljs-string">"ui hidden section divider"</span>)
                      )
                  )
              ),
              div(class = <span class="hljs-string">"right floated center aligned twelve wide column"</span>,
                  div(class = <span class="hljs-string">"ui raised segment"</span>,
                      div(class = <span class="hljs-string">"ui horizontal equal width segments"</span>,
                          div(class = <span class="hljs-string">"ui equal-height-two-segment segment"</span>,
                              style = <span class="hljs-string">"flex: 3;"</span>,
                              div(class = <span class="hljs-string">"column"</span>,
                                  span(class = <span class="hljs-string">"ui big text centered"</span>, textOutput(<span class="hljs-string">"currentTemp"</span>)),
                                  textOutput(<span class="hljs-string">"feelsLike"</span>),
                                  card(
                                    class = <span class="hljs-string">"ui mini"</span>,
                                    div(class = <span class="hljs-string">"content"</span>, icon(class = <span class="hljs-string">"large sun"</span>),
                                        div(class = <span class="hljs-string">"sub header"</span>, <span class="hljs-string">"Sunrise"</span>),
                                        div(class = <span class="hljs-string">"description"</span>, textOutput(<span class="hljs-string">"sunriseTime"</span>))
                                    )
                                  ),
                                  card(
                                    class = <span class="hljs-string">"ui mini"</span>,
                                    div(class = <span class="hljs-string">"content"</span>, icon(class = <span class="hljs-string">"large moon"</span>),
                                        div(class = <span class="hljs-string">"sub header"</span>, <span class="hljs-string">"Sunset"</span>),
                                        div(class = <span class="hljs-string">"description"</span>, textOutput(<span class="hljs-string">"sunsetTime"</span>))
                                    )
                                  )
                              )
                          ),
                          div(class = <span class="hljs-string">"ui segment"</span>,
                              style = <span class="hljs-string">"flex: 3;"</span>,
                              div(
                                class = <span class="hljs-string">"column center aligned"</span>,
                                div(class = <span class="hljs-string">"ui hidden divider"</span>),
                                htmlOutput(<span class="hljs-string">"currentWeatherIcon"</span>),
                                span(class = <span class="hljs-string">"ui large text"</span>, textOutput(<span class="hljs-string">"currentWeatherDescription"</span>))
                              )
                          ),
                          div(class = <span class="hljs-string">"ui segment"</span>,
                              style = <span class="hljs-string">"flex: 3;"</span>,
                              div(class = <span class="hljs-string">"column"</span>,
                                  card(
                                    class = <span class="hljs-string">"ui tiny"</span>,
                                    div(class = <span class="hljs-string">"content"</span>, icon(class = <span class="hljs-string">"big tint"</span>),
                                        div(class = <span class="hljs-string">"sub header"</span>, <span class="hljs-string">"Humidity"</span>),
                                        div(class = <span class="hljs-string">"description"</span>, textOutput(<span class="hljs-string">"currentHumidity"</span>))
                                    )
                                  ),
                                  card(
                                    class = <span class="hljs-string">"ui tiny"</span>,
                                    div(class = <span class="hljs-string">"content"</span>, icon(class = <span class="hljs-string">"big tachometer alternate"</span>),
                                        div(class = <span class="hljs-string">"sub header"</span>, <span class="hljs-string">"Pressure"</span>),
                                        div(class = <span class="hljs-string">"description"</span>, textOutput(<span class="hljs-string">"currentPressure"</span>))
                                    )
                                  )
                              )
                          ),
                          div(class = <span class="hljs-string">"ui segment"</span>,
                              style = <span class="hljs-string">"flex: 3;"</span>,
                              div(class = <span class="hljs-string">"column center aligned"</span>,
                                  card(
                                    class = <span class="hljs-string">"ui tiny"</span>,
                                    div(class = <span class="hljs-string">"content"</span>, icon(class = <span class="hljs-string">"big wind"</span>),
                                        div(class = <span class="hljs-string">"sub header"</span>, <span class="hljs-string">"Wind Speed"</span>),
                                        div(class = <span class="hljs-string">"description"</span>, textOutput(<span class="hljs-string">"currentWindSpeed"</span>))
                                    )
                                  ),
                                  card(
                                    class = <span class="hljs-string">"ui tiny"</span>,
                                    div(class = <span class="hljs-string">"content"</span>, icon(class = <span class="hljs-string">"big umbrella"</span>),
                                        div(class = <span class="hljs-string">"sub header"</span>, <span class="hljs-string">"UV Index"</span>),
                                        div(class = <span class="hljs-string">"description"</span>, textOutput(<span class="hljs-string">"currentUV"</span>))
                                    )
                                  )
                              )
                          )
                      )
                  )
              )
          )
      )
</code></pre>
<h5 id="heading-forecast-section"><strong>Forecast section</strong></h5>
<p>This section holds the forecasted data. Divide the grid into sixteen columns and nest another grid within the partitions hosting two columns.</p>
<p>Within the grid, define two columns. The first column holds the <em>5-Day Forecast</em> data. Separate the elements containing different values using rows. The second column contains <em>Hourly Forecast</em> data. Separate the elements containing different values using columns.</p>
<pre><code class="lang-r">      <span class="hljs-comment"># Forecast section</span>
      div(class = <span class="hljs-string">"sixteen wide column"</span>,
          div(class = <span class="hljs-string">"ui grid equal-height-grid"</span>,
              div(class = <span class="hljs-string">"left floated center aligned six wide column"</span>,
                  div(class = <span class="hljs-string">"ui raised segment special-segment equal-height-segment"</span>,
                      h4(<span class="hljs-string">"5 Days Forecast:"</span>),
                      div(class = <span class="hljs-string">"ui three column special-column grid"</span>,
                          <span class="hljs-comment"># Day forecasts</span>
                          div(class = <span class="hljs-string">"row"</span>,
                              div(class = <span class="hljs-string">"five wide column"</span>, textOutput(<span class="hljs-string">"dailyDtOne"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, textOutput(<span class="hljs-string">"dailyTempOne"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, htmlOutput(<span class="hljs-string">"dailyIconOne"</span>))
                          ),
                          div(class = <span class="hljs-string">"row"</span>,
                              div(class = <span class="hljs-string">"five wide column"</span>, textOutput(<span class="hljs-string">"dailyDtTwo"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, textOutput(<span class="hljs-string">"dailyTempTwo"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, htmlOutput(<span class="hljs-string">"dailyIconTwo"</span>))
                          ),
                          div(class = <span class="hljs-string">"row"</span>,
                              div(class = <span class="hljs-string">"five wide column"</span>, textOutput(<span class="hljs-string">"dailyDtThree"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, textOutput(<span class="hljs-string">"dailyTempThree"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, htmlOutput(<span class="hljs-string">"dailyIconThree"</span>))
                          ),
                          div(class = <span class="hljs-string">"row"</span>,
                              div(class = <span class="hljs-string">"five wide column"</span>, textOutput(<span class="hljs-string">"dailyDtFour"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, textOutput(<span class="hljs-string">"dailyTempFour"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, htmlOutput(<span class="hljs-string">"dailyIconFour"</span>))
                          ),
                          div(class = <span class="hljs-string">"row"</span>,
                              div(class = <span class="hljs-string">"five wide column"</span>, textOutput(<span class="hljs-string">"dailyDtFive"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, textOutput(<span class="hljs-string">"dailyTempFive"</span>)),
                              div(class = <span class="hljs-string">"three wide column"</span>, htmlOutput(<span class="hljs-string">"dailyIconFive"</span>))
                          )
                      )
                  )
              ),
              div(class = <span class="hljs-string">"right floated center aligned ten wide column"</span>,
                  div(class = <span class="hljs-string">"ui raised segment special-segment equal-height-segment"</span>,
                      h4(<span class="hljs-string">"Hourly Forecast:"</span>),
                      div(
                        class = <span class="hljs-string">"ui grid"</span>,
                        style = <span class="hljs-string">"display: flex; flex-direction: row; align-items: center; justify-content: space-around; flex-wrap: wrap; height: 100%;"</span>,
                        <span class="hljs-comment"># Hourly forecasts</span>
                        div(class = <span class="hljs-string">"column"</span>,
                            textOutput(<span class="hljs-string">"hourlyDtOne"</span>),
                            htmlOutput(<span class="hljs-string">"hourlyIconOne"</span>),
                            textOutput(<span class="hljs-string">"hourlyTempOne"</span>)
                        ),
                        div(class = <span class="hljs-string">"column"</span>,
                            textOutput(<span class="hljs-string">"hourlyDtTwo"</span>),
                            htmlOutput(<span class="hljs-string">"hourlyIconTwo"</span>),
                            textOutput(<span class="hljs-string">"hourlyTempTwo"</span>)
                        ),
                        div(class = <span class="hljs-string">"column"</span>,
                            textOutput(<span class="hljs-string">"hourlyDtThree"</span>),
                            htmlOutput(<span class="hljs-string">"hourlyIconThree"</span>),
                            textOutput(<span class="hljs-string">"hourlyTempThree"</span>)
                        ),
                        div(class = <span class="hljs-string">"column"</span>,
                            textOutput(<span class="hljs-string">"hourlyDtFour"</span>),
                            htmlOutput(<span class="hljs-string">"hourlyIconFour"</span>),
                            textOutput(<span class="hljs-string">"hourlyTempFour"</span>)
                        ),
                        div(class = <span class="hljs-string">"column"</span>,
                            textOutput(<span class="hljs-string">"hourlyDtFive"</span>),
                            htmlOutput(<span class="hljs-string">"hourlyIconFive"</span>),
                            textOutput(<span class="hljs-string">"hourlyTempFive"</span>)
                        )
                      )
                  )
              )
          )
      )
  )
</code></pre>
<h3 id="heading-building-the-shiny-server">Building the Shiny Server</h3>
<p>Each element in the UI section has an ID (unique identifier) that is used to control what data/information will be displayed in it.</p>
<p>The <code>render*()</code> set of functions defines the type of output, while <code>output$*</code> refers to a specific UI element by its ID. Assigning a render function to an output element links the visual to the logic. Most elements will have data extracted from the JSON list, except for the weather icons (for which an external link as a source will be referenced).</p>
<h4 id="heading-reactivity">Reactivity</h4>
<p>Reactivity is what makes Shiny apps dynamic—outputs automatically update when their dependencies change.</p>
<p>Two key components of reactivity are reactives and observers. A reactive computes and returns a value based on its dependencies, while an observer monitors reactive values and runs code that causes side effects, like logging or updating a database.</p>
<p>To control reactivity, you can use <code>bindEvent()</code> to delay execution until a specific event occurs or <code>observeEvent()</code> to listen for a user action and trigger a code block. Together, these tools provide flexibility for managing app behavior.</p>
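<p>The following sketch illustrates how <code>bindEvent()</code> holds a reactive back until its event fires. It is a toy server, unrelated to the weather app, with made-up input names (<code>name</code> and <code>go</code>), and it uses Shiny's <code>testServer()</code> so that it runs without opening a browser.</p>
<pre><code class="lang-r">library(shiny)

# Toy server: the greeting only recomputes when input$go changes
server &lt;- function(input, output, session) {
  greeting &lt;- reactive(paste("Hello,", input$name)) |&gt;
    bindEvent(input$go)
  output$msg &lt;- renderText(greeting())
}

testServer(server, {
  # Simulate typing a name and clicking the button once
  session$setInputs(name = "Ada", go = 1)
  stopifnot(output$msg == "Hello, Ada")

  # Changing the name alone does not trigger a recalculation
  session$setInputs(name = "Grace")
  stopifnot(output$msg == "Hello, Ada")

  # A new click picks up the latest name
  session$setInputs(go = 2)
  stopifnot(output$msg == "Hello, Grace")
})
</code></pre>
<p>The same pattern is what lets a search button, rather than every keystroke in a text box, decide when an expensive call is made.</p>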
<h4 id="heading-the-server-code">The Server Code</h4>
<ol>
<li><code>location</code> <strong>reactive</strong></li>
</ol>
<p>The location reactive includes an if-else conditional block that defines what message to display depending on the status code. The <code>query</code> variable contains the city/location that will be geocoded to obtain coordinates. The reactive is piped to <code>bindEvent()</code>, which ensures the geocoding API call runs only when the search button is clicked rather than every time the text input changes. This reduces unnecessary requests.</p>
<pre><code class="lang-r">location &lt;- reactive({
    query &lt;- input$location
    <span class="hljs-comment"># Request the status code once and reuse it in both checks</span>
    status &lt;- openstreetmap_error_body(query, api_key)
    <span class="hljs-keyword">if</span>(status == <span class="hljs-number">404</span>){
      validate(<span class="hljs-string">"No such city/town exists. Check your spelling!"</span>)
    }
    <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span>(status == <span class="hljs-number">400</span>){
      validate(<span class="hljs-string">"Bad request"</span>)
    }
    <span class="hljs-comment"># Return the coordinates and place name as the reactive's value</span>
    geocode(query, api_key)
  }) %&gt;% bindEvent(input$search)
</code></pre>
<ol start="2">
<li><code>weather_data</code> <strong>reactive</strong></li>
</ol>
<p>The weather reactive combines a geocoding API call and a weather update API call using coordinates obtained and extracted from <code>location()</code>:</p>
<pre><code class="lang-r">  weather_data &lt;- reactive({
    loc &lt;- location()
    openweather_json(api_key, lat = loc[<span class="hljs-number">1</span>], lon = loc[<span class="hljs-number">2</span>])
  })
</code></pre>
<p>To access the JSON objects returned by the API call, you call the reactive as if it were a function. The specific values to be extracted can then be accessed by subsetting the JSON value.</p>
<pre><code class="lang-r"><span class="hljs-comment"># subsetting weather data.</span>
  output$city &lt;- renderText({
    location()[<span class="hljs-number">3</span>]
  })

  output$currentWeatherDescription &lt;- renderText({
    weather_data()$current$weather[[<span class="hljs-number">1</span>]]$description
  })
</code></pre>
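<p>If the chain of <code>$</code> and <code>[[1]]</code> looks opaque, this sketch reproduces it on a hand-built list. The <code>mock_weather</code> object and its values are invented and only imitate the nesting used above: <code>weather</code> is a list of entries, so <code>[[1]]</code> picks the first entry before <code>$description</code> reads a field from it.</p>
<pre><code class="lang-r"># Invented data that mimics the nesting of the parsed response
mock_weather &lt;- list(
  current = list(
    temp = 21.4,
    weather = list(
      list(description = "light rain", icon = "10d")
    )
  )
)

# $ walks into named elements; [[1]] selects the first unnamed entry
description &lt;- mock_weather$current$weather[[1]]$description
temperature &lt;- mock_weather$current$temp
</code></pre>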
<ol start="3">
<li><strong>Create a Parse Date function</strong></li>
</ol>
<p>All the time data in the JSON response, forecasted or current, is provided as UNIX timestamps, that is, the number of seconds since 1 January 1970 (UTC). To make this information user-friendly, it needs to be converted into a human-readable format. You can do this by creating a function that takes the time data as input and uses functions from the <code>lubridate</code> package to handle the conversion.</p>
<p>First, convert the timestamp element to a datetime object. Format the time item to a 12-hour clock system and a date item to include the day of the week, the date, and the month.</p>
<ul>
<li><p><code>%I</code>: Displays the hour in a 12-hour clock format (01-12).</p>
</li>
<li><p><code>%M</code>: Displays the minutes (00-59).</p>
</li>
<li><p><code>%p</code>: Adds the AM/PM indicator.</p>
</li>
</ul>
<p>The <code>paste()</code> function concatenates the values. The function returns a character vector whose first element is the date and whose second element is the time, so each can be extracted by subsetting.</p>
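<pre><code class="lang-r">parse_date &lt;- <span class="hljs-keyword">function</span>(timestamp) {
  <span class="hljs-comment"># Convert seconds since the UNIX epoch to a datetime (UTC by default)</span>
  datetime &lt;- as_datetime(timestamp) 
  <span class="hljs-comment"># Attach the comma to the weekday so there is no stray space before it</span>
  date &lt;- paste(paste0(weekdays(datetime), <span class="hljs-string">","</span>), day(datetime), months(datetime))
  <span class="hljs-comment"># 12-hour clock with an AM/PM marker</span>
  time &lt;- format(as.POSIXct(datetime), format = <span class="hljs-string">"%I:%M %p"</span>)
  <span class="hljs-comment"># Element 1 is the date string, element 2 is the time string</span>
  c(date, time)
}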
</code></pre>
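<p>To check the building blocks on a concrete value, the sketch below converts one arbitrary example timestamp and applies the same format codes. The timestamp is chosen for illustration only. Weekday and month names, as well as the AM/PM marker, depend on your system locale, so the exact strings may differ on your machine.</p>
<pre><code class="lang-r">library(lubridate)

# Arbitrary example: seconds since 1970-01-01 00:00:00 UTC
ts &lt;- 1700000000

# as_datetime() interprets the number as UTC: 2023-11-14 22:13:20
dt &lt;- as_datetime(ts)

# %I = hour on a 12-hour clock, %M = minutes, %p = AM/PM marker
clock &lt;- format(dt, "%I:%M %p")

# Components used to build the date string
day_of_month &lt;- day(dt)
weekday_name &lt;- weekdays(dt)   # locale-dependent
month_name &lt;- months(dt)       # locale-dependent
</code></pre>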
<ol start="4">
<li><strong>Add a modal to display error messages</strong></li>
</ol>
<p>The <code>location</code> reactive provides a way to handle errors. You can incorporate a modal to enhance the user experience by overlaying the page and disabling its content until the user completes a specified action whenever an error occurs.</p>
<p>You’ll add JavaScript to control when and how the modal shows.</p>
<p>Add two modals in the UI section, each featuring an explanation of the error (header) and an outline of the required action (content). The div with the <code>actions</code> class includes a button that enables the user to close the modal.</p>
<pre><code class="lang-r"><span class="hljs-comment"># modals - UI</span>
  div(id = <span class="hljs-string">"notFound"</span>, class = <span class="hljs-string">"ui modal"</span>,
      div(class = <span class="hljs-string">"header"</span>, <span class="hljs-string">"Location Not Found"</span>),
      div(class = <span class="hljs-string">"content"</span>, <span class="hljs-string">"No such city/town exists. Check your spelling!"</span>),
      div(class = <span class="hljs-string">"actions"</span>,
          div(class = <span class="hljs-string">"ui button"</span>, id = <span class="hljs-string">"closeNotFound"</span>, <span class="hljs-string">"OK"</span>))
  ),
  div(id = <span class="hljs-string">"badRequest"</span>, class = <span class="hljs-string">"ui modal"</span>,
      div(class = <span class="hljs-string">"header"</span>, <span class="hljs-string">"Invalid Request"</span>),
      div(class = <span class="hljs-string">"content"</span>, <span class="hljs-string">"Bad request. Please try again with valid details."</span>),
      div(class = <span class="hljs-string">"actions"</span>,
          div(class = <span class="hljs-string">"ui button"</span>, id = <span class="hljs-string">"closeBadRequest"</span>, <span class="hljs-string">"OK"</span>))
  )
</code></pre>
<p>Slightly adjust the location reactive to incorporate the modal. The <code>validate()</code> calls are commented out and replaced with JavaScript calls. The <code>runjs()</code> function from <code>shinyjs</code> shows the modal that matches the error encountered, and <code>req(FALSE)</code> stops the reactive so that no further code in it runs.</p>
<pre><code class="lang-r"><span class="hljs-comment"># show and hide modals  - Server</span>
location &lt;- reactive({
    query &lt;- input$location
    <span class="hljs-comment"># Request the status code once and reuse it in both checks</span>
    status &lt;- openstreetmap_error_body(query, api_key)
    <span class="hljs-keyword">if</span>(status == <span class="hljs-number">404</span>){
      <span class="hljs-comment">#validate("No such city/town exists. Check your spelling!")</span>
      runjs(<span class="hljs-string">"$('#notFound').modal('show');"</span>)
      req(<span class="hljs-literal">FALSE</span>)
    }
    <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span>(status == <span class="hljs-number">400</span>){
      <span class="hljs-comment">#validate("Bad request")</span>
      runjs(<span class="hljs-string">"$('#badRequest').modal('show');"</span>)
      req(<span class="hljs-literal">FALSE</span>)
    }
    <span class="hljs-comment"># Return the coordinates and place name as the reactive's value</span>
    geocode(query, api_key)
  }) %&gt;% bindEvent(input$search)

<span class="hljs-comment"># listens for button click on modals to hide modal</span>
observeEvent(input$closeNotFound, {
    runjs(<span class="hljs-string">"$('#notFound').modal('hide');"</span>)
  })

observeEvent(input$closeBadRequest, {
    runjs(<span class="hljs-string">"$('#badRequest').modal('hide');"</span>)
  })
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you have built a weather app using Shiny that retrieves weather data from an API and displays it in an interactive and visually appealing way.</p>
<p>To do this, you used the following libraries:</p>
<ul>
<li><p><code>httr2</code> for making API requests and handling responses</p>
</li>
<li><p><code>shiny.semantic</code> for styling the app</p>
</li>
<li><p><code>lubridate</code> for working with and formatting time data</p>
</li>
<li><p><code>shinyjs</code> for integrating JavaScript features into the app</p>
</li>
</ul>
<p>This combination of tools allowed you to create a functional, user-friendly weather app.</p>
<p>You can find the complete code for the project <a target="_blank" href="https://github.com/elabongaatuo/R-weather-app">here</a>.</p>
<p>La Fin!</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
