<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Nokogiri - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Nokogiri - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 24 May 2026 16:31:20 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/nokogiri/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to scrape with Ruby and Nokogiri and map the data ]]>
                </title>
                <description>
                    <![CDATA[ By Andrew Bales Sometimes you want to grab data from a website for your own project. So what do you use? Ruby, Nokogiri, and JSON to the rescue! Recently, I was working on a project to map data about bridges. Using Nokogiri, I was able to capture a c... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-scrape-with-ruby-and-nokogiri-and-map-the-data-bd9febb5e18a/</link>
                <guid isPermaLink="false">66c3545653e0c377d44064d2</guid>
                
                    <category>
                        <![CDATA[ Nokogiri ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google maps ]]>
                    </category>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Ruby ]]>
                    </category>
                
                    <category>
                        <![CDATA[ technology ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 24 May 2018 22:52:19 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*kUyC5E-rXXkL4DcR8L91rA.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Andrew Bales</p>
<p>Sometimes you want to grab data from a website for your own project. So what do you use? Ruby, Nokogiri, and JSON to the rescue!</p>
<p>Recently, I was working on a project to map <a target="_blank" href="https://bridgereports.com/">data about bridges</a>. Using Nokogiri, I was able to capture a city’s bridge data from a table. I then used links within that same table to scrape associated pages. Finally, I converted the scraped data to JSON and used it to populate a Google Map.</p>
<p>This article walks you through the tools I used and how the code works!</p>
<p>See the full code on my <a target="_blank" href="https://github.com/agbales/wichita-bridges">GitHub</a> repo.</p>
<p>Live map demo <a target="_blank" href="https://agbales.github.io/wichita-bridges/">here</a>.</p>
<h3 id="heading-the-project">The Project</h3>
<p>My goal was to take a table from a bridge data <a target="_blank" href="https://bridgereports.com/">website</a> and turn it into a Google map with geolocated pins that would produce informational popups for each bridge.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/pTodl03NV9GsnFl6mYtcO0-rPk6F8AUjRyBb" alt="Image" width="800" height="348" loading="lazy">
<em>The Idea: HTML Table to Map</em></p>
<p>To make this happen, I’d need to:</p>
<ol>
<li>Scrape data from the original website.</li>
<li>Convert that data into a <a target="_blank" href="https://www.w3schools.com/js/js_json_objects.asp">JSON object</a>.</li>
<li>Apply that data to make a new, interactive map.</li>
</ol>
<p>Your project will vary, surely — how many people are trying to map antique bridges? — but I hope this process will prove useful for your context.</p>
<h3 id="heading-nokogiri">Nokogiri</h3>
<p>Ruby has an amazing web scraping gem called <a target="_blank" href="https://github.com/sparklemotion/nokogiri">Nokogiri</a>. Among other features, it allows you to search HTML documents by CSS selectors. That means if we know the ids, classes, or even types of elements where the data is stored in the DOM, we’re able to pluck it out.</p>
<h4 id="heading-the-scraper">The scraper</h4>
<p>If you’re following along with the <a target="_blank" href="https://github.com/agbales/wichita-bridges">GitHub repo</a>, you can find my scraper in bridges_scraper.rb.</p>
<pre><code><span class="hljs-built_in">require</span> <span class="hljs-string">'open-uri'</span>
<span class="hljs-built_in">require</span> <span class="hljs-string">'nokogiri'</span>
<span class="hljs-built_in">require</span> <span class="hljs-string">'json'</span>
</code></pre><p>Open-uri lets us open the HTML like a file and pass it to Nokogiri for the heavy lifting.</p>
<p>In the code below, I’m passing the DOM information from the URL with the bridge data over to Nokogiri. I then find the table element holding the data, search for its rows, and iterate through them.</p>
<pre><code>url = <span class="hljs-string">'https://bridgereports.com/city/wichita-kansas/'</span>
# URI.open is needed on Ruby 3.0+, where open-uri no longer patches Kernel#open
html = URI.open(url)
</code></pre><pre><code>doc = Nokogiri::HTML(html)
bridges = []
table = doc.at(<span class="hljs-string">'table'</span>)
</code></pre><pre><code>table.search(<span class="hljs-string">'tr'</span>).each <span class="hljs-keyword">do</span> |tr|
  # Collect every header and data cell in this row
  cells = tr.search(<span class="hljs-string">'th, td'</span>)
  bridges.push(
    <span class="hljs-attr">carries</span>: cells[<span class="hljs-number">1</span>].text,
    <span class="hljs-attr">crosses</span>: cells[<span class="hljs-number">2</span>].text,
    <span class="hljs-attr">location</span>: cells[<span class="hljs-number">3</span>].text,
    <span class="hljs-attr">design</span>: cells[<span class="hljs-number">4</span>].text,
    <span class="hljs-attr">status</span>: cells[<span class="hljs-number">5</span>].text,
    <span class="hljs-attr">year_build</span>: cells[<span class="hljs-number">6</span>].text.to_i,
    <span class="hljs-attr">year_recon</span>: cells[<span class="hljs-number">7</span>].text,
    <span class="hljs-attr">span_length</span>: cells[<span class="hljs-number">8</span>].text.to_f,
    <span class="hljs-attr">total_length</span>: cells[<span class="hljs-number">9</span>].text.to_f,
    <span class="hljs-attr">condition</span>: cells[<span class="hljs-number">10</span>].text,
    <span class="hljs-attr">suff_rating</span>: cells[<span class="hljs-number">11</span>].text.to_f,
    <span class="hljs-attr">id</span>: cells[<span class="hljs-number">12</span>].text.to_i
  )
end
</code></pre><pre><code>json = <span class="hljs-built_in">JSON</span>.pretty_generate(bridges)
File.open(<span class="hljs-string">"data.json"</span>, <span class="hljs-string">'w'</span>) { |file| file.write(json) }
</code></pre><p>Nokogiri has lots of methods (here’s a <a target="_blank" href="https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet">cheat sheet</a> and a starter <a target="_blank" href="https://readysteadycode.com/howto-parse-html-with-ruby-and-nokogiri">guide</a>!). We’re using just a few.</p>
<p>The table is found with <strong>.at('table')</strong>, which returns the first occurrence of a table element in the DOM. This works just fine for this relatively simple page.</p>
<p>With the table in hand, <strong>.search('tr')</strong> returns a NodeSet (an array-like collection) of the row elements that we iterate over with <strong>.each</strong>. In each row, the data is cleaned up and pushed into a single entry for the bridges array.</p>
<p>After all the rows are collected, the data is converted into JSON and saved in a new file called “data.json”.</p>
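<p>The JSON step can be tried on its own with a couple of made-up records. This sketch writes to a temporary directory (an assumption on my part, so it does not overwrite a real data.json) and reads the file back to check the round trip.</p>

```ruby
require 'json'
require 'tmpdir'

# Made-up records shaped like the scraper's output.
bridges = [
  { carries: 'Example Street', year_build: 1931, span_length: 24.5 },
  { carries: 'Sample Avenue', year_build: 1975, span_length: 40.0 }
]

json = JSON.pretty_generate(bridges)

# Write to a temporary directory so the sketch leaves any real data.json alone.
path = File.join(Dir.mktmpdir, 'data.json')
File.write(path, json)

# Parsing yields string keys unless symbolize_names is set.
restored = JSON.parse(File.read(path), symbolize_names: true)
```

<p>Note that symbol keys become plain strings in the JSON file, which is what the JavaScript side will see.</p>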
<h3 id="heading-combining-data-from-multiple-pages">Combining data from multiple pages</h3>
<p>In this case, I needed information from other associated pages. Specifically, I needed the latitude and longitude of each bridge, which was not featured on the table. However, I found that the link in the first cell of each row led to a page that <em>did</em> provide those details.</p>
<p>I needed to write code that did a few things:</p>
<ul>
<li>Gathered links from the first cell of each row.</li>
<li>Created a new Nokogiri object from the HTML on the linked page.</li>
<li>Plucked out the latitude and longitude.</li>
<li>Paused the program until that process completed.</li>
</ul>
<pre><code>  cells = tr.search(<span class="hljs-string">'th, td'</span>)
  links = {}
  cells[<span class="hljs-number">0</span>].css(<span class="hljs-string">'a'</span>).each <span class="hljs-keyword">do</span> |a|
    links[a.text] = a[<span class="hljs-string">'href'</span>]
  end

  got_coords = <span class="hljs-literal">false</span>

  <span class="hljs-keyword">if</span> links[<span class="hljs-string">'NBI report'</span>]
    nbi = links[<span class="hljs-string">'NBI report'</span>]
    report = <span class="hljs-string">"https://bridgereports.com"</span> + nbi
    # URI.open is needed on Ruby 3.0+, where open-uri no longer patches Kernel#open
    report_html = URI.open(report)
    sleep <span class="hljs-number">1</span> until report_html
    r = Nokogiri::HTML(report_html)

    lat = r.css(<span class="hljs-string">'span.latitude'</span>).text.strip.to_f
    long = r.css(<span class="hljs-string">'span.longitude'</span>).text.strip.to_f
</code></pre><pre><code>    got_coords = <span class="hljs-literal">true</span>
  <span class="hljs-keyword">else</span>
    got_coords = <span class="hljs-literal">true</span>
  end

  sleep <span class="hljs-number">1</span> until got_coords == <span class="hljs-literal">true</span>
</code></pre><pre><code>  bridges.push(
    links: links,
    <span class="hljs-attr">latitude</span>: lat,
    <span class="hljs-attr">longitude</span>: long,
    <span class="hljs-attr">carries</span>: cells[<span class="hljs-number">1</span>].text,
    ..., # all other previous key/value pairs
  )
end
</code></pre><p>A few additional things are worth pointing out here:</p>
<ul>
<li>I’m using <strong>got_coords</strong> as a simple boolean flag. It is set to <strong>false</strong> by default and is switched to <strong>true</strong> once the data is captured OR turns out not to be available.</li>
<li>The latitude and longitude are located in spans with corresponding classes. That makes securing the data simple: <strong>.css('span.latitude')</strong>. This is followed by <strong>.text</strong>, <strong>.strip</strong>, and <strong>.to_f</strong>, which 1) get the text from the span, 2) strip any excess whitespace, and 3) convert the string to a floating-point number.</li>
</ul>
<h3 id="heading-json-google-map"><strong>JSON → Google Map</strong></h3>
<p>The newly formed JSON object has to be modified a touch to fit the Google Maps API. I did this with JavaScript inside <strong>map.js</strong>.</p>
<p>The JSON data is accessible within <strong>map.js</strong> because it has been moved to the JS folder, assigned to a variable called “bridge_data”, and included in a </p> ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
