<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ data analytics - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ data analytics - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sat, 30 May 2026 16:31:34 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/data-analytics/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ Beginner’s Guide to Cloud Data Analytics ]]>
                </title>
                <description>
                    <![CDATA[ If you want to transform your career and become a data-driven decision maker, this course is for you. freeCodeCamp.org just published a comprehensive Google Cloud Data Analytics course on our YouTube channel. The course was developed by Google Cloud ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/beginners-guide-to-cloud-data-analytics/</link>
                <guid isPermaLink="false">685162a7b5948514b5ab9308</guid>
                
                    <category>
                        <![CDATA[ Cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Tue, 17 Jun 2025 12:42:15 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749071398844/dd78731e-80d1-4835-bc2b-cbec1ba9a9d3.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you want to transform your career and become a data-driven decision maker, this course is for you. freeCodeCamp.org just published a comprehensive <a target="_blank" href="https://youtu.be/GAdgTK2Esn4">Google Cloud Data Analytics course</a> on our YouTube channel. The course was developed by Google Cloud and helps people earn the Google Cloud Data Analytics Certificate.</p>
<h3 id="heading-why-learn-data-analytics-with-google-cloud">Why Learn Data Analytics with Google Cloud?</h3>
<p>Turning raw numbers into actionable insights is a skill that sets you apart in the job market. As a Cloud Data Analyst, you’ll be able to unlock the stories hidden within massive datasets, helping organizations make smarter, faster decisions.</p>
<p>This course will help you enter that world. It offers a blend of expert instruction, hands-on labs, and real-world projects that will help you build a portfolio to impress employers.</p>
<h3 id="heading-whats-in-the-course">What’s In the Course?</h3>
<p>This course is a carefully curated program developed by Google Cloud, designed to take you from beginner to job-ready. You’ll learn technical skills like SQL, data visualization, and cloud storage. And you’ll also learn how to communicate your findings and drive business impact.</p>
<p>The course is structured to help you build a portfolio of industry-relevant projects, so you can showcase your expertise to potential employers.</p>
<p>Here’s what you’ll learn:</p>
<p><strong>1. Introduction to Data Analytics in Google Cloud</strong><br>Get a solid foundation in cloud data analysis. You’ll define the field, explore the roles and responsibilities of a cloud data analyst, and understand how data analytics drives business value. This module sets the stage for your journey, introducing you to the tools and concepts you’ll use throughout the program.</p>
<p><strong>2. Data Management and Storage in the Cloud</strong><br>Dive into how data is structured, organized, and stored in the cloud. You’ll get hands-on with data lakehouse architecture and tools like BigQuery, learning how to manage data efficiently and securely. This section is crucial for understanding how to handle large-scale datasets in a modern cloud environment.</p>
<p><strong>3. Data Transformation in the Cloud</strong><br>Follow the journey of data from collection to insight. You’ll learn to use SQL and other tools to clean, transform, and prepare data for analysis. By mastering data transformation, you’ll be able to turn messy, raw data into clean, usable information.</p>
<p><strong>4. The Power of Storytelling: How to Visualize Data in the Cloud</strong><br>Master the art of data visualization. This module teaches you how to turn complex data into clear, compelling stories using cloud-based visualization tools, making your insights accessible to any audience. You’ll learn the five key stages of visualizing data in the cloud, from planning to building impactful dashboards.</p>
<p><strong>5. Put It All Together: Prepare for a Cloud Data Analyst Job</strong><br>Apply everything you’ve learned in a capstone project that simulates real-world challenges. You’ll combine your skills in analysis, visualization, and communication to solve a comprehensive data problem, building a portfolio piece that demonstrates your ability to analyze, visualize, and communicate data insights.</p>
<h3 id="heading-learn-by-doing">Learn by Doing</h3>
<p>The course is designed to be watched alongside <a target="_blank" href="https://www.cloudskillsboost.google/paths/420">the Google Cloud Skills Boost platform</a>, where you’ll find interactive labs and practice environments. You get 35 free credits per month for labs, and if you want to move faster, you can pay for unlimited access. These labs give you the chance to practice your skills in real cloud environments, reinforcing your learning with practical experience. Complete the program and you’ll earn a Google Cloud Data Analytics Certificate.</p>
<h3 id="heading-start-your-journey-today">Start Your Journey Today</h3>
<p>Whether you’re just starting out or looking to advance your career, this course is for anyone who wants to harness the power of data. If you’re interested in technology, business, or problem-solving, this course will give you the tools you need to succeed in the fast-growing field of data analytics.</p>
<p>Watch the <a target="_blank" href="https://youtu.be/GAdgTK2Esn4">full course on the freeCodeCamp.org YouTube channel</a> and begin your journey into cloud data analytics (10-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/GAdgTK2Esn4" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is a Kalman Filter? How to Simplify Noisy Data in Navigation and Finance ]]>
                </title>
                <description>
                    <![CDATA[ In a world where precision is key, handling noisy data effectively is crucial for solving complex problems. Whether you're trying to control a rocket or forecast the stock market, the ability to get good data from an uncertain environment is importan... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-is-a-kalman-filter-with-python-code-examples/</link>
                <guid isPermaLink="false">66ba5353f77647345442b9d5</guid>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Wed, 07 Aug 2024 13:42:54 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/08/pexels-skitterphoto-63901.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In a world where precision is key, handling noisy data effectively is crucial for solving complex problems.</p>
<p>Whether you're trying to control a rocket or forecast the stock market, the ability to get good data from an uncertain environment is important.</p>
<p>This is exactly the problem Kalman filters help solve: they let you extract reliable estimates from noisy data, which is why they show up in so many fields.</p>
<p>In this article, we'll discuss:</p>
<ul>
<li><a class="post-section-overview" href="#Driving">Driving Through Fog: Kalman Filters as Your Headlights</a></li>
<li><a class="post-section-overview" href="#heading-what-are-kalman-filters">What are Kalman Filters?</a></li>
<li><a class="post-section-overview" href="#Kalman">Kalman Filters in Action: A Step-by-Step Code Tutorial</a></li>
<li><a class="post-section-overview" href="#Beyond">Conclusion: Navigating Nonlinear Data with Advanced Techniques</a></li>
</ul>
<h2 id="Driving">Driving Through Fog: Kalman Filters as Your Headlights</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-eberhardgross-1287075.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/forest-under-clouds-1287075/">eberhard grossgasteiger on Pexels</a></em></p>
<p>Imagine you are driving through a dense fog with limited visibility.</p>
<p>To reach your destination, you rely on your own senses and on your car's navigation system, which combines real-time data with a predetermined map.</p>
<p>As you move, the navigation system keeps adjusting its estimate of where you are, and you keep using your senses to correct your driving.</p>
<p>This process is very similar to how a Kalman Filter works.</p>
<p>It constantly refines its estimates based on incoming data, even though that data is full of noise and uncertainty.</p>
<p>By integrating past information with current information, a Kalman Filter gives you a clear picture of where you are and where you're headed.</p>
<h2 id="heading-what-are-kalman-filters">What are Kalman Filters?</h2>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-mikebirdy-170811.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/blue-bmw-sedan-near-green-lawn-grass-170811/">Mike Bird on Pexels</a></em></p>
<p>A Kalman filter is an algorithm that estimates the state of a dynamic system (for example, the position and velocity of a vehicle) from a series of noisy measurements.</p>
<p>It is often used for systems that change over time – like tracking the position of a moving object.</p>
<h3 id="heading-how-does-a-kalman-filter-work">How Does a Kalman Filter Work?</h3>
<p>The Kalman filter predicts your current state based on past data, like the map and your previous location.</p>
<p>When new data appears, like new GPS signals, the filter compares the new data with its prediction and adjusts its estimate.</p>
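<p>Even if the data is noisy, the Kalman filter improves its estimate by taking a weighted average of its prediction and the new measurement. It gives more weight to whichever of the two is less uncertain. This is similar to how you balance what your navigation system tells you against what you see on the road.</p>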
<p>By always integrating new data with past data, Kalman filters help you know where you are and where you are going. This way, it is possible to predict things even in uncertain conditions.</p>
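<p>Here is a minimal one-dimensional sketch of that weighted average in plain Python, with no library involved. It assumes a hypothetical object that stays in place. The readings and variances are made up for illustration, and <code>predict</code> and <code>update</code> are illustrative helper names.</p>
<pre><code>def predict(mean, var, process_var):
    """Predict step: the object is assumed to stay put, so only the uncertainty grows."""
    return mean, var + process_var


def update(mean, var, measurement, measurement_var):
    """Update step: a weighted average of the prediction and the measurement."""
    # Kalman gain: how much to trust the measurement relative to the prediction
    gain = var / (var + measurement_var)
    new_mean = mean + gain * (measurement - mean)
    # The combined estimate is less uncertain than the prediction alone
    new_var = (1 - gain) * var
    return new_mean, new_var


# Hypothetical noisy readings of a position whose true value is about 10
readings = [9.0, 11.0, 10.5, 9.5, 10.2]
mean, var = 0.0, 100.0  # vague initial guess: large variance
for z in readings:
    mean, var = predict(mean, var, process_var=0.01)
    mean, var = update(mean, var, z, measurement_var=1.0)
    print(f"measurement={z:5.1f}  estimate={mean:6.3f}  variance={var:.3f}")
</code></pre>
<p>With a large initial variance, the filter accepts the first reading almost entirely, because the gain is close to 1. As the variance shrinks, each new reading moves the estimate less.</p>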
<h3 id="heading-why-are-kalman-filters-used-in-engineering">Why are Kalman Filters used in engineering?</h3>
<p>Because Kalman filters can work with noisy and incomplete data, they are widely used to make good estimates even when the measurements are uncertain.</p>
<p>This makes them very useful for:</p>
<ul>
<li><strong>Navigation Systems</strong>: Estimating the position and velocity of vehicles.</li>
<li><strong>Robotics</strong>: Helping robots understand their environment and position.</li>
<li><strong>Finance</strong>: Filtering out noise from stock price data to predict trends.</li>
</ul>
<p>Each new estimate needs only the previous estimate and the latest measurement. This makes Kalman filters adaptive and well suited to processing real-time data.</p>
<h3 id="heading-what-problem-did-kalman-filters-solve">What problem did Kalman Filters solve?</h3>
<p>Kalman filters were developed by Rudolf Kalman in the early 1960s to solve the problem of managing uncertainty and noise in data.</p>
<p>Nowadays, they are great for extracting meaningful information from noisy data.</p>
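<p>Mathematically, Kalman filtering is also known as linear quadratic estimation (LQE).</p>
<p>This is because, when estimating the current state from current and past data, Kalman filters rely on:</p>
<ul>
<li>Linear algebra: the study of vectors and matrices. The state, the model of how it evolves, and its uncertainty are all vectors and matrices, and the system is assumed to be linear.</li>
<li>Quadratic optimization: finding the solution that minimizes a squared-error cost. The Kalman filter chooses the estimate that minimizes the expected squared error (the mean squared error) of the state.</li>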
</ul>
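<p>The quadratic part can be checked numerically in the one-dimensional case. This sketch assumes the prediction and measurement errors are independent and uses illustrative variances. If the new estimate is <code>prior + gain * (z - prior)</code>, its expected squared error is <code>(1 - gain)**2 * prior_var + gain**2 * measurement_var</code>, which is a quadratic in the gain. Setting its derivative to zero gives the Kalman gain <code>prior_var / (prior_var + measurement_var)</code>, and a brute-force search agrees:</p>
<pre><code>def posterior_variance(gain, prior_var, measurement_var):
    """Expected squared error of prior + gain * (z - prior), assuming independent errors."""
    return (1 - gain) ** 2 * prior_var + gain ** 2 * measurement_var


prior_var, measurement_var = 2.0, 1.0  # illustrative values

# Brute-force search over candidate gains between 0 and 1
candidates = [i / 1000 for i in range(1001)]
best_gain = min(candidates, key=lambda k: posterior_variance(k, prior_var, measurement_var))

# Closed-form Kalman gain: where the derivative of the quadratic is zero
kalman_gain = prior_var / (prior_var + measurement_var)

print(f"brute-force gain: {best_gain:.3f}")
print(f"Kalman gain:      {kalman_gain:.3f}")
print(f"minimum expected squared error: {posterior_variance(kalman_gain, prior_var, measurement_var):.3f}")
</code></pre>
<p>Both approaches give a gain of about 0.667. The resulting error variance (about 0.667) is smaller than both the prior variance (2.0) and the measurement variance (1.0).</p>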
<h2 id="Kalman">Kalman Filters in Action: A Step-by-Step Code Tutorial</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-captainsopon-3402846.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/gray-airplane-control-panel-3402846/">capt.sopon on Pexels</a></em></p>
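<p>Kalman filters were created to handle linear systems, where the next state and the measurements are linear functions of the current state.</p>
<p>In this code example, we will use an Extended Kalman Filter (EKF). This variant was created to handle nonlinear systems, where the state transition or measurement functions are nonlinear. It linearizes those functions around the current estimate using their Jacobians. The constant-velocity model used below is itself linear, so here the EKF behaves exactly like a standard Kalman filter. The same structure carries over to nonlinear models.</p>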
<p>Here's the full code (which we'll break down below):</p>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> filterpy.kalman <span class="hljs-keyword">import</span> ExtendedKalmanFilter <span class="hljs-keyword">as</span> EKF
<span class="hljs-keyword">from</span> filterpy.common <span class="hljs-keyword">import</span> Q_discrete_white_noise

def fx(x, dt):
    """ State transition function: constant-velocity model. """
    # x' = [x[0] + x[1]*dt, x[1]]  (position moves by velocity * dt)
    return np.array([x[0] + x[1]*dt, x[1]])

def hx(x):
    """ Measurement function: we only observe the position. """
    # z = [x[0]]
    return np.array([x[0]])

def jacobian_F(x, dt):
    """ Jacobian of the state transition function. """
    return np.array([[1, dt],
                     [0, 1]])

def jacobian_H(x):
    """ Jacobian of the measurement function. """
    return np.array([[1, 0]])

# Initialize EKF
ekf = EKF(dim_x=2, dim_z=1)

# Initial state: position 0, velocity 1
ekf.x = np.array([0.0, 1.0])

# Initial state covariance
ekf.P = np.eye(2)

# Time step between measurements
dt = 1.0

# Process noise covariance
ekf.Q = Q_discrete_white_noise(dim=2, dt=dt, var=0.1)

# Measurement noise covariance
ekf.R = np.array([[0.1]])

# State transition matrix used by the predict step.
# filterpy's EKF propagates the state with ekf.F (predict_update has no Fx argument),
# so we set it to the Jacobian of fx. Because fx is linear, F @ x equals fx(x, dt).
ekf.F = jacobian_F(ekf.x, dt)

# Simulated measurements
measurements = [1, 2, 3, 4, 5]

for z in measurements:
    # Predict with ekf.F, then update using the measurement Jacobian and hx
    ekf.predict_update(z, HJacobian=jacobian_H, Hx=hx)

    # Print the current state estimate
    print("Estimated state:", ekf.x)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/1-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Full code</em></p>
<p>Let's see the code block by block.</p>
<h3 id="heading-import-the-libraries">Import the Libraries</h3>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> filterpy.kalman <span class="hljs-keyword">import</span> ExtendedKalmanFilter <span class="hljs-keyword">as</span> EKF
<span class="hljs-keyword">from</span> filterpy.common <span class="hljs-keyword">import</span> Q_discrete_white_noise
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/2-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing Libraries</em></p>
<p>In this part of the code we import the Python libraries we need:</p>
<ul>
<li><strong><code>import numpy as np</code></strong>: This imports <a target="_blank" href="https://numpy.org/">NumPy</a>, the standard Python library for working with arrays and matrices of numbers.</li>
<li><strong><code>from filterpy.kalman import ExtendedKalmanFilter as EKF</code></strong>: This brings in the <code>ExtendedKalmanFilter</code> class from the <a target="_blank" href="https://filterpy.readthedocs.io/en/latest/">filterpy</a> library. We will use this class, named <code>EKF</code> here, to track systems whose dynamics or measurements can be nonlinear.</li>
<li><strong><code>from filterpy.common import Q_discrete_white_noise</code></strong>: This imports a helper function that builds the process noise covariance matrix. That matrix describes the natural "fuzziness" or uncertainty in how our system evolves.</li>
</ul>
<h3 id="heading-define-how-the-system-works">Define How the System Works</h3>
<pre><code>def fx(x, dt):
    """ State transition function (constant-velocity model). """
    # x' = [x[0] + x[1]*dt, x[1]]
    return np.array([x[0] + x[1]*dt, x[1]])

def hx(x):
    """ Measurement function: we observe the position only. """
    # z = [x[0]]
    return np.array([x[0]])
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/3-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define How the System Works</em></p>
<p>In this code we define how the system will work:</p>
<ul>
<li><strong><code>fx(x, dt)</code></strong>: This function describes how our system changes over time. It says the new position is the old position plus speed times time (<code>x[0] + x[1]*dt</code>), and the speed (<code>x[1]</code>) stays the same. This constant-velocity model is linear, so the EKF treats it exactly like an ordinary Kalman filter would.</li>
<li><strong><code>hx(x)</code></strong>: This function tells us what we can measure from the system. Here, it says we can measure the position (<code>x[0]</code>).</li>
</ul>
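<p>One way to confirm that this model is linear is to check that multiplying the state by the Jacobian matrix gives exactly the same result as calling <code>fx</code>. The sketch below copies <code>fx</code> and <code>hx</code> from the tutorial and adds the constant Jacobian from the next section. It assumes only NumPy.</p>
<pre><code>import numpy as np


def fx(x, dt):
    """State transition from the tutorial: constant velocity."""
    return np.array([x[0] + x[1] * dt, x[1]])


def hx(x):
    """Measurement function from the tutorial: we observe position only."""
    return np.array([x[0]])


def jacobian_F(x, dt):
    """Jacobian of fx with respect to the state [position, velocity]."""
    return np.array([[1.0, dt],
                     [0.0, 1.0]])


dt = 1.0
x = np.array([0.0, 1.0])  # position 0, velocity 1

# Roll the model forward a few steps
for step in range(1, 4):
    x_next = fx(x, dt)
    # Because fx is linear, applying its Jacobian gives exactly the same result
    assert np.allclose(x_next, jacobian_F(x, dt) @ x)
    x = x_next
    print(f"step {step}: state={x}, measured position={hx(x)}")
</code></pre>
<p>For a genuinely nonlinear <code>fx</code>, this check would fail. The Jacobian would then only approximate <code>fx</code> near the current state, and that approximation is what distinguishes the EKF from the standard Kalman filter.</p>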
<h3 id="heading-define-how-changes-affect-the-system">Define How Changes Affect the System</h3>
<pre><code>def jacobian_F(x, dt):
    <span class="hljs-string">""</span><span class="hljs-string">" Jacobian of the state transition function. "</span><span class="hljs-string">""</span>
    <span class="hljs-keyword">return</span> np.array([[<span class="hljs-number">1</span>, dt],
                     [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>]])

def jacobian_H(x):
    <span class="hljs-string">""</span><span class="hljs-string">" Jacobian of the measurement function. "</span><span class="hljs-string">""</span>
    <span class="hljs-keyword">return</span> np.array([[<span class="hljs-number">1</span>, <span class="hljs-number">0</span>]])
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/4-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define How Changes Affect the System</em></p>
<p>In this code we define how changes affect the system:</p>
<ul>
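<li><strong><code>jacobian_F(x, dt)</code></strong>: This is the Jacobian of <code>fx</code>: the matrix of partial derivatives of the new position and speed with respect to the current position and speed. The EKF uses it to linearize the prediction and to carry the state's uncertainty (covariance) forward in time.</li>
<li><strong><code>jacobian_H(x)</code></strong>: This is the Jacobian of <code>hx</code>. It tells the filter how the measured value changes when the position or speed changes. Here it is <code>[[1, 0]]</code> because the measurement depends only on the position. The filter uses it to decide how strongly each new measurement should correct the prediction.</li>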
</ul>
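<p>The Jacobians in this tutorial are constant because the model is linear. For a nonlinear measurement, the Jacobian depends on the state and must be re-evaluated at every step. As an illustration, assume a hypothetical range sensor mounted <code>SENSOR_HEIGHT</code> units above the track that measures the straight-line distance to the object. The sketch derives its Jacobian by hand and checks it against a central-difference approximation. The sensor, <code>hx_range</code>, and <code>jacobian_H_range</code> are all illustrative names.</p>
<pre><code>import numpy as np

SENSOR_HEIGHT = 10.0  # hypothetical: sensor mounted 10 units above the track


def hx_range(x):
    """Hypothetical nonlinear measurement: straight-line distance to the sensor."""
    return np.array([np.sqrt(x[0] ** 2 + SENSOR_HEIGHT ** 2)])


def jacobian_H_range(x):
    """Analytical Jacobian of hx_range with respect to [position, velocity]."""
    distance = np.sqrt(x[0] ** 2 + SENSOR_HEIGHT ** 2)
    return np.array([[x[0] / distance, 0.0]])


def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference approximation, one column per state variable."""
    columns = []
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        columns.append((func(x + step) - func(x - step)) / (2 * eps))
    return np.column_stack(columns)


x = np.array([5.0, 1.0])  # position 5, velocity 1
print("analytical:", jacobian_H_range(x))
print("numerical: ", numerical_jacobian(hx_range, x))
</code></pre>
<p>Functions with this shape could be passed to filterpy as <code>ekf.update(z, HJacobian=jacobian_H_range, Hx=hx_range)</code>. The sensor model itself is an assumption for illustration, not part of the original example.</p>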
<h3 id="heading-set-up-the-kalman-filter">Set Up the Kalman Filter</h3>
<pre><code># Initialize EKF
ekf = EKF(dim_x=<span class="hljs-number">2</span>, dim_z=<span class="hljs-number">1</span>)

# Initial state
ekf.x = np.array([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>])
print(<span class="hljs-string">"Initial state:"</span>, ekf.x)

# Initial state covariance
ekf.P = np.eye(<span class="hljs-number">2</span>)
print(<span class="hljs-string">"Initial state covariance:\n"</span>, ekf.P)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/5-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Set Up the Kalman Filter</em></p>
<p>In this part of the code, we create a very simple Kalman filter:</p>
<ul>
<li><strong><code>ekf = EKF(dim_x=2, dim_z=1)</code></strong>: This creates an Extended Kalman Filter that tracks two things (position and speed) and one measurement (position).</li>
<li><strong><code>ekf.x = np.array([0, 1])</code></strong>: This sets the starting position to <code>0</code> and speed to <code>1</code>.</li>
</ul>
<p>It prints out:</p>
<pre><code>Initial state: [<span class="hljs-number">0</span> <span class="hljs-number">1</span>]
</code></pre><ul>
<li><strong><code>ekf.P = np.eye(2)</code></strong>: This is the initial state covariance. The identity matrix gives both the position and the speed a variance of 1 and assumes no correlation between them. It's like saying "let's start from here, but we are open to changes."</li>
</ul>
<p>It prints out:</p>
<pre><code>Initial state covariance:
 [[<span class="hljs-number">1.</span> <span class="hljs-number">0.</span>]
 [<span class="hljs-number">0.</span> <span class="hljs-number">1.</span>]]
</code></pre><h3 id="heading-describe-uncertainty-in-the-system">Describe Uncertainty in the System</h3>
<pre><code># Process noise covariance
ekf.Q = Q_discrete_white_noise(dim=<span class="hljs-number">2</span>, dt=<span class="hljs-number">1</span>, <span class="hljs-keyword">var</span>=<span class="hljs-number">0.1</span>)
print(<span class="hljs-string">"Process noise covariance:\n"</span>, ekf.Q)

# Measurement noise covariance
ekf.R = np.array([[<span class="hljs-number">0.1</span>]])
print(<span class="hljs-string">"Measurement noise covariance:\n"</span>, ekf.R)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/6-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Describe Uncertainty in the System</em></p>
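<p>For a two-dimensional state, <code>Q_discrete_white_noise</code> builds filterpy's discrete constant white-noise model. This model treats the acceleration as a random constant during each time step. The NumPy-only sketch below rebuilds that matrix by hand, and <code>constant_white_noise_q</code> is a hypothetical helper name. With <code>dt=1</code> and <code>var=0.1</code>, it should produce the same values as the filterpy output in this section (0.025, 0.05, and 0.1).</p>
<pre><code>import numpy as np


def constant_white_noise_q(dt, var):
    """Process noise for a [position, velocity] state, assuming a constant
    random acceleration with variance `var` during each step."""
    # How a unit acceleration held for one step moves position and velocity
    gamma = np.array([[0.5 * dt ** 2],
                      [dt]])
    # Outer product scaled by the acceleration variance
    return gamma @ gamma.T * var


Q = constant_white_noise_q(dt=1.0, var=0.1)
print(Q)
</code></pre>
<p>The off-diagonal terms are nonzero because a random acceleration disturbs position and velocity together, so their errors are correlated.</p>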
<ul>
<li><strong><code>ekf.Q = Q_discrete_white_noise(dim=2, dt=1, var=0.1)</code></strong>: This sets how much randomness or unpredictability we expect in the system itself. It's like saying, "things might not move exactly as we think."</li>
</ul>
<p>It prints out:</p>
<pre><code>Process noise covariance:
 [[<span class="hljs-number">0.025</span> <span class="hljs-number">0.05</span> ]
 [<span class="hljs-number">0.05</span>  <span class="hljs-number">0.1</span>  ]]
</code></pre><ul>
<li><strong><code>ekf.R = np.array([[0.1]])</code></strong>: This sets how much we trust our measurements. A smaller number means we trust them more.</li>
</ul>
<pre><code>Measurement noise covariance:
 [[<span class="hljs-number">0.1</span>]]
</code></pre><h3 id="heading-simulate-data-and-initial-state">Simulate Data and Initial State</h3>
<pre><code># Time step between measurements
dt = <span class="hljs-number">1.0</span>  # time step

# Simulated measurements
measurements = [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>]

# True initial state <span class="hljs-keyword">for</span> comparison (not used <span class="hljs-keyword">in</span> the EKF)
true_state = np.array([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>])
print(<span class="hljs-string">"\nTrue initial state:"</span>, true_state)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/7-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Simulate Data and Initial State</em></p>
<ul>
<li><strong><code>dt = 1.0</code></strong>: This is the time between each step of our simulation.</li>
<li><strong><code>measurements = [1, 2, 3, 4, 5]</code></strong>: These are the pretend measurements we will use to test the filter.</li>
<li><strong><code>true_state = np.array([0, 1])</code></strong>: This is the real starting position and speed of our system, used for comparison.</li>
</ul>
<p>It gives:</p>
<pre><code>True initial state: [<span class="hljs-number">0</span> <span class="hljs-number">1</span>]
</code></pre><h3 id="heading-simulate-real-system-changes">Simulate Real System Changes</h3>
<pre><code># Simulate the <span class="hljs-literal">true</span> state evolution (<span class="hljs-keyword">for</span> comparison)
true_states = [true_state[<span class="hljs-number">0</span>]]
<span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(len(measurements) - <span class="hljs-number">1</span>):
    true_state = fx(true_state, dt)
    true_states.append(true_state[<span class="hljs-number">0</span>])

print(<span class="hljs-string">"\nSimulated true states (for reference):"</span>, true_states)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/8-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Simulate Real System Changes</em></p>
<ul>
<li><strong>Simulating True States</strong>: This part calculates what the real position should be over time using the way the system works (<code>fx</code>). It's like having a perfect GPS to check against our estimates.</li>
</ul>
<pre><code>Simulated <span class="hljs-literal">true</span> states (<span class="hljs-keyword">for</span> reference): [<span class="hljs-number">0</span>, <span class="hljs-number">1.0</span>, <span class="hljs-number">2.0</span>, <span class="hljs-number">3.0</span>, <span class="hljs-number">4.0</span>]
</code></pre><h3 id="heading-filter-steps-to-estimate-the-state">Filter Steps to Estimate the State</h3>
<pre><code><span class="hljs-keyword">for</span> i, z <span class="hljs-keyword">in</span> enumerate(measurements):
    print(f<span class="hljs-string">"\nStep {i+1}:"</span>)
    print(<span class="hljs-string">"Measurement:"</span>, z)

    # Predict step
    ekf.predict(u=<span class="hljs-number">0</span>)  # Use predict_x <span class="hljs-keyword">if</span> you need to customize the prediction
    print(<span class="hljs-string">"Predicted state before update:"</span>, ekf.x)

    # Update step
    ekf.update(z, HJacobian=jacobian_H, Hx=hx, args=(), hx_args=())
    print(<span class="hljs-string">"Updated state after measurement:"</span>, ekf.x)
    print(<span class="hljs-string">"State covariance after update:\n"</span>, ekf.P)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/9.png" alt="Image" width="600" height="400" loading="lazy">
<em>Filter Steps to Estimate the State</em></p>
<p><strong>Loop Through Measurements</strong>: This loop goes through each fake measurement one by one.</p>
<ul>
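<li><strong>Predict Step (<code>ekf.predict(u=0)</code>)</strong>: Before looking at the new measurement, the filter makes a guess about the current position and speed. This step uses the state transition matrix <code>ekf.F</code>. These snippets never assign it, so filterpy falls back to its default identity matrix. As a result, the predicted state is simply the previous estimate, and <code>fx</code> is not used by the filter at all.</li>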
<li><strong>Update Step (<code>ekf.update</code>)</strong>: After the guess, the filter sees the new measurement and adjusts its guess to be closer to this measurement, balancing the new information with what it previously predicted.</li>
</ul>
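<p>Below is a minimal variant of the walkthrough that sets <code>ekf.F</code> to the constant-velocity Jacobian before filtering, so that the predict step moves the position forward by velocity times <code>dt</code>. It is a suggested fix using the same filterpy calls as the tutorial, not the author's original code.</p>
<pre><code>import numpy as np
from filterpy.kalman import ExtendedKalmanFilter as EKF
from filterpy.common import Q_discrete_white_noise


def hx(x):
    """Measurement function: we observe position only."""
    return np.array([x[0]])


def jacobian_F(x, dt):
    """Jacobian of the constant-velocity state transition."""
    return np.array([[1.0, dt],
                     [0.0, 1.0]])


def jacobian_H(x):
    """Jacobian of the measurement function."""
    return np.array([[1.0, 0.0]])


dt = 1.0
ekf = EKF(dim_x=2, dim_z=1)
ekf.x = np.array([0.0, 1.0])  # start at position 0 with velocity 1
ekf.P = np.eye(2)
ekf.Q = Q_discrete_white_noise(dim=2, dt=dt, var=0.1)
ekf.R = np.array([[0.1]])
# The key difference: give predict() the motion model instead of the identity default
ekf.F = jacobian_F(ekf.x, dt)

estimates = []
for z in [1, 2, 3, 4, 5]:
    ekf.predict()
    ekf.update(z, HJacobian=jacobian_H, Hx=hx)
    estimates.append(ekf.x.copy())
    print("Updated state:", ekf.x)
</code></pre>
<p>These noise-free measurements fall exactly on the constant-velocity prediction, so every residual is zero. The estimates come out as <code>[1. 1.]</code>, <code>[2. 1.]</code>, up to <code>[5. 1.]</code>, rather than lagging behind the measurements.</p>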
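<p>Here are the results. Notice that each predicted state equals the previous updated state, so the position estimate lags behind the measurements (3.74 after the final measurement of 5):</p>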
<pre><code>Step <span class="hljs-number">1</span>:
Measurement: <span class="hljs-number">1</span>
Predicted state before update: [<span class="hljs-number">0.</span> <span class="hljs-number">1.</span>]
Updated state after measurement: [<span class="hljs-number">0.91111111</span> <span class="hljs-number">1.04444444</span>]
State covariance after update:
 [[<span class="hljs-number">0.09111111</span> <span class="hljs-number">0.00444444</span>]
 [<span class="hljs-number">0.00444444</span> <span class="hljs-number">1.09777778</span>]]

Step <span class="hljs-number">2</span>:
Measurement: <span class="hljs-number">2</span>
Predicted state before update: [<span class="hljs-number">0.91111111</span> <span class="hljs-number">1.04444444</span>]
Updated state after measurement: [<span class="hljs-number">1.49614396</span> <span class="hljs-number">1.31876607</span>]
State covariance after update:
 [[<span class="hljs-number">0.05372751</span> <span class="hljs-number">0.0251928</span> ]
 [<span class="hljs-number">0.0251928</span>  <span class="hljs-number">1.1840617</span> ]]

Step <span class="hljs-number">3</span>:
Measurement: <span class="hljs-number">3</span>
Predicted state before update: [<span class="hljs-number">1.49614396</span> <span class="hljs-number">1.31876607</span>]
Updated state after measurement: [<span class="hljs-number">2.15857605</span> <span class="hljs-number">1.95145631</span>]
State covariance after update:
 [[<span class="hljs-number">0.0440489</span>  <span class="hljs-number">0.0420712</span> ]
 [<span class="hljs-number">0.0420712</span>  <span class="hljs-number">1.25242718</span>]]

Step <span class="hljs-number">4</span>:
Measurement: <span class="hljs-number">4</span>
Predicted state before update: [<span class="hljs-number">2.15857605</span> <span class="hljs-number">1.95145631</span>]
Updated state after measurement: [<span class="hljs-number">2.91071524</span> <span class="hljs-number">2.95437384</span>]
State covariance after update:
 [[<span class="hljs-number">0.04084552</span> <span class="hljs-number">0.05446424</span>]
 [<span class="hljs-number">0.05446424</span> <span class="hljs-number">1.30228131</span>]]

Step <span class="hljs-number">5</span>:
Measurement: <span class="hljs-number">5</span>
Predicted state before update: [<span class="hljs-number">2.91071524</span> <span class="hljs-number">2.95437384</span>]
Updated state after measurement: [<span class="hljs-number">3.74022237</span> <span class="hljs-number">4.27039095</span>]
State covariance after update:
 [[<span class="hljs-number">0.03970292</span> <span class="hljs-number">0.06298888</span>]
 [<span class="hljs-number">0.06298888</span> <span class="hljs-number">1.33648045</span>]]
</code></pre><h2 id="Beyond">Conclusion: Navigating Nonlinear Data with Advanced Techniques</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-noellegracephotos-906055.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/close-up-photography-of-magnifying-glass-906055/">Noelle Otto on Pexels</a></em></p>
<p>Kalman Filters are a powerful tool for extracting accurate estimates from noisy and incomplete data. </p>
<p>Variants like the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) have been developed to address non-linearities in data. </p>
<p>However, these variants can still struggle with stability and accuracy when applied to complex non-linear systems. </p>
<p>This is because both rely on approximations. The EKF linearizes the model around the current estimate, and the UKF propagates a small set of sigma points through it. Neither approach may capture the full dynamics of highly non-linear processes.</p>
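<p>To make the idea of linearization concrete, here is a minimal sketch of one predict/update cycle of a scalar Extended Kalman Filter in plain Python. It assumes a one-dimensional state, a non-linear process model <code>f</code> and measurement model <code>h</code> supplied together with their derivatives, and scalar noise variances <code>Q</code> and <code>R</code>. The function name <code>ekf_step</code> and the example models are hypothetical and are not taken from the article's source code.</p>

```python
import math


def ekf_step(x, P, z, f, f_prime, h, h_prime, Q, R):
    """One predict/update cycle of a scalar Extended Kalman Filter (illustrative sketch)."""
    # Predict: the mean goes through the non-linear model f, but the
    # variance is propagated with the local slope f'(x), i.e. a linearization.
    F = f_prime(x)
    x_pred = f(x)
    P_pred = F * P * F + Q

    # Update: linearize the measurement model h around the predicted state.
    H = h_prime(x_pred)
    innovation = z - h(x_pred)       # measurement residual
    S = H * P_pred * H + R           # innovation variance (R > 0 keeps S > 0)
    K = P_pred * H / S               # Kalman gain
    x_new = x_pred + K * innovation
    P_new = (1 - K * H) * P_pred
    return x_new, P_new


if __name__ == "__main__":
    # Hypothetical models: a slowly drifting state observed through h(x) = x**2.
    f = lambda x: x + 0.1 * math.sin(x)
    f_prime = lambda x: 1 + 0.1 * math.cos(x)
    h = lambda x: x ** 2
    h_prime = lambda x: 2 * x

    x, P = 1.0, 1.0
    for z in [1.2, 1.5, 1.9]:
        x, P = ekf_step(x, P, z, f, f_prime, h, h_prime, Q=0.01, R=0.1)
        print(f"estimate={x:.3f}  variance={P:.4f}")
```

<p>The variance is propagated through the derivatives rather than through the full non-linear functions. As a result, the filter can drift or become overconfident when <code>f</code> or <code>h</code> curves sharply between steps, which is the weakness described above.</p>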
<p>To overcome these limitations, alternative methods such as neural network-based approaches have gained attention. </p>
<p>Neural networks can learn complex patterns directly from data, which makes them a promising option for highly non-linear scenarios.</p>
<p>Despite these advancements, Kalman Filters remain an important tool in various fields of science and economics due to their simplicity, efficiency, and effectiveness in a wide range of applications. </p>
<p>As technology continues to evolve, the integration of Kalman Filters with other advanced techniques will likely enhance their capability to navigate the challenges of non-linear data more effectively.</p>
<p>Here is the full code:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Unlock the Power of Data Analytics with Free Google Services ]]>
                </title>
                <description>
                    <![CDATA[ The ability to analyze and interpret data is indispensable across many different industries. And you don't even need to buy any software to expertly analyze data. We just posted a course on the freeCodeCamp.org YouTube channel that will teach you ab... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/data-analytics-with-google-stack/</link>
                <guid isPermaLink="false">66b201f4a2135cc2539a2169</guid>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Mon, 25 Mar 2024 19:57:04 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/03/dataanalytics.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>The ability to analyze and interpret data is indispensable across many different industries. And you don't even need to buy any software to expertly analyze data.</p>
<p>We just posted a course on the freeCodeCamp.org YouTube channel that will teach you about data analytics through hands-on projects, leveraging a suite of free Google services. This course from Analyst Adithya offers a practical approach to understanding and applying data analytics tools.</p>
<h4 id="heading-data-analysis-with-google-sheets">Data Analysis with Google Sheets</h4>
<p>The course starts with Google Sheets, a versatile tool that goes beyond basic spreadsheet functionalities. You will learn how to perform sophisticated data analysis tasks, manipulate data sets, and employ formulas and functions to uncover insights. Google Sheets' accessibility and user-friendly interface make it an excellent starting point for those new to data analytics, providing a solid foundation in data manipulation and analysis.</p>
<h4 id="heading-sql-with-google-bigquery">SQL with Google BigQuery</h4>
<p>As you progress, you are introduced to Google BigQuery, a powerful tool for querying massive datasets using SQL (Structured Query Language). This segment of the course is designed to bridge the gap between traditional spreadsheet analysis and more advanced database management. By exploring BigQuery, you will gain proficiency in SQL, learning how to execute complex queries, join tables, and extract meaningful information from large data sets.</p>
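<p>Here is a rough illustration of the kind of query this section builds toward: a join plus an aggregation, written in standard SQL. The example runs against Python's built-in <code>sqlite3</code> module instead of BigQuery, so it needs no Google Cloud account. The <code>customers</code> and <code>orders</code> tables and their contents are made up for the example. BigQuery's own SQL dialect differs from SQLite in some details.</p>

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0), (4, 3, 30.0);
""")

# Join orders to customers, then count orders and total revenue per region.
query = """
    SELECT c.region,
           COUNT(o.id)   AS num_orders,
           SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY revenue DESC
"""
rows = conn.execute(query).fetchall()
for region, num_orders, revenue in rows:
    print(region, num_orders, revenue)
conn.close()
```

<p>The same <code>JOIN ... GROUP BY</code> pattern scales from this toy example to the large public datasets the course queries in BigQuery.</p>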
<h4 id="heading-python-with-google-colab">Python with Google Colab</h4>
<p>The course then teaches Python programming with Google Colab, an environment that allows users to write and execute Python code through their browsers. This section is particularly exciting for those looking to harness the power of Python for data analysis, offering a hands-on experience with coding, data manipulation, and the application of libraries like Pandas and NumPy. Google Colab's collaborative features also introduce learners to the concept of shared, interactive computing.</p>
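<p>To give a taste of the data manipulation this section covers, here is a small group-and-summarize step. The course uses Pandas in Colab, but this sketch sticks to Python's standard library (<code>collections</code> and <code>statistics</code>) so it runs anywhere. The sales records are invented for illustration. In Pandas, the same idea is typically written as <code>df.groupby("product")["units_sold"].agg(["sum", "mean"])</code>.</p>

```python
from collections import defaultdict
from statistics import mean

# Made-up sales records: (product, units_sold)
sales = [
    ("widget", 4), ("gadget", 10), ("widget", 6),
    ("gadget", 14), ("gizmo", 3),
]

# Group the unit counts by product.
units_by_product = defaultdict(list)
for product, units in sales:
    units_by_product[product].append(units)

# Summarize each group: total units and average units per sale.
summary = {
    product: {"total": sum(units), "average": mean(units)}
    for product, units in units_by_product.items()
}

for product, stats in sorted(summary.items()):
    print(f"{product}: total={stats['total']}, average={stats['average']}")
```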
<h4 id="heading-data-visualization-with-google-looker-studio">Data Visualization with Google Looker Studio</h4>
<p>Finally, the course ends with data visualization using Google Looker Studio (formerly known as Google Data Studio), where learners will discover how to transform their analyzed data into compelling, interactive reports and dashboards. This module emphasizes the importance of data storytelling, teaching participants how to present their findings visually to communicate insights effectively and influence decision-making.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>By covering a range of Google services, the course gives you a broad perspective on data analytics and hands-on exposure to several tools and methodologies. </p>
<p>Watch the full course on <a target="_blank" href="https://www.youtube.com/watch?v=NnSIKA77pD8">the freeCodeCamp.org YouTube channel</a> (3-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/NnSIKA77pD8" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Data Analytics with New Advanced Certifications from Google ]]>
                </title>
                <description>
                    <![CDATA[ Google has been growing a list of certifications for the past few years. And they've just added some advanced coursework to the repertoire.  In this article, I'll introduce you to the Google Certifications and detail what's new. I'll focus on the Dat... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/starting-and-excelling-in-data-analytics/</link>
                <guid isPermaLink="false">66b8de24abe19f6180038a35</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Google ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Eamonn Cottrell ]]>
                </dc:creator>
                <pubDate>Fri, 07 Apr 2023 20:47:20 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/04/Data-Analytics-Advanced-Certifications.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Google has been growing a list of certifications for the past few years. And they've just added some advanced coursework to the repertoire. </p>
<p>In this article, I'll introduce you to the Google Certifications and detail what's new. I'll focus on the Data Analytics pathway, but will include resources for their other programs.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/learn.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-video-walkthrough">Video Walkthrough</h2>
<p>Here's a video walkthrough of what we'll discuss below.</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/dogpnO3IU_8" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<h2 id="heading-grow-with-google-overview">Grow With Google Overview</h2>
<p>Career Certifications from Google are a part of their "Grow with Google" initiative. A portion of their <a target="_blank" href="https://grow.google/our-mission/">mission page</a> is in the picture below 👇. </p>
<p>Grow with Google is an assortment of training and learning resources for small business owners, job seekers, veterans, educators, developers and startups to learn vital digital skills.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-42.png" alt="Image" width="600" height="400" loading="lazy">
<em>From Grow with Google's mission: https://grow.google/our-mission/</em></p>
<p>Part of Grow with Google are the <a target="_blank" href="https://grow.google/certificates/#?modal_active=none">Career Certificates pathways</a> which offer flexible online training programs in <a target="_blank" href="https://grow.google/certificates/digital-marketing-ecommerce/#?modal_active=none">Digital Marketing &amp; E-commerce</a>, <a target="_blank" href="https://grow.google/certificates/it-support/#?modal_active=none">IT Support</a>, <a target="_blank" href="https://grow.google/certificates/data-analytics/#?modal_active=none">Data Analytics</a>, <a target="_blank" href="https://grow.google/certificates/project-management/#?modal_active=none">Project Management</a>, and <a target="_blank" href="https://grow.google/certificates/ux-design/#?modal_active=none">UX Design</a>.👇</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-47.png" alt="Image" width="600" height="400" loading="lazy">
<em>picture of Grow with Google's Career Certificates page</em></p>
<p>The proposition is attractive: no experience necessary to begin the programs, learn at your own pace, have a well-regarded certification from Google at the end of it and be qualified for a new entry-level job in careers with a median salary of $72,000 across the certificate fields. </p>
<p>The median salaries range from $57,000 for <a target="_blank" href="https://grow.google/certificates/it-support/#?modal_active=none">IT Support</a> to $112,000 for <a target="_blank" href="https://grow.google/certificates/ux-design/#?modal_active=none">UX Design</a>.</p>
<p>The entire certificate program is a partnership between Google, which created the courses, and Coursera, one of the OG MOOCs, which hosts the coursework. </p>
<p>This of course raises the question: how much does it cost? 🤔</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/howmuch.gif" alt="Image" width="600" height="400" loading="lazy">
<em>gif of man saying, how much money?</em></p>
<p>The good news is that you can nab the knowledge for free. And if you fork over a little bit, you can have the official certificate to display proudly on that portion of your LinkedIn page rarely ever scrolled to. 😀</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-58.png" alt="Image" width="600" height="400" loading="lazy">
<em>pic of my LinkedIn licenses &amp; certifications</em></p>
<h3 id="heading-the-free-version">The Free Version:</h3>
<p>Yes, the knowledge is free. You can audit any of these courses on Coursera at no cost. There's no option to audit the whole certificate at once, but you can click into the individual courses and audit them one by one.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/audit.png" alt="Image" width="600" height="400" loading="lazy">
<em>picture of audit option in a Coursera course</em></p>
<p> I've clicked into the Foundations of Data Science course and selected audit. In the screenshot below, you can see that I now have access to all the coursework.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-62.png" alt="Image" width="600" height="400" loading="lazy">
<em>screenshot of Foundations of Data Science coursework on coursera</em></p>
<p>What don't you get with the free version?</p>
<p>Well, you don't get that nice certificate "proving" you did the work. But if you've got the knowledge, you can muster together projects that will do a better job of proving that anyway. </p>
<p>And you also may not be able to turn in certain assignments for grades, depending on the course.</p>
<h3 id="heading-the-not-free-version">The Not Free Version:</h3>
<p>Coursera costs $39 a month for any of the individual Google certificates, including the new advanced ones. Because the program is self-paced, you can finish in one, two, or three months instead of the projected six if you work fast, saving you some $$$. 💵💰</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-59.png" alt="Image" width="600" height="400" loading="lazy">
<em>pic of Coursera's individual course pricing</em></p>
<p>As is now common, Coursera also offers a more expensive Plus program for $59 a month or $399 a year. This gives unlimited access to courses and could be a better deal if you plan on plowing through many courses during a year.</p>
<p>But you'd have to be flying through them for this to be valuable, considering that 10 months of an individual certificate subscription costs $390.</p>
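<p>The pricing trade-off comes down to simple arithmetic. The sketch below uses the April 2023 prices quoted in this article: $39 per month for one certificate and $399 per year for Coursera Plus. It assumes that each started month is billed in full and that you work through certificates one at a time. The function names are hypothetical.</p>

```python
import math

INDIVIDUAL_MONTHLY = 39  # USD per month for one certificate (price quoted in this article)
PLUS_ANNUAL = 399        # USD per year for Coursera Plus (price quoted in this article)


def individual_cost(months_active: float) -> int:
    """Total cost of monthly billing, assuming every started month is charged in full."""
    return INDIVIDUAL_MONTHLY * math.ceil(months_active)


def plus_is_cheaper(total_months: float) -> bool:
    """True if one year of Plus costs less than paying monthly for total_months."""
    return individual_cost(total_months) > PLUS_ANNUAL


print(individual_cost(2.2))   # a little over two months -> 117
print(plus_is_cheaper(10))    # $390 vs $399 -> False
print(plus_is_cheaper(11))    # $429 vs $399 -> True
```

<p>Under these assumptions, a little over two months of work costs $117, which matches the payment mentioned later in this article. The annual Plus plan only pays off once you would otherwise be billed for 11 or more months in a year.</p>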
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-60.png" alt="Image" width="600" height="400" loading="lazy">
<em>pic of coursera plus costs</em></p>
<p>My opinion: if you're strapped for cash, start knocking out the free audit versions. But, it's pretty cheap to grab a certificate if that's important to you. Particularly if you work hard and fast 👉 it could cost as little as one month's $39 fee.</p>
<h2 id="heading-my-experience-with-the-data-analytics-cert">My Experience with the Data Analytics Cert</h2>
<p>I took the Data Analytics Certification a while back, and can attest that it was a fantastic overview of the field. I'd been using Excel heavily before taking the course, and was familiar with programming and database concepts, so I was able to move at a brisk pace.</p>
<p>I spent 2-3 months working through the "6 month" course, and certainly could have gone faster. Six months is conservative, even for a novice, in my opinion.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-57.png" alt="Image" width="600" height="400" loading="lazy">
<em>screenshot of my data analytics certificate</em></p>
<p>This certificate took some flak because it used R instead of Python for the programming portion. But I thought it was well laid out and offered a good introduction to data analytics.</p>
<p>Would it make someone hirable? </p>
<p>Well, it's just one piece in that puzzle. Relying on any of these certifications alone to land a job is perhaps unwise. It will take a combination of the skills learned, the networking you're willing to do, and your continued work through projects to prove your skills and land a job.</p>
<p>On the whole, I recommend it, especially if you're new to Excel, SQL, or programming.</p>
<p>And if you're wondering, yes, I ponied up the cash and enrolled in the program. It took me a little over two months, so I paid for three months: $117 plus tax.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-63.png" alt="Image" width="600" height="400" loading="lazy">
<em>My payment history for Data Analytics coursework</em></p>
<h2 id="heading-advanced-certificates">Advanced Certificates</h2>
<p>My foray into data analytics was driven by curiosity more than anything. I was not actively seeking new employment; I simply wanted to sharpen and expand my skills. And that's exactly what piqued my interest in the new advanced certificates that are now available.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/advanced.png" alt="Image" width="600" height="400" loading="lazy">
<em>pic of banner advertising new advanced Certificates</em></p>
<p>In addition to Industry Specializations (<a target="_blank" href="https://blog.google/outreach-initiatives/grow-with-google/industry-specializations/">added in October 2022</a>), there are now three advanced certificates (<a target="_blank" href="https://blog.google/outreach-initiatives/grow-with-google/advanced-google-data-analytics-career-certificates/">as of April 2023</a>) with more likely in the future: <a target="_blank" href="https://grow.google/certificates/data-analytics/?advanced#?modal_active=none">Advanced Data Analytics</a>, <a target="_blank" href="https://grow.google/certificates/data-analytics/?advanced#?modal_active=none">Business Intelligence</a>, and <a target="_blank" href="https://grow.google/certificates/it-support/?advanced#?modal_active=none">IT Automation with Python</a>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-64.png" alt="Image" width="600" height="400" loading="lazy">
<em>pic of advanced certificate listings from Grow with Google</em></p>
<p>According to Google's announcement post, the Data Analytics Certificate program was the most popular professional certificate on Coursera, so it's unsurprising that this field has received the first advanced follow-up courses. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/popular-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Excerpt from announcement post</em></p>
<p>And the field is well paid, making it an attractive aspiration for career transitioners and students alike.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-65.png" alt="Image" width="600" height="400" loading="lazy">
<em>pic of advanced data analytics certificate details from https://grow.google/certificates/data-analytics/?advanced#?modal_active=none</em></p>
<p>Of particular note, Google has wisely opted to go with Python and Jupyter Notebook (industry-standard tools), as detailed in the card above for the Advanced Data Analytics Certificate.</p>
<p>Below, you can see similar details for the Business Intelligence Certificate which focuses more heavily on modeling, dashboards, and engaging with stakeholders via SQL, Tableau, and BigQuery.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-67.png" alt="Image" width="600" height="400" loading="lazy">
<em>pic of details from business intelligence certificate from https://grow.google/certificates/data-analytics/?advanced#?modal_active=none</em></p>
<p>I'm excited to try out both courses and see how much deeper they go than the introductory, fairly surface-level course. Right now, it has the same fresh, new, adventurous feel as a new WoW server launch 🧙‍♂️. Only 3 people were enrolled in the Advanced Data Analytics certificate!😲😅</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/start.png" alt="Image" width="600" height="400" loading="lazy">
<em>pic of advanced data analytics certificate page on Coursera</em></p>
<p>And there are only 4 in the Business Intelligence program! This is sure to change quickly, and there are already a few hundred learners in the individual courses. It looks like many people will audit the courses before jumping into the paid program.</p>
<h2 id="heading-coursework-overview">Coursework Overview</h2>
<p>Here are the respective courses in the two new advanced certificates. I encourage you to go check them out yourselves for further details, or watch my <a target="_blank" href="https://youtu.be/dogpnO3IU_8">video walkthrough</a>, but this will give you an idea of what to expect.</p>
<h3 id="heading-data-analytics-courses">Data Analytics Courses</h3>
<p><a target="_blank" href="https://www.coursera.org/google-certificates/advanced-data-analytics-certificate?utm_source=google&amp;utm_medium=institutions&amp;utm_campaign=gwgsite">Go here</a> to check out the overview of each data analytics course.</p>
<p>Click into each course below to explore in more detail and see all the modules within each course. This program is estimated by Google to take six months to complete.</p>
<ol>
<li><a target="_blank" href="https://www.coursera.org/learn/foundations-of-data-science?specialization=advanced-data-analytics-certificate">Foundations of Data Science</a></li>
<li><a target="_blank" href="https://www.coursera.org/learn/get-started-with-python?specialization=advanced-data-analytics-certificate">Get Started with Python</a></li>
<li><a target="_blank" href="https://www.coursera.org/learn/go-beyond-the-numbers-translate-data-into-insight?specialization=advanced-data-analytics-certificate">Go Beyond the Numbers: Translate Data into Insights</a></li>
<li><a target="_blank" href="https://www.coursera.org/learn/the-power-of-statistics?specialization=advanced-data-analytics-certificate">The Power of Statistics</a></li>
<li><a target="_blank" href="https://www.coursera.org/learn/regression-analysis-simplify-complex-data-relationships?specialization=advanced-data-analytics-certificate">Regression Analysis: Simplify Complex Data Relationships</a></li>
<li><a target="_blank" href="https://www.coursera.org/learn/the-nuts-and-bolts-of-machine-learning?specialization=advanced-data-analytics-certificate">The Nuts and Bolts of Machine Learning</a></li>
<li><a target="_blank" href="https://www.coursera.org/learn/google-advanced-data-analytics-capstone?specialization=advanced-data-analytics-certificate">Google Advanced Data Analytics Capstone</a></li>
</ol>
<p>You can see from the screenshot below the estimated hours to complete each module as well as the higher $118,000 median salary for entry-level roles:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-68.png" alt="Image" width="600" height="400" loading="lazy">
<em>screenshot of coursework for Advanced Data Analytics</em></p>
<h3 id="heading-business-intelligence-courses">Business Intelligence Courses</h3>
<p><a target="_blank" href="https://www.coursera.org/professional-certificates/google-business-intelligence">Go here</a> to check out the overview of each business intelligence course.</p>
<p>Click into each course below to explore in more detail and see all the modules within each course. This shorter program is estimated by Google to take two months to complete.</p>
<ol>
<li><a target="_blank" href="https://www.coursera.org/learn/foundations-of-business-intelligence?specialization=business-intelligence-certificate">Foundations of Business Intelligence</a></li>
<li><a target="_blank" href="https://www.coursera.org/learn/the-path-to-insights-data-models-and-pipelines?specialization=business-intelligence-certificate">The Path to Insights: Data Models and Pipelines</a></li>
<li><a target="_blank" href="https://www.coursera.org/learn/decisions-decisions-dashboards-and-reports?specialization=business-intelligence-certificate">Decisions, Decisions: Dashboards and Reports</a></li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-69.png" alt="Image" width="600" height="400" loading="lazy">
<em>screenshot of Business Intelligence certificate coursework</em></p>
<h2 id="heading-after-the-certificate">After the Certificate</h2>
<p>Google has also developed an <a target="_blank" href="https://grow.google/employers/#?modal_active=none">employer consortium</a> of over 150 companies which will consider recent graduates of Google's programs for hiring. While this is certainly no guarantee of employment, if you're in the market it is a nice touch, and will be another helpful tool in your search.</p>
<h2 id="heading-summary">Summary</h2>
<p>I hope this has been a helpful overview for you! </p>
<p>I make weekly spreadsheet and coding videos on my <a target="_blank" href="https://www.youtube.com/@eamonncottrell?sub_confirmation=1">YouTube channel</a>. Come check it out if you find this helpful! Here's a link to my Google Sheets playlist where I walk through basic and advanced topics alike:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/undefined" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p>You can also find me on <a target="_blank" href="https://www.linkedin.com/in/eamonncottrell/">LinkedIn</a>. Have a great one!👋</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Get Started in Data Analytics – A Roadmap for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ Hello and welcome to the world of data analysis! If you're considering a career in this field, you're in good company. Data analysis is a growing and exciting field that's becoming increasingly important in today's data-driven world. Let's face it, w... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/data-analytics-roadmap/</link>
                <guid isPermaLink="false">66d46097f855545810e934bf</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Jeremiah Oluseye ]]>
                </dc:creator>
                <pubDate>Thu, 23 Mar 2023 00:06:48 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/03/dataa.JPG" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Hello and welcome to the world of data analysis! If you're considering a career in this field, you're in good company. Data analysis is a growing and exciting field that's becoming increasingly important in today's data-driven world.</p>
<p>Let's face it, we're all drowning in data these days. From social media posts to financial transactions to medical records, there's no shortage of information to sift through.</p>
<p>That's where data analysts come in. They're the ones who help us make sense of all that data and turn it into valuable insights.</p>
<p>And those insights can be game-changing. They can help businesses improve their products and services, governments make more informed policy decisions, and individuals make better choices in their personal and professional lives.</p>
<p>But it's not just about the impact. Data analysis can also be quite lucrative. According to recent studies, the median salary for a data analyst in the US is around $70,000 per year, and that number can climb even higher with experience and expertise.</p>
<p>Of course, like any profession, data analysis has its challenges. There's the occasional headache-inducing data set, the ever-present threat of imposter syndrome, and the endless debates over the best programming language or data visualization tool. But hey, if you're up for a challenge, this could be the field for you.</p>
<p>So if you're trying to become a data analyst, this article is for you. Hopefully it saves you a lot of time and effort, so you don't waste time learning irrelevant things like I once did.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/tenor-1.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-step-1-get-to-know-the-role-of-a-data-analyst">Step 1: Get to Know the Role of a Data Analyst</h2>
<p>Are you considering a career as a data analyst? That's fantastic! Let's take a moment to grasp the gist of it before moving forward.</p>
<p>You see, a data analyst's job involves more than just crunching numbers and creating charts (although those things can be pretty cool, too). It involves using data to find insights and solve problems. That means asking the right questions, organizing and evaluating the data, and explaining your conclusions to others.</p>
<p>Some core skills and activities that a data analyst typically performs include:</p>
<ul>
<li><p>Collecting and analyzing large data sets to identify patterns, trends, and insights that can inform business decisions.</p>
</li>
<li><p>Using statistical tools and techniques to draw insights from data.</p>
</li>
<li><p>Developing and implementing data collection systems and other strategies that optimize efficiency and data quality.</p>
</li>
<li><p>Collaborating with other teams to identify business needs and develop data solutions that address them.</p>
</li>
<li><p>Communicating findings and insights to stakeholders in a clear and actionable way.</p>
</li>
</ul>
<p>So if you're not a math prodigy or a computer whiz, don't worry. Data analysis is a multidisciplinary field that draws on statistics, computer science, business, and more. Anyone who enjoys learning, solving problems, and making a difference can really enjoy it.</p>
<p>So how can you find out more about what a data analyst does? You can start with the many free resources available online. For example, <a target="_blank" href="https://www.freecodecamp.org/news/what-does-a-data-analyst-do-data-analyst-job-description/">here's an article that discusses what data analysts actually do</a>. And <a target="_blank" href="https://www.freecodecamp.org/news/data-analyst-vs-data-scientist-whats-the-difference/">here's one that compares data analyst and data scientist roles</a>.</p>
<p>Many blogs, podcasts, and YouTube channels offer entertaining and informative content on data analysis. I'll list some YouTube channels that have helped me over the years below.</p>
<p>To get a feel for the skills and qualities needed, you can also network with other data analysts, attend meetups or seminars, and study job descriptions.</p>
<p>Don't forget to consider whether dealing with data is something you enjoy. Do you enjoy finding patterns and solving puzzles? Do you want to change the world for the better? Data analysis may be the ideal career choice for you if the answer is yes.</p>
<p>The first step on your path is to understand what a data analyst does. Enjoy yourself, take your time, and don't hesitate to ask questions.</p>
<h2 id="heading-step-2-explore-job-requirements-for-data-analyst-roles">Step 2: Explore Job Requirements for Data Analyst Roles</h2>
<p>Now that you have a better understanding of the role of a data analyst, it's time to start looking at what employers are looking for. After all, you want to make sure that your skills and knowledge match up with what's required in the job market.</p>
<p>But before you start panicking about not having enough experience, remember that every company is different. Some may prioritize programming skills, while others may value communication and business acumen. That's why it's important to do your research and find out what specific skills and qualifications are most in demand in your desired industry or company.</p>
<p>So how do you go about finding this information? Well, one great place to start is by checking out job listings and descriptions on job boards like LinkedIn, Indeed, or Glassdoor. This can give you a good sense of the key requirements and qualifications for different data analyst roles.</p>
<p>Some examples of what job listings might ask for include:</p>
<ul>
<li><p>Proficiency in SQL and experience working with large datasets</p>
</li>
<li><p>Familiarity with Python and data visualization tools like Tableau or Power BI</p>
</li>
<li><p>Strong analytical skills and the ability to draw insights from complex data</p>
</li>
<li><p>Experience with statistical analysis and modeling techniques</p>
</li>
<li><p>Excellent communication skills and the ability to explain complex findings to both technical and non-technical audiences</p>
</li>
</ul>
<p>But don't stop there! You can also reach out to people who work in the field or who have job titles that interest you. Ask them about their experience and what skills they think are most important for success in their role. You might even want to consider setting up informational interviews to learn more about the field and get advice on how to get started.</p>
<p>And speaking of getting started, it's important to remember that there's no substitute for hands-on experience. As tempting as it may be to spend all your time watching tutorials, you'll learn much more quickly and effectively by actually building things and working on real data analysis projects.</p>
<p>So take some time to explore job requirements, but don't forget to keep building your skills and gaining practical experience. With a little effort and a lot of curiosity, you'll be well on your way to becoming a successful data analyst.</p>
<h2 id="heading-step-3-get-comfortable-with-math-and-statistics">Step 3: Get Comfortable with Math and Statistics</h2>
<p>Okay, I know what you're thinking. Math and statistics? Yikes!</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/im-out-gif-11-1.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<p>But hear me out before you run for the hills. For a data analyst to be able to make sense of data and derive valuable insights from it, having a fundamental understanding of these concepts is essential.</p>
<p>So what fundamental statistical concepts and formulas should you be familiar with?</p>
<p>To start, there are the measures of central tendency (the mean, median, and mode), which give you an idea of the typical value in a dataset. You should know how to calculate them.</p>
<p>You should also be able to calculate the standard deviation, which measures how widely the data are spread around the mean.</p>
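<p>If you'd like to see these calculations in action, here's a minimal sketch using Python's built-in <code>statistics</code> module. The <code>daily_sales</code> values are invented for illustration, and <code>stdev</code> computes the <em>sample</em> standard deviation (use <code>pstdev</code> if your data covers the whole population).</p>
<pre><code class="lang-python">import statistics

# Hypothetical daily sales figures, invented for this example
daily_sales = [12, 15, 15, 18, 20, 22, 38]

mean = statistics.mean(daily_sales)      # arithmetic average
median = statistics.median(daily_sales)  # middle value once the data is sorted
mode = statistics.mode(daily_sales)      # most frequent value
stdev = statistics.stdev(daily_sales)    # sample standard deviation (n - 1 in the denominator)

print(f"mean={mean:.2f} median={median} mode={mode} stdev={stdev:.2f}")
</code></pre>
<p>With these numbers the mean is 20, the median 18, the mode 15, and the standard deviation about 8.62. Notice how the single large value (38) pulls the mean above the median, which is one reason to look at more than one measure of central tendency.</p>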
<p>And in order to find relationships between variables and generate predictions based on those associations, you should also be familiar with correlation and regression.</p>
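<p>Here's a similar sketch for correlation and simple linear regression, again with made-up numbers. It uses <code>statistics.correlation</code> and <code>statistics.linear_regression</code>, which require Python 3.10 or later.</p>
<pre><code class="lang-python">import statistics

# Hypothetical data: monthly ad spend (in $1,000s) and units sold
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 15, 21, 24, 28]

# Pearson correlation: values close to +1 mean a strong positive linear relationship
r = statistics.correlation(ad_spend, units_sold)

# Simple linear regression (ordinary least squares): units = slope * spend + intercept
slope, intercept = statistics.linear_regression(ad_spend, units_sold)

# Use the fitted line to predict units sold for a spend level not in the data
predicted = slope * 6 + intercept
print(f"r={r:.3f} slope={slope:.2f} intercept={intercept:.2f} predicted={predicted:.1f}")
</code></pre>
<p>Here <code>r</code> comes out at roughly 0.994, a strong positive relationship, and the fitted line predicts about 32.3 units at a spend of 6. Keep in mind that a strong correlation on its own doesn't show that one variable causes the other.</p>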
<p>But it's not just about numbers. You'll also need a rudimentary understanding of linear algebra, which is employed in many data analysis approaches. <a target="_blank" href="https://www.freecodecamp.org/news/linear-algebra-full-course/">Here's an in-depth course (and textbook)</a> to get you started.</p>
<p>You may need to employ matrices to modify and manipulate data, or you may need to use <a target="_blank" href="https://www.freecodecamp.org/news/data-science-with-python-8-ways-to-do-linear-regression-and-measure-their-speed-b5577d75f8b/">linear regression</a> to forecast future values based on historical trends.</p>
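<p>To show where matrices come in, here's a minimal sketch that fits a straight-line trend to historical values with the normal equations, beta = (X<sup>T</sup>X)<sup>-1</sup>X<sup>T</sup>y, and then forecasts the next period. It's written in plain Python so each matrix step is visible. The helper functions (<code>transpose</code>, <code>matmul</code>, <code>inverse_2x2</code>) and the revenue figures are hypothetical, and in practice you'd typically let a library such as NumPy do this work.</p>
<pre><code class="lang-python"># Hypothetical monthly revenue (in $1,000s) for months 1 to 6
months = [1, 2, 3, 4, 5, 6]
revenue = [10.0, 12.0, 13.5, 15.0, 17.5, 19.0]

# Design matrix X: a column of 1s (for the intercept) next to the month number
X = [[1.0, m] for m in months]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    # Matrix product: entry (i, j) is row i of A dotted with column j of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Normal equations: beta = (X^T X)^-1 X^T y
Xt = transpose(X)
y = [[v] for v in revenue]  # y as a column vector
beta = matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, y))
intercept, slope = beta[0][0], beta[1][0]

# Forecast month 7 by extending the fitted trend line
forecast = intercept + slope * 7
print(f"intercept={intercept:.3f} slope={slope:.3f} month 7 forecast={forecast:.2f}")
</code></pre>
<p>This gives a slope of 1.8 (revenue grows by about 1.8 thousand per month) and a month-7 forecast of 20.8. It's the same answer ordinary least squares regression gives, just computed with explicit matrix operations.</p>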
<p>If you don't have a solid math background, this may seem difficult. But don't be concerned! There are numerous resources available to assist you in your learning.</p>
<p>For example, Khan Academy offers lessons and practice tasks in math and statistics. If you prefer books, "Data Science for Beginners" by Andrew Park is an excellent resource that covers both statistical and mathematical principles in an accessible manner.</p>
<p>freeCodeCamp is developing a math curriculum <a target="_blank" href="https://www.freecodecamp.org/news/freecodecamp-foundational-math-curriculum/">which you can read about here</a>.</p>
<p>And <a target="_blank" href="https://www.freecodecamp.org/news/statistics-for-data-science/">here's a guide on the statistics you need to know</a> to get into data science and pursue fields like Machine Learning.</p>
<p>The key is to start small and build up your knowledge gradually. Don't be afraid to ask questions or seek help when you need it. With a little practice and persistence, you'll soon find that math and statistics are actually kind of fun (no, seriously!).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/raw.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-step-4-master-excel-for-data-analysis">Step 4: Master Excel for Data Analysis</h2>
<p>Excel is a vital tool in a data analyst's arsenal. It's used by virtually every organization out there, and mastering it will help you clean, manipulate, and analyze data with ease.</p>
<p>With Excel, you can create formulas and functions to perform calculations, pivot tables and charts to visualize data, and use data analysis tools to make predictions and identify patterns. Excel is particularly useful for regression analysis, forecasting, and scenario analysis.</p>
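<p>Under the hood, a pivot table groups rows and aggregates a value for each group. As a rough illustration (not how Excel implements it), here's a Python equivalent of a pivot table that sums Sales by Region and Product. The rows and column names are invented.</p>
<pre><code class="lang-python">from collections import defaultdict

# Hypothetical rows, like a small sheet with Region, Product, and Sales columns
rows = [
    ("North", "Laptops", 1200),
    ("North", "Phones", 800),
    ("South", "Laptops", 950),
    ("South", "Phones", 1100),
    ("North", "Laptops", 600),
]

# Sum Sales by Region (rows) and Product (columns), as a pivot table would
pivot = defaultdict(lambda: defaultdict(int))
for region, product, sales in rows:
    pivot[region][product] += sales

# Print the result as a small grid
products = sorted({product for _, product, _ in rows})
print("Region".ljust(8) + "".join(p.ljust(10) for p in products))
for region in sorted(pivot):
    print(region.ljust(8) + "".join(str(pivot[region][p]).ljust(10) for p in products))
</code></pre>
<p>The result matches what you'd get from <strong>PivotTable</strong> on Excel's Insert tab with Region as rows, Product as columns, and Sum of Sales as values: North/Laptops totals 1800, for example. A formula such as <code>=SUMIFS(C:C, A:A, "North", B:B, "Laptops")</code> would return that same single value, assuming Region, Product, and Sales sit in columns A, B, and C.</p>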
<p>If you're serious about becoming a data analyst, it's essential to master Excel. Fortunately, there are plenty of online resources available to help you learn. Check out <a target="_blank" href="https://www.youtube.com/@excelisfun">ExcelIsFun</a>, <a target="_blank" href="https://www.youtube.com/playlist?list=PLmejDGrsgFyCZ4YC5s8mgdQztj7zt5to5">Excel Chandoo</a>, <a target="_blank" href="https://www.youtube.com/playlist?list=PLWPirh4EWFpEpO6NjjWLbKSCb-wx3hMql">Tutorials Point</a>, <a target="_blank" href="https://www.youtube.com/@AshutoshKumaryt">Ashutosh Kumar</a>, and <a target="_blank" href="https://www.youtube.com/@MyOnlineTrainingHub">MyOnlineTrainingHub</a> for tutorials on YouTube. The following courses will also show you how to get the most out of Excel.</p>
<ol>
<li><p><a target="_blank" href="https://www.coursera.org/learn/excel-data-analysis?irclickid=WskXxw2EKxyNRBjSCewfUQQZUkARwUz2LzeJ2A0&amp;irgwc=1&amp;utm_medium=partners&amp;utm_source=impact&amp;utm_campaign=1359419&amp;utm_content=b2c">Introduction to Data Analysis Using Excel</a> by Coursera</p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/data-analysis-with-python-for-excel-users-course/">Data Analysis with Python for Excel Users</a> on freeCodeCamp's YouTube channel</p>
</li>
<li><p><a target="_blank" href="https://www.coursera.org/specializations/excel?irclickid=WskXxw2EKxyNRBjSCewfUQQZUkARwUQvLzeJ2A0&amp;irgwc=1&amp;utm_medium=partners&amp;utm_source=impact&amp;utm_campaign=1359419&amp;utm_content=b2c">Excel Skills for Business Specialization</a> by Coursera</p>
</li>
<li><p><a target="_blank" href="https://www.edx.org/course/analyzing-and-visualizing-data-with-excel-2?irclickid=RrSQ3tWpyxyNTDdSXVVnIRUdUkARwR0WLzeJ2A0&amp;utm_source=affiliate&amp;utm_medium=guru99&amp;utm_campaign=Online%20Tracking%20Link_&amp;utm_content=ONLINE_TRACKING_LINK&amp;irgwc=1">Analyzing and Visualizing Data with Excel</a> by edX</p>
</li>
</ol>
<p>Remember, Excel is just one tool in your data analysis toolkit. But it's a crucial one that you'll use daily as a data analyst. By mastering Excel, you'll be well equipped to handle many of the data-related tasks that come your way.</p>
<p>Now let's move on to the next skill, which is also one of the most important skills you'll need as a data analyst.</p>
<h2 id="heading-step-5-master-sql-for-data-extraction">Step 5: Master SQL for Data Extraction</h2>
<p>SQL (Structured Query Language) is a critical tool in data analysis. As a data analyst, one of your primary responsibilities is to extract data from databases, and SQL is the language used to do so.</p>
<p>SQL is more than just basic queries built from SELECT, FROM, and WHERE clauses. It's a rich language that lets you manipulate and transform data in countless ways. You use SQL to join data from multiple tables, filter and aggregate data, and create new tables and views.</p>
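<p>To make these operations concrete, here's a minimal sketch you can run without installing a database server: it uses SQLite through Python's built-in <code>sqlite3</code> module. The <code>customers</code> and <code>orders</code> tables and their data are hypothetical. The statements are standard SQL that most databases accept, though each dialect differs in details.</p>
<pre><code class="lang-python">import sqlite3

# In-memory database with two hypothetical tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

INSERT INTO customers VALUES (1, 'Ada', 'UK'), (2, 'Ben', 'US'), (3, 'Chioma', 'NG');
INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 250.0), (4, 3, 45.0), (5, 3, 15.0);

-- A view saves a query under a name so you can reuse it like a table
CREATE VIEW customer_totals AS
SELECT c.name, c.country, COUNT(o.id) AS order_count, SUM(o.amount) AS total_spent
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id  -- join: combine rows from two tables
GROUP BY c.id, c.name, c.country;         -- aggregate: one row per customer
""")

# Filter the aggregated results and sort them
query = """
SELECT name, order_count, total_spent
FROM customer_totals
WHERE total_spent BETWEEN 100 AND 1000
ORDER BY total_spent DESC;
"""
results = conn.execute(query).fetchall()
print(results)
</code></pre>
<p>The query returns Ben (one order, 250.0) followed by Ada (two orders, 200.0). Chioma's total of 60.0 falls outside the filter.</p>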
<p>To be an effective data analyst, it's essential to master SQL. You should be comfortable with writing queries, creating tables, and understanding how to optimize your queries for performance.</p>
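<p>One common optimization is indexing a column you frequently filter on. As an illustration, this sketch asks SQLite for its <code>EXPLAIN QUERY PLAN</code> before and after creating an index, again via Python's <code>sqlite3</code> module. The table, data, and index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.</p>
<pre><code class="lang-python">import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(10_000)],  # 10,000 hypothetical orders
)

query = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"


def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column describes each step
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(query)  # without an index, SQLite has to scan every row
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # with the index, it can jump straight to matching rows

print("before:", before)
print("after: ", after)
</code></pre>
<p>Before the index, the plan reports a full <code>SCAN</code> of <code>orders</code>. Afterwards it reports a <code>SEARCH</code> using <code>idx_orders_customer</code>. Indexes aren't free, though: they take up space and slow down inserts and updates, so add them where your queries actually need them.</p>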
<p>Fortunately, there are many resources available to help you learn SQL. Some great places to start are <a target="_blank" href="https://www.khanacademy.org/computing/computer-programming/sql/">Khan Academy SQL</a>, W3Schools, <a target="_blank" href="https://sqlzoo.net/wiki/SQL_Tutorial">SQLZoo</a>, <a target="_blank" href="https://sqlbolt.com/">SQLBolt</a>, <a target="_blank" href="https://www.youtube.com/@LukeBarousse">Luke Barousse</a>, <a target="_blank" href="https://www.youtube.com/@AlexTheAnalyst">Alex the Analyst</a>, <a target="_blank" href="https://www.youtube.com/@TheOyinbooke">Microsoft Power Tools</a>, and finally SQL games like SQL Island and SQL Murder Mystery.</p>
<p>Additionally, there are many online courses and books available that cover SQL in-depth. Here are a few to get you started:</p>
<ol>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/sql-and-databases-full-course/">Learn SQL and Databases – Full Course for Beginners</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/learn-sql-free-relational-database-courses-for-beginners/#relational-database-freecodecamp-curriculum">Relational Database curriculum from freeCodeCamp</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/learn-sql-free-relational-database-courses-for-beginners/">Collection of free relational database courses</a></p>
</li>
</ol>
<p>By mastering SQL, you'll be able to pull the data you need from databases and shape it in ways that yield meaningful business insights.</p>
<h2 id="heading-step-6-learn-python-for-data-analysis">Step 6: Learn Python for Data Analysis</h2>
<p>I know there's a lot of debate about whether a data analyst needs Python. Some say they do, while others say they don't.</p>
<p>In my view, it depends on the company you're working for. But learning Python gives you an edge, as it's one of the most widely used programming languages in data analysis. Python is known for its simplicity, readability, and versatility, making it a popular choice for data analysts.</p>
<p>Python has a <a target="_blank" href="https://www.freecodecamp.org/news/python-data-science-course-matplotlib-pandas-numpy/">vast array of libraries and tools</a> that can make data analysis easier, such as Pandas for data manipulation and analysis, NumPy for scientific computing, and Matplotlib for data visualization. It also has the <a target="_blank" href="https://www.freecodecamp.org/news/python-automation-scripts/">ability to automate tasks</a>, making data analysis more efficient and effective.</p>
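<p>As a small taste of automation, here's a sketch that applies the same summary to several sales reports. It sticks to the standard library (<code>csv</code> and <code>io</code>) so it runs anywhere. The report names, columns, and the <code>summarize_sales</code> function are hypothetical.</p>
<pre><code class="lang-python">import csv
import io


def summarize_sales(csv_text):
    """Return total sales per product from CSV text with 'product' and 'sales' columns."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["product"]] = totals.get(row["product"], 0.0) + float(row["sales"])
    return totals


# Hypothetical weekly exports; in a real script these might be files read from a folder
weekly_reports = {
    "week1.csv": "product,sales\nLaptops,1200\nPhones,800\nLaptops,300\n",
    "week2.csv": "product,sales\nPhones,950\nLaptops,400\n",
}

# The repetitive part you'd otherwise do by hand: summarize every report the same way
for name, text in weekly_reports.items():
    print(name, summarize_sales(text))
</code></pre>
<p>With pandas installed, the body of <code>summarize_sales</code> could shrink to something like <code>pd.read_csv(io.StringIO(csv_text)).groupby("product")["sales"].sum()</code>, which is one reason Pandas is so popular for this kind of work.</p>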
<p><a target="_blank" href="https://www.freecodecamp.org/news/learn-data-analysis-with-python-course/">Learning Python for data analysis</a> is a great investment in your career as a data analyst. Not only will it allow you to work with powerful libraries, but it will also open up many opportunities to work with larger datasets and more complex analyses.</p>
<p>There are many resources available to help you learn Python, from free online courses to paid online programs and textbooks. Some resources include <a target="_blank" href="https://www.youtube.com/@freecodecamp">freeCodeCamp</a>, DataCamp, <a target="_blank" href="https://www.youtube.com/@codebasics">Codebasics</a>, <a target="_blank" href="https://www.youtube.com/@programmingwithmosh">Programming with Mosh</a>, and <a target="_blank" href="https://learn.microsoft.com?wt.mc_id=studentamb_207021">Microsoft Learn</a>.</p>
<p>By learning Python, you'll be able to perform more complex data analysis, automate tasks, and work with a broader range of datasets, making you a valuable asset in any data-focused organization.</p>
<h2 id="heading-step-7-master-a-data-visualization-tool">Step 7: Master a Data Visualization Tool</h2>
<p>As a data analyst, it's essential to be able to communicate your findings clearly and concisely. One way to do this is through data visualization. Data visualization tools like <a target="_blank" href="https://www.freecodecamp.org/news/python-in-powerbi/">Power BI</a> and Tableau can help you create interactive charts, graphs, and dashboards that make it easy for others to understand your findings. We'll talk about them more in a minute.</p>
<p>Here's a <a target="_blank" href="https://www.freecodecamp.org/news/tableau-for-data-science-and-data-visualization-crash-course/">Tableau for Data Science and Data Visualization course</a> you can check out.</p>
<p>While SQL is great for querying and manipulating data, it can't fully bring your data to life. This is where a data visualization tool comes in. These tools allow you to transform your data into insightful and easy-to-understand visualizations that can be shared with stakeholders.</p>
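<p>Visualization tools handle this step interactively, but the core idea is mapping numbers to visual marks. Purely as an illustration, here's a tiny Python sketch that turns an aggregated query result into a text bar chart. It's a stand-in for a real chart, not how Power BI or Tableau work, and the region totals are invented.</p>
<pre><code class="lang-python"># Hypothetical query result: total sales per region
totals = {"North": 1800, "South": 2050, "East": 900, "West": 1325}


def text_bar_chart(data, width=40):
    """Draw one '#' bar per category, scaled so the largest value fills `width` characters."""
    largest = max(data.values())
    lines = []
    # Sort bars from largest to smallest so comparisons are easy to read
    for label, value in sorted(data.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(6)} {bar} {value}")
    return "\n".join(lines)


print(text_bar_chart(totals))
</code></pre>
<p>Sorting the bars by value is a small design choice that makes the chart easier to scan, and it's one you'll make in every visualization tool.</p>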
<p>You can <a target="_blank" href="https://www.freecodecamp.org/news/learn-data-visualization-in-this-free-17-hour-course/">learn data visualization basics in this in-depth free course</a> on freeCodeCamp's YouTube channel.</p>
<p>Power BI is a great choice for data visualization, as it is easy to learn and integrates with other Microsoft products. This makes it an ideal tool for organizations that use Microsoft Office. Tableau is also a popular choice, with a strong community of users and a wide range of features.</p>
<p>Learning a data visualization tool like Power BI or Tableau will enable you to create compelling visualizations that help you better understand your data and communicate your findings to others. There are many online courses and tutorials available to help you learn these tools, such as the official <a target="_blank" href="https://learn.microsoft.com/en-us/training/modules/get-started-with-power-bi/">Microsoft Power BI training</a> and <a target="_blank" href="https://trailheadacademy.salesforce.com/classes/TVA101-Tableau-Visual-Analytics">Tableau's own training courses</a>.</p>
<p>By mastering a data visualization tool, you'll be able to build interactive, engaging dashboards, which makes you an invaluable asset to any data-focused organization.</p>
<p>You can also dive into other popular data viz tools like D3.js: <a target="_blank" href="https://www.freecodecamp.org/news/data-visualization-using-d3-course/">here's a course on it to get you started</a>.</p>
<h2 id="heading-step-8-network-with-other-data-analysts-and-developers">Step 8: Network with Other Data Analysts and Developers</h2>
<p>Networking is an essential part of any profession, and data analytics is no exception. By networking with other data analysts and developers, you can learn from their experiences, get insights on the latest industry trends and technologies, and potentially find job opportunities.</p>
<p>Here are a few ways to network with others in the field:</p>
<ol>
<li><p>Attend industry events: Look for conferences, meetups, and other events related to data analytics and attend them. This is a great way to meet others in the field and learn about new developments and technologies.</p>
</li>
<li><p>Join online communities: There are many online communities for data analysts and developers, such as forums, LinkedIn groups, and social media groups. Join these communities and participate in discussions to connect with others in the field.</p>
</li>
<li><p>Reach out to others: Don't be afraid to reach out to other data analysts and developers, whether through social media, email, or in person. Introduce yourself, ask for advice, and build relationships.</p>
</li>
</ol>
<p>Remember, networking is a two-way street. Be willing to offer help and advice to others in the field as well. By building a strong network of contacts in the data analytics field, you can enhance your career opportunities and stay up to date on the latest industry trends and technologies.</p>
<h2 id="heading-step-9-dont-forget-about-soft-skills">Step 9: Don't Forget about "Soft Skills"</h2>
<p>One final set of skills I think you need to work on before you can be a great data analyst (DA) is your soft skills: your ability to communicate, solve problems, and so on.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/03/Analyst.jpeg" alt="Image" width="600" height="400" loading="lazy"></p>
<p>After all is said and done, keep practicing and keep building projects.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Becoming a data analyst requires dedication, hard work, and a passion for data analysis. Following the steps outlined in this roadmap will help you gain the necessary skills and knowledge to become a successful data analyst.</p>
<p>From understanding the role of a data analyst, to mastering SQL and Python, to networking with other developers, each step is crucial to achieving success in this field.</p>
<p>Remember to stay curious, never stop learning, and always be willing to adapt to new technologies and methodologies. With determination and persistence, you can achieve your goal of becoming a proficient data analyst and unlock a world of exciting career opportunities.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is Data Analytics? Data Analysis and Definition for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ Data analytics is the process of collecting, organizing, and analyzing raw data from different sources. You can then gain insights that'll help organizations make important predictions and decisions.  Data analytics mostly involves studying data tren... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-is-data-analytics-data-analysis-and-definition-for-beginners/</link>
                <guid isPermaLink="false">66b0a389b23875658c0760c9</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ihechikara Abba ]]>
                </dc:creator>
                <pubDate>Fri, 18 Nov 2022 21:55:08 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/11/stephen-dawson-qwtCeJ5cLYs-unsplash.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Data analytics is the process of collecting, organizing, and analyzing raw data from different sources. You can then gain insights that'll help organizations make important predictions and decisions. </p>
<p>Data analytics mostly involves studying data trends over a given period, and then extracting useful information from these trends. </p>
<h2 id="heading-why-is-data-analytics-important">Why Is Data Analytics Important?</h2>
<p><strong>More precise decision-making process:</strong> Data analytics helps organizations make more accurate decisions based on the insights gained from data trends over time.</p>
<p>For example, a company selling different products can figure out at what time of year each product sells best. This enables it to boost production of those products at the right time.</p>
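<p>As a minimal sketch of that kind of analysis, the Python code below totals hypothetical sales records by product and month, then picks each product's peak month. The products and numbers are made up.</p>
<pre><code class="lang-python">from collections import defaultdict

# Hypothetical sales records: (product, month, units sold)
sales = [
    ("Jackets", "Jan", 300), ("Jackets", "Jul", 20), ("Jackets", "Dec", 410),
    ("Sunscreen", "Jan", 25), ("Sunscreen", "Jul", 480), ("Sunscreen", "Dec", 30),
    ("Jackets", "Dec", 150), ("Sunscreen", "Jul", 510),
]

# Total units per product per month
by_product_month = defaultdict(lambda: defaultdict(int))
for product, month, units in sales:
    by_product_month[product][month] += units

# For each product, find the month with the highest total
peak_months = {
    product: max(months, key=months.get)
    for product, months in by_product_month.items()
}
print(peak_months)
</code></pre>
<p>Here jackets peak in December and sunscreen in July. A real analysis would usually cover several years of data, so that one unusual year doesn't drive the decision.</p>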
<p>A better decision-making process reduces the need for guesswork and helps minimize losses and avoidable risks.</p>
<p><strong>Improved customer satisfaction:</strong> When you serve customers well, you retain them and keep the business going. Insights gained from data analytics can help you understand what your customers want and when to act.</p>
<p>Data analytics also enables businesses to identify their target audience easily.</p>
<p><strong>Improved business strategy:</strong> Data analytics helps organizations channel their resources towards the most efficient strategies. </p>
<p><strong>Performance evaluation:</strong> Data analytics can help organizations evaluate how well or badly they've performed over a specified period. This enables them to make important decisions about the organization's future.</p>
<p>Although the points above are framed from a business point of view, business isn't the only field where data analytics is important.</p>
<p>You can see data analytics being used in healthcare, education, agriculture, and so on. </p>
<h2 id="heading-types-of-data-analytics">Types of Data Analytics</h2>
<p>There are four main types of data analytics:</p>
<ul>
<li><strong>Descriptive analytics:</strong> This type of analytics describes <strong>what</strong> happened in the analyzed data over a specified period of time.</li>
<li><strong>Diagnostic analytics:</strong> Diagnostic data analytics shows the "<strong>why</strong>" in a data trend. This involves having a deeper look into why certain patterns were present in the data.</li>
<li><strong>Predictive analytics:</strong> The goal here is to foretell what is <strong>expected to happen</strong> in the future based on the outcomes of analyzed data over time. </li>
<li><strong>Prescriptive analytics:</strong> In prescriptive analytics, the results from data analysis are used to make recommendations on <strong>what to do next</strong>.</li>
</ul>
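<p>To make the four types concrete, here's a hypothetical example in Python that walks one small dataset of monthly sign-ups through each of them. It needs Python 3.10+ for <code>statistics.linear_regression</code>. The data and the decision rule are invented, and the "diagnostic" step is deliberately naive: sign-ups were already trending upward, so a real analysis would need to separate the promotion's effect from that trend.</p>
<pre><code class="lang-python">import statistics

# Hypothetical monthly sign-ups, plus whether a promotion ran that month
signups = [120, 135, 128, 160, 172, 190]
promo = [False, False, False, True, True, True]

# Descriptive: what happened?
print("total sign-ups:", sum(signups))
print("monthly average:", round(statistics.mean(signups), 1))

# Diagnostic: why? Compare months with and without the promotion
with_promo = [s for s, p in zip(signups, promo) if p]
without_promo = [s for s, p in zip(signups, promo) if not p]
lift = statistics.mean(with_promo) - statistics.mean(without_promo)
print("average lift in promo months:", round(lift, 1))

# Predictive: what is expected to happen? Extend a straight-line trend one month ahead
months = list(range(len(signups)))
slope, intercept = statistics.linear_regression(months, signups)
next_month = slope * len(signups) + intercept
print("forecast for next month:", round(next_month))

# Prescriptive: what should we do next? Hypothetical rule: favour the option with more sign-ups
options = {"promo": statistics.mean(with_promo), "no promo": statistics.mean(without_promo)}
better = max(options, key=options.get)
print("recommendation:", "keep running the promotion" if better == "promo" else "rethink the promotion")
</code></pre>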
<h2 id="heading-what-is-the-difference-between-data-analysis-and-data-analytics">What Is the Difference Between Data Analysis and Data Analytics?</h2>
<p>You'll come across different definitions of data analytics and data analysis. </p>
<p>Some sources define data analytics and data analysis as the same thing and use the terms interchangeably.</p>
<p>Although they are closely related, these terms have slightly different meanings. What they have in common is that both aid the decision-making process.</p>
<h3 id="heading-what-is-data-analysis">What Is Data Analysis?</h3>
<p>Data analysis is the process of studying what has happened in a dataset in the past. The definition doesn't need to go further than that.</p>
<p>Data analysis studies the <strong>why</strong> and <strong>how</strong> of data trends. Yes, it involves data collection, organization, and "analysis". Typical questions it answers look like these:</p>
<p>"How did users respond to a new feature?"</p>
<p>"Why did the rate of purchase of a product fall during a particular period?"</p>
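<p>Answering a question like the last one usually starts with pinning down <em>when</em> the drop happened. Here's a minimal sketch in Python, using invented weekly numbers, that computes a purchase rate (purchases divided by visitors) and flags the week with the biggest fall. Finding the "why" still means digging into what changed that week.</p>
<pre><code class="lang-python"># Hypothetical weekly funnel data: (week, visitors, purchases)
weeks = [
    ("W1", 2000, 100),
    ("W2", 2100, 107),
    ("W3", 1950, 60),
    ("W4", 2050, 98),
]

# Purchase rate = purchases / visitors, so a change in traffic alone doesn't hide a conversion drop
rates = {week: purchases / visitors for week, visitors, purchases in weeks}

# Week-over-week change in the rate; the most negative change flags where to dig deeper
labels = list(rates)
changes = {labels[i]: rates[labels[i]] - rates[labels[i - 1]] for i in range(1, len(labels))}
worst_week = min(changes, key=changes.get)

print({week: round(rate, 3) for week, rate in rates.items()})
print("biggest drop in", worst_week)
</code></pre>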
<p>Data analysts can use programming languages or <a target="_blank" href="https://www.freecodecamp.org/news/data-visualization-tools-guide/">data visualization tools</a> when analyzing data.</p>
<h3 id="heading-what-is-data-analytics">What Is Data Analytics?</h3>
<p>Data analytics is the process of taking insights gained from the analysis of past data trends, and making predictions or decisions for the future. </p>
<p>At the beginning of the article, we defined data analytics to include both analysis and analytics. This is mainly a convention.</p>
<p>Analytics is used to proffer solutions or make recommendations. </p>
<h2 id="heading-summary">Summary</h2>
<p>There is data everywhere, and we create more of it every day. But data in its raw form has no real meaning.</p>
<p>To understand how data behaves over time, we have to group it, study it, and derive useful insights from it.</p>
<p>This article explained what data analytics is, the importance of data analytics, and the types of data analytics. </p>
<p>We also explained the difference between data analysis and data analytics. </p>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Data Analyst VS Data Scientist – What's the Difference? ]]>
                </title>
                <description>
                    <![CDATA[ Data analyst and data scientist are two career paths in big data. And while they do have similarities, each requires different skills.  The basic difference between the two is that a data scientist works to capture data while a data analyst tries to ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/data-analyst-vs-data-scientist-whats-the-difference/</link>
                <guid isPermaLink="false">66adf0a53bf50764799b9ca7</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data scientist ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Kolade Chris ]]>
                </dc:creator>
                <pubDate>Thu, 17 Nov 2022 16:18:21 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/11/web-g1c2368440_1280.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Data analyst and data scientist are two career paths in big data. And while they do have similarities, each requires different skills. </p>
<p>The basic difference between the two is that a data scientist works to capture data while a data analyst tries to gain insights from that data.</p>
<p>This article is for you if you’re interested in a career in big data and you don’t know whether you'd want to be a data analyst or data scientist. It will also help you if you just want to know the differences between a data analyst and a data scientist. </p>
<h2 id="heading-what-well-cover">What We'll Cover</h2>
<ul>
<li><a class="post-section-overview" href="#heading-what-is-data-analytics-and-who-is-a-data-analyst">What is Data Analytics and Who is a Data Analyst?</a><ul>
<li><a class="post-section-overview" href="#heading-what-does-a-data-analyst-do">What does a Data Analyst Do?</a>  </li>
<li><a class="post-section-overview" href="#heading-how-to-become-a-data-analyst">How to Become a Data Analyst</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-what-is-data-science-and-who-is-a-data-scientist">What is Data Science and Who is a Data Scientist? </a><ul>
<li><a class="post-section-overview" href="#heading-what-does-a-data-scientist-do">What does a Data Scientist Do?</a></li>
<li><a class="post-section-overview" href="#heading-how-to-become-a-data-scientist">How to Become a Data Scientist</a> </li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-what-are-the-differences-between-data-analyst-and-data-scientist">What are the Differences between Data Analyst and Data Scientist?</a></li>
<li><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></li>
</ul>
<h2 id="heading-what-is-data-analytics-and-who-is-a-data-analyst">What is Data Analytics and Who is a Data Analyst?</h2>
<p><strong>Data analytics</strong> bridges the gap between <strong>data science</strong> and <strong>business analytics</strong>. It is the systematic approach of processing raw data and subsequently extracting meaningful information from it. </p>
<p>The information extracted from the raw data is the focus of data analysis. The professional who does this analysis is a <strong>data analyst</strong>.</p>
<h3 id="heading-what-does-a-data-analyst-do">What does a Data Analyst Do?</h3>
<p>Data analysts make use of statistical and logical techniques to evaluate data. They use tools such as SQL to query databases and extract the needed information that can help companies make better decisions. </p>
<p>To dig into and assess the information from this data, a data analyst uses programming languages like R, SAS, and Python, and tools like <a target="_blank" href="https://www.freecodecamp.org/news/data-visualization-using-d3-course/">D3</a>, <a target="_blank" href="https://www.freecodecamp.org/news/tableau-for-data-science-and-data-visualization-crash-course/">Tableau</a>, and <a target="_blank" href="https://powerbi.microsoft.com/en-us/">Power BI</a>.</p>
<p>In addition, a data analyst cleans up the database by getting rid of redundant and unusable data.</p>
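<p>Here's a minimal sketch of that kind of cleanup in Python. The records are invented, and the rules (normalize emails, drop rows missing an email or age, remove repeated records) are assumptions for illustration; what counts as "unusable" depends on the analysis.</p>
<pre><code class="lang-python"># Hypothetical raw customer records with a duplicate and some unusable rows
raw = [
    {"id": 1, "email": "ada@example.com", "age": "34"},
    {"id": 2, "email": "ben@example.com", "age": ""},        # missing age
    {"id": 1, "email": "ada@example.com", "age": "34"},      # exact duplicate of the first row
    {"id": 3, "email": "CHIOMA@EXAMPLE.COM ", "age": "29"},  # inconsistent formatting
    {"id": 4, "email": "", "age": "41"},                     # missing email
]


def clean(records):
    """Normalize fields, drop rows missing required values, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        email = rec["email"].strip().lower()
        if not email or not rec["age"].strip():
            continue  # unusable: a required field is empty
        key = (rec["id"], email)
        if key in seen:
            continue  # redundant: this record has already been kept
        seen.add(key)
        cleaned.append({"id": rec["id"], "email": email, "age": int(rec["age"])})
    return cleaned


print(clean(raw))
</code></pre>
<p>In practice analysts often do the same with SQL (for example, <code>SELECT DISTINCT</code>) or with pandas methods such as <code>drop_duplicates</code> and <code>dropna</code>.</p>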
<h3 id="heading-how-to-become-a-data-analyst">How to Become a Data Analyst</h3>
<p>To become a data analyst, you can earn a relevant degree from an accredited college or university, attend a bootcamp, or learn it yourself. </p>
<p>You can learn to become a data analyst yourself because building a career in a certain field in tech is all about <strong>skills</strong>. Once you have those skills and you can put them into practical use, then you can become a data analyst. </p>
<p>Some job requirements for data analysts include degrees and some don’t. So there’s room for anyone who doesn’t have a degree but has the skills.</p>
<p>As a data analyst, the skills you need are: </p>
<ul>
<li>Soft skills (critical thinking, communication, and others)</li>
<li>Data visualization (D3, Tableau, Power BI)</li>
<li>SQL and (probably) NoSQL</li>
<li>Statistics </li>
<li>Spreadsheets (Excel, Google Sheets, and others)</li>
<li>A few programming languages like Python, R, SAS, and JavaScript for D3</li>
<li>Machine learning </li>
</ul>
<p>It doesn’t end there. You should try to work on projects that make you appear employable to recruiters. You should also try to get an entry-level job that can help you put those skills into real-world practice. And if you can’t find an entry-level job, then you can consider volunteering. </p>
<p>Here are a few resources you can use to get started:</p>
<ol>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-data-analysis-with-python-course/">Learn Data Analysis with Python</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/exploratory-data-analysis-with-numpy-pandas-matplotlib-seaborn/">What is Data Analysis? Full Handbook</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/data-analysis-with-python-for-excel-users-course/">Data Analysis with Python for Excel Users</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/what-does-a-data-analyst-do-data-analyst-job-description/">What does a Data Analyst Do?</a></li>
</ol>
<h2 id="heading-what-is-data-science-and-who-is-a-data-scientist">What is Data Science and Who is a Data Scientist?</h2>
<p>Data science is the development of strategies for capturing data and preparing it for analysis. It also involves processing and developing data models with programming languages like R and Python, then deploying those models into applications. The professional who develops these strategies is called a data scientist.</p>
<h3 id="heading-what-does-a-data-scientist-do">What does a Data Scientist Do?</h3>
<p>A data scientist is more focused on developing and implementing tools that help data analysts analyze the data and extract the needed information from it. </p>
<p>This means data scientists spend their time developing models and preparing algorithms. And if the organization needs to deploy a model, data scientists are in charge of that.</p>
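<p>As a heavily simplified sketch of that develop-then-deploy workflow, the Python code below fits a linear model to hypothetical data, saves its parameters as JSON, and loads them in a separate prediction function. Real deployments usually involve much more (model registries, APIs, monitoring), and the data and the <code>predict</code> function here are invented for illustration. It requires Python 3.10+.</p>
<pre><code class="lang-python">import json
import statistics

# "Develop": fit a simple model on hypothetical historical data (hours studied vs. exam score)
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 72]
fit = statistics.linear_regression(hours, scores)
model = {"slope": fit.slope, "intercept": fit.intercept}

# "Deploy" (very simplified): save the fitted parameters so another application can load them
artifact = json.dumps(model)

# Inside the consuming application: load the parameters once, then serve predictions
loaded = json.loads(artifact)


def predict(hours_studied, params):
    """Apply the saved linear model to a new input."""
    return params["slope"] * hours_studied + params["intercept"]


print(round(predict(7, loaded), 1))
</code></pre>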
<h3 id="heading-how-to-become-a-data-scientist">How to Become a Data Scientist</h3>
<p>Most data science job openings require a relevant degree, such as Statistics or Computer Science. But on a personal note, I’ve seen data science openings that don’t require degrees.</p>
<p>Towards the end of this article, I will link an article that shows you where to see those data science job openings. </p>
<p>Once again, what matters is the skills. Once you have those skills and can put them into use, then you can get a job as a data scientist. </p>
<p>Some of the skills you need to become a data scientist are:</p>
<ul>
<li>Mathematics</li>
<li>Programming (Python, R, SAS)</li>
<li>Statistics</li>
<li>Linear algebra</li>
<li>Machine learning</li>
<li>Cloud computing</li>
<li>SQL and NoSQL (Most openings won’t require NoSQL but it’s a good skill to learn)</li>
<li>Apache Hadoop</li>
<li>Calculus</li>
</ul>
<p>Here are some resources to get you started:</p>
<ol>
<li><a target="_blank" href="https://www.freecodecamp.org/news/hands-on-data-science-course/">Learn the Basics of Data Science - Hands-On Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/python-data-science-course-matplotlib-pandas-numpy/">Python for Data Science Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/top-statistics-concepts-to-know-before-getting-into-data-science/">Top Statistics Concepts to Know Before Getting Into Data Science</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/23-common-data-science-interview-questions-for-beginners/">Data Science Interview Questions for Beginners</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/first-steps-to-learn-data-science-or-ml-after-the-roadmap/">Programming, Math, and Science Concepts to Know for Data Science</a></li>
</ol>
<h2 id="heading-what-are-the-differences-between-data-analyst-and-data-scientist">What are the Differences between Data Analysts and Data Scientists?</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Basis</td><td>Data Scientist</td><td>Data Analyst</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Programming</strong></td><td>Advanced use of languages like Python, R, and SAS</td><td>Basic knowledge of Python, R, and SAS</td></tr>
<tr>
<td><strong>Skills</strong></td><td>Advanced programming, statistics, machine learning, cloud computing</td><td>Basic programming, statistics, probability, spreadsheets, visualization tools</td></tr>
<tr>
<td><strong>Work</strong></td><td>Spends more time developing models and tools and creating algorithms that ease analysis</td><td>Spends more time writing queries to retrieve data and processing it into meaningful information</td></tr>
<tr>
<td><strong>Degree</strong></td><td>Foundational technical background with a Bachelor's degree in Computer Science, Statistics, or Information Systems. Master's degree in Data Science.</td><td>Foundational technical background with a Bachelor's degree in Computer Science, Statistics, or Information Systems. Master's degree in Data Analytics.</td></tr>
<tr>
<td><strong>Salary</strong></td><td>$144,729 /year base pay in the US (Indeed)</td><td>$71,717 /year base pay in the US (Indeed)</td></tr>
</tbody>
</table>
</div><h2 id="heading-conclusion">Conclusion</h2>
<p>Data scientist and data analyst are both in-demand career paths you can follow in big data. If you’re unsure which of the two to pursue, here are some things to consider:</p>
<ul>
<li>If you’re well-versed in mathematics, statistics, and computer science, either path could suit you.</li>
<li>If you want to create advanced machine learning models, consider getting into <strong>data science</strong>.</li>
<li>If you are interested in analytics, you’d probably make a great <strong>data analyst</strong>.</li>
</ul>
<p>There’s no black-and-white guide to help you choose between becoming a data scientist and a data analyst. And it's not helpful to say one is better than the other. </p>
<p>In the end, what matters is solving problems and helping humanity learn and improve, not how much a data analyst makes or how much a data scientist makes.</p>
<h3 id="heading-more-general-readings">More General Readings</h3>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-find-remote-jobs/">freeCodeCamp article on how to find remote jobs</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/what-is-data-science-what-a-data-scientist-actually-does/">What is Data Science</a></li>
<li><a target="_blank" href="https://ischoolonline.berkeley.edu/data-science/what-is-data-analytics/">What is Data Analytics</a> </li>
</ul>
<p>Thank you for reading. </p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Read and Write Data using Azure Databricks ]]>
                </title>
                <description>
                    <![CDATA[ Azure Databricks is a data analytics platform hosted on Microsoft Azure that helps you analyze data using Apache Spark.  Databricks helps you create data apps more quickly. This in turn brings to light valuable insights from your data and helps you c... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-read-and-write-data-using-azure-databricks/</link>
                <guid isPermaLink="false">66ba2dd292031ccc9bfc64d6</guid>
                
                    <category>
                        <![CDATA[ Azure ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Salim Oyinlola ]]>
                </dc:creator>
                <pubDate>Mon, 12 Sep 2022 18:49:37 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/09/download--1-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Azure Databricks is a data analytics platform hosted on Microsoft Azure that helps you analyze data using Apache Spark. </p>
<p>Databricks helps you create data apps more quickly. This in turn brings to light valuable insights from your data and helps you create robust Artificial Intelligence solutions. </p>
<p>Azure Databricks also combines the strength of Databricks as an end-to-end Apache Spark platform with the scalability and security of Microsoft's Azure platform. </p>
<p>In this tutorial, you will learn how to get started with the platform in Microsoft Azure and see how to perform data interactions including reading, writing, and analyzing datasets. By the end of this tutorial, you will be able to use Azure Databricks to read multiple file types, both with and without a schema. </p>
<h3 id="heading-prerequisites"><strong>Prerequisites</strong></h3>
<p>You will need a valid and active Microsoft Azure account.</p>
<ul>
<li><a target="_blank" href="https://azure.microsoft.com/en-us/free/">Free Azure Trial</a>: With this option, you start with $200 in Azure credit to use within 30 days, in addition to free services.</li>
<li><a target="_blank" href="https://azure.microsoft.com/en-us/free/students/">Azure for Students</a>: This offer is available for students only. With this option, you will start with $100 Azure credit with no credit card required. You'll get access to popular services for free whilst you have your credit.</li>
</ul>
<h2 id="heading-how-to-create-your-databricks-workspace"><strong>How to Create Your Databricks Workspace</strong></h2>
<p>You must create an Azure Databricks workspace in your Azure subscription before you can use Azure Databricks. To do this, go to the <a target="_blank" href="https://portal.azure.com/">Azure portal</a>, which you can access as long as you have a valid, active Microsoft Azure account.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-137.png" alt="Image" width="600" height="400" loading="lazy">
<em>The Microsoft Azure Home Page</em></p>
<p>Once there, click the <code>Create a resource</code> button.</p>
<p>On the search prompt in the Create a resource page, search for <code>Azure Databricks</code> and select the <code>Azure Databricks</code> option.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-138.png" alt="Image" width="600" height="400" loading="lazy">
<em>The Microsoft Azure page showing the list of popular resources</em></p>
<p>Open the <code>Azure Databricks</code> tab and create an instance. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-140.png" alt="Image" width="600" height="400" loading="lazy">
<em>The Azure Databricks pane.</em></p>
<p>Click the blue <code>Create</code> button (arrow pointed at it) to create an instance. </p>
<p>Then enter the project details before clicking the <code>Review + create</code> button.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-142.png" alt="Image" width="600" height="400" loading="lazy">
<em>The Azure Databricks configuration page</em></p>
<p>It is important to note that the <code>Subscription</code> option shown above will differ from yours. It will depend on the Azure subscription you have available on your account. </p>
<p>Fill the <code>Workspace name</code> field with a globally unique name. Mine is named <code>salim-freeCodeCamp-databricks1</code>. </p>
<p>In the <code>Region</code> option, choose the location closest to you. A region is a set of physical data centers in a particular geographic area. Since I am based in Lagos, Nigeria, I selected <code>South Africa North</code>. </p>
<p>In the <code>Pricing Tier</code> option, select <code>Standard</code>, which includes Apache Spark with Azure AD. </p>
<p>With all the configurations set, click the <code>Review + create</code> button. The validation process usually takes about two minutes. </p>
<p>With the validation and deployment processes completed for the workspace, launch the workspace using the <code>Launch Workspace</code> button that appears. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-144.png" alt="Image" width="600" height="400" loading="lazy">
<em>The home page for the created instance of Azure databricks - <code>salim-freeCodeCamp-databricks</code></em></p>
<p>Click the button, and you will be signed in automatically through Azure Active Directory single sign-on.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-145.png" alt="Image" width="600" height="400" loading="lazy">
<em>Signing into the workspace of the integration of Microsoft Azure and Databricks</em></p>
<p>The Microsoft Azure Databricks home page will come up in a new tab as shown below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-146.png" alt="Image" width="600" height="400" loading="lazy">
<em>The Microsoft Azure Databricks home page</em></p>
<p>With the workspace launched, create a cluster using the <code>Create a cluster</code> option on the left of the page.</p>
<p>If you have created clusters before, you can pick one and build on it. Otherwise, create a new cluster using the <code>Create Cluster</code> button. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-148.png" alt="Image" width="600" height="400" loading="lazy">
<em>Set the configurations for the Azure Databricks cluster</em></p>
<p>To create the cluster, you have to set its configuration. Choose the <code>Single node</code> option instead of the default <code>Multi node</code> option, and keep the other options at their defaults. </p>
<p>Click the <code>Create Cluster</code> button at the bottom of the page. Cluster creation takes a few minutes. If your dataset is large, you can explore the <code>Multi node</code> option instead. </p>
<p>Having created the cluster, import some ready-to-use notebooks by navigating to <code>Workspace</code> <code>&gt;</code> <code>Users</code> <code>&gt;</code> <code>your_account</code> on the left taskbar.</p>
<p>Right-click and select the <code>Import</code> option on the dropdown menu.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-150.png" alt="Image" width="600" height="400" loading="lazy">
<em>The <code>import</code> button will be used to import the dataset to be used</em></p>
<p>Once you click on the <code>Import</code> button, you will then select the <code>URL</code> option and paste the following URL:</p>
<pre><code>https://github.com/salimcodes/microsoft-learning-paths-databricks-notebooks/blob/master/data-engineering/DBC/03-Reading-and-writing-data-in-Azure-Databricks.dbc
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-151.png" alt="Image" width="600" height="400" loading="lazy">
<em>The <code>03-Reading-and-writing-data-in-Azure-Databricks.dbc</code> archive will be imported.</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-152.png" alt="Image" width="600" height="400" loading="lazy">
<em>You will see the list of notebooks in the imported <code>03-Reading-and-writing-data-in-Azure-Databricks.dbc</code> folder</em></p>
<p>The image above shows what the workspace will look like after importing the file. With that, your Databricks workspace is ready to use. </p>
<h2 id="heading-how-to-read-the-data-in-csv-format"><strong>How to Read the Data in CSV Format</strong></h2>
<p>Open the file named <code>Reading Data - CSV</code>.</p>
<p>Upon opening the file, you will see the notebook shown below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-153.png" alt="Image" width="600" height="400" loading="lazy">
<em>You will see that the cluster created earlier has not been attached.</em></p>
<p>In the top-left corner, change the dropdown, which initially shows <code>Detached</code>, to your cluster's name. Mine is named <code>Salim Oyinlola's freeCodeCamp Cluster</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/10/image-154.png" alt="Image" width="600" height="400" loading="lazy">
<em>The cluster created earlier is now attached to the Python notebook</em></p>
<p>With your cluster attached, you will then run all the cells one after the other.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/09/image-261.png" alt="image-261" width="600" height="400" loading="lazy">
<em>Running the first cell of the Python notebook initializes the classroom variables and functions, mounts the dataset, and creates a user-specific database</em></p>
<p>At its core, the notebook simply reads the data in <code>csv</code> format. Then it adds an option that tells the reader that the data contains a header and to use that header to determine the column names. </p>
<p>You can also add an option that tells the reader to infer each column's data types (also known as a schema).</p>
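<p>The notebook's cells aren't reproduced here, but as a minimal sketch of the same idea, this is how the <code>header</code> and <code>inferSchema</code> options look on PySpark's <code>DataFrameReader</code>. It assumes a local PySpark installation (in a Databricks notebook, a <code>spark</code> session already exists, so you would skip the builder line), and the file name and columns are hypothetical sample data rather than the notebook's dataset.</p>
<pre><code class="lang-python">import os
import tempfile

from pyspark.sql import SparkSession

# Local session for running outside Databricks; in a Databricks notebook `spark` already exists.
spark = SparkSession.builder.master("local[1]").appName("csv-read-sketch").getOrCreate()

# Hypothetical sample data, written to a temporary file so the example is self-contained.
csv_path = os.path.join(tempfile.mkdtemp(), "people.csv")
with open(csv_path, "w") as f:
    f.write("name,age,score\nalice,34,88.5\nbob,29,91.0\n")

# header=true: take column names from the first line; every column is still read as a string.
df_strings = spark.read.option("header", "true").csv(csv_path)

# inferSchema=true: Spark scans the data an extra time to guess each column's type.
df_typed = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(csv_path)
)

df_strings.printSchema()
df_typed.printSchema()
</code></pre>
<p>Inference is convenient, but it costs Spark an extra pass over the data, which is one reason to supply a schema explicitly when you already know the column types.</p>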
<p>It is important to note that data can be read in other formats, such as JSON (with or without a schema), Parquet, and tables and views. To do this, run the respective notebook for each format.</p>
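<p>To illustrate the "with or without a schema" distinction for JSON, here is a hedged sketch that reads the same file once with schema inference and once with an explicit <code>StructType</code>. It assumes a local PySpark installation and uses hypothetical <code>user</code>/<code>clicks</code> records, not the notebook's data.</p>
<pre><code class="lang-python">import json
import os
import tempfile

from pyspark.sql import SparkSession
from pyspark.sql.types import LongType, StringType, StructField, StructType

# Local session for running outside Databricks; in a Databricks notebook `spark` already exists.
spark = SparkSession.builder.master("local[1]").appName("json-schema-sketch").getOrCreate()

# Spark's JSON reader expects one JSON object per line by default (JSON Lines).
json_path = os.path.join(tempfile.mkdtemp(), "events.json")
with open(json_path, "w") as f:
    for record in [{"user": "alice", "clicks": 3}, {"user": "bob", "clicks": 7}]:
        f.write(json.dumps(record) + "\n")

# Without a schema, Spark scans the data to infer column names and types.
df_inferred = spark.read.json(json_path)

# With an explicit schema, Spark skips inference and uses exactly these columns and types.
schema = StructType([
    StructField("user", StringType(), nullable=True),
    StructField("clicks", LongType(), nullable=True),
])
df_schema = spark.read.schema(schema).json(json_path)

df_schema.show()
</code></pre>
<p>For a single multi-line JSON document rather than one object per line, you would add <code>.option("multiLine", "true")</code> to the reader.</p>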
<h2 id="heading-how-to-write-data-into-a-parquet-file"><strong>How to Write Data into a Parquet File</strong></h2>
<p>Just as there are many ways to read data, there are many ways to write it. In this notebook, we'll take a quick look at how to write data back out to Parquet files.</p>
<p>Apache Parquet is a columnar storage file format used by systems in the Hadoop ecosystem, such as Spark and Hive. The format is cross-platform and language-independent, and it stores data column by column using a binary representation.</p>
<p>Parquet files, which store large datasets efficiently, use the <code>.parquet</code> extension.</p>
<p>Like what you did when reading data, you will also run the cells one after the other.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/09/image-275.png" alt="image-275" width="600" height="400" loading="lazy">
<em>The cell to write data into a parquet file</em></p>
<p>Writing to a Parquet file starts with a DataFrame, which you create by running this cell.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/09/image-276.png" alt="image-276" width="600" height="400" loading="lazy">
<em>This cell shows that the existing files are being overwritten</em></p>
<p>The <code>.mode("overwrite")</code> call means that when you write the DataFrame to Parquet files, you replace any existing files at the output location.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/09/image-277.png" alt="image-277" width="600" height="400" loading="lazy">
<em>The file has been written and saved in an output location.</em></p>
<p>At its core, the notebook reads a <code>.tsv</code> file (the same file used when reading the CSV data) and writes it back out as a Parquet file.</p>
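<p>The read-TSV, write-Parquet round trip can be sketched in PySpark as follows. This is a minimal example assuming a local PySpark installation, with hypothetical file and column names; the key pieces are the tab separator on the CSV reader and <code>mode("overwrite")</code> on the writer.</p>
<pre><code class="lang-python">import os
import tempfile

from pyspark.sql import SparkSession

# Local session for running outside Databricks; in a Databricks notebook `spark` already exists.
spark = SparkSession.builder.master("local[1]").appName("parquet-write-sketch").getOrCreate()

# Hypothetical tab-separated sample data.
work_dir = tempfile.mkdtemp()
tsv_path = os.path.join(work_dir, "pageviews.tsv")
with open(tsv_path, "w") as f:
    f.write("project\tarticle\trequests\nen\tMain_Page\t120\nde\tHauptseite\t45\n")

# A TSV file is read with the CSV reader by changing the separator to a tab.
df = (
    spark.read
    .option("sep", "\t")
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(tsv_path)
)

# mode("overwrite") replaces whatever already exists at the output path;
# the default save mode raises an error if the path already exists.
parquet_path = os.path.join(work_dir, "pageviews.parquet")
df.write.mode("overwrite").parquet(parquet_path)

# Writing again succeeds because of overwrite mode.
df.write.mode("overwrite").parquet(parquet_path)

# Parquet stores the schema with the data, so no header/inferSchema options are needed here.
round_trip = spark.read.parquet(parquet_path)
round_trip.show()
</code></pre>
<p>Spark writes Parquet output as a directory of part files rather than a single file, which is why the whole output path is read back at once.</p>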
<h2 id="heading-how-to-delete-the-azure-databricks-instance-optional"><strong>How to Delete the Azure Databricks Instance (Optional)</strong></h2>
<p>Finally, the Azure resources that you created in this tutorial can incur ongoing costs. To avoid such costs, it is important to delete the resource or resource group that contains all those resources. You can do that by using the Azure portal.</p>
<ul>
<li>Navigate to the Azure portal.</li>
<li>Navigate to the resource group that contains your Azure Databricks instance.</li>
<li>Select <code>Delete resource group</code>.</li>
<li>Type the name of the resource group in the confirmation text box.</li>
<li>Select <code>Delete</code>.</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>In this tutorial, you have learned the basics about reading and writing data in Azure Databricks.</p>
<p>You now understand the basics of Azure Databricks, including what it is, how to create a workspace and a cluster, how to read CSV files, and how to write data out to Parquet files.</p>
<p>Finally, if you enjoyed this article and want to see more, I share my writing on <a target="_blank" href="https://twitter.com/SalimOpines">Twitter</a>.</p>
<p>Thank you for reading :)</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Perform Data Augmentation in NLP Projects ]]>
                </title>
                <description>
                    <![CDATA[ By Davis David In machine learning, you need to have a large amount of data in order to achieve strong model performance.  Using a method known as data augmentation, you can create more data for your machine learning project. Data augmentation is a c... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-perform-data-augmentation-in-nlp-projects/</link>
                <guid isPermaLink="false">66d84eb9ef84e4cc27cfbe33</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ natural language processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 24 Jun 2022 15:33:57 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/06/1_eproIleJllsp0enh6HA2Hw.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Davis David</p>
<p>In machine learning, you need to have a large amount of data in order to achieve strong model performance. </p>
<p>Using a method known as data augmentation, you can create more data for your machine learning project. Data augmentation is a collection of techniques for automatically generating new, high-quality data from existing data.</p>
<p>Augmentation techniques are quite common in computer vision applications. If you are working on a computer vision project (like image classification), for instance, you can apply dozens of techniques to each image: shift, modify color intensities, scale, rotate, crop, and more.</p>
<p>If you have a tiny dataset for your ML project or wish to reduce overfitting in your machine learning models, it may be a good idea to apply data augmentation approaches.</p>
<blockquote>
<p>“We don’t have better algorithms. We just have more data.”- Peter <a target="_blank" href="https://research.google/people/author205/?ref=hackernoon.com">Norvig</a></p>
</blockquote>
<p>In the field of Natural Language Processing (NLP), the tremendous complexity of language makes it difficult to augment the text. The process of augmenting text data is more challenging and not as straightforward as you might expect.</p>
<p>In this article, you will learn how to use a library called <a target="_blank" href="https://github.com/QData/TextAttack?ref=hackernoon.com">TextAttack</a> to improve data for natural language processing.</p>
<h2 id="heading-what-is-textattack">What is TextAttack?</h2>
<p>TextAttack is a Python framework that was built by the <a target="_blank" href="https://qdata.github.io/qdata-page/?ref=hackernoon.com">QData team</a> for the purpose of conducting adversarial attacks, adversarial training, and data augmentation in natural language processing. </p>
<p>TextAttack has components that can be utilized independently for a variety of basic natural language processing tasks, including sentence encoding, grammar checking, and word substitution.</p>
<p>TextAttack excels in performing the following three functions:</p>
<ol>
<li>Adversarial attacks (Python: <code>textattack.Attack</code>, Bash: <code>textattack attack</code>).</li>
<li>Data augmentation (Python: <code>textattack.augmentation.Augmenter</code>, Bash: <code>textattack augment</code>).</li>
<li>Model training (Python: <code>textattack.Trainer</code>, Bash: <code>textattack train</code>).</li>
</ol>
<p>For this article, we will focus on how to use the TextAttack library for data augmentation.</p>
<h2 id="heading-how-to-install-texattack">How to Install TextAttack</h2>
<p>To use this library, make sure you have Python 3.6 or above in your environment.</p>
<p>Run the following command to install TextAttack:</p>
<pre><code class="lang-bash">pip install textattack
</code></pre>
<p><strong>Note:</strong> Once you have installed TextAttack, you can use it either as a Python module or from the command line.</p>
<h2 id="heading-data-augmentation-techniques-for-text-data">Data Augmentation Techniques for Text Data</h2>
<p>The TextAttack library has various augmentation techniques that you can use in your NLP project to add more text data. </p>
<p>Here are some of the techniques that you can apply:</p>
<h3 id="heading-charswapaugmenter-technique"><code>CharSwapAugmenter</code> technique</h3>
<p>This technique augments words by swapping characters out for other characters.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> textattack.augmentation <span class="hljs-keyword">import</span> CharSwapAugmenter

text = <span class="hljs-string">"I have enjoyed watching that movie, it was amazing."</span>

charswap_aug = CharSwapAugmenter()

charswap_aug.augment(text)
</code></pre>
<p>['I have enjoyed watching that omvie, it was amazing.']</p>
<p>The augmenter has swapped two neighboring characters in the word <strong>“movie”</strong>, producing <strong>“omvie”</strong>.</p>
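<p>To see what a character swap does under the hood, here is a minimal from-scratch sketch in plain Python. It is not TextAttack's implementation (the library's augmenter may combine several kinds of character edits); it only shows the neighboring-swap edit that turned “movie” into “omvie”, and the function name is hypothetical.</p>
<pre><code class="lang-python">import random


def swap_neighboring_chars(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters inside one randomly chosen word (hypothetical helper)."""
    words = text.split()
    # Only words with at least two characters can have a neighboring swap.
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    if not candidates:
        return text
    i = rng.choice(candidates)
    word = words[i]
    j = rng.randrange(len(word) - 1)  # index of the first of the two swapped characters
    words[i] = word[:j] + word[j + 1] + word[j] + word[j + 2:]
    return " ".join(words)


rng = random.Random(0)  # seeded so the augmentation is reproducible
print(swap_neighboring_chars("I have enjoyed watching that movie, it was amazing.", rng))
</code></pre>
<p>Passing in a seeded <code>random.Random</code> makes the output repeatable, which helps when you want the same augmented dataset across runs.</p>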
<h3 id="heading-deletionaugmenter-technique"><code>DeletionAugmenter</code> technique</h3>
<p>This one augments the text by deleting words from it to make new text.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> textattack.augmentation <span class="hljs-keyword">import</span> DeletionAugmenter

text = <span class="hljs-string">"I have enjoyed watching that movie, it was amazing."</span>

deletion_aug = DeletionAugmenter()

deletion_aug.augment(text)
</code></pre>
<p>['I have watching that, it was amazing.']</p>
<p>This method has removed the words <strong>“enjoyed”</strong> and <strong>“movie”</strong> to create a new augmented text.</p>
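<p>Word deletion is just as easy to sketch without the library. The following plain-Python function (a hypothetical helper, not TextAttack's code) drops each word independently with probability <code>p</code> and always keeps at least one word:</p>
<pre><code class="lang-python">import random


def delete_random_words(text, p, rng):
    """Drop each word independently with probability p, keeping at least one word."""
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if rng.random() >= p]
    # Never return an empty string: fall back to a single randomly chosen word.
    if not kept:
        kept = [rng.choice(words)]
    return " ".join(kept)


print(delete_random_words("I have enjoyed watching that movie, it was amazing.", p=0.2, rng=random.Random(42)))
</code></pre>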
<h3 id="heading-easydataaugmenter-technique"><code>EasyDataAugmenter</code> technique</h3>
<p>This augments the text with a combination of different methods, such as:</p>
<ul>
<li>Randomly swapping the positions of the words in the sentence.</li>
<li>Randomly removing words from the sentence.</li>
<li>Randomly inserting a random synonym of a random word at a random location.</li>
<li>Randomly replacing words with their synonyms.</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> textattack.augmentation <span class="hljs-keyword">import</span> EasyDataAugmenter

text = <span class="hljs-string">"I was billed twice for the service and this is the second time it has happened"</span>

eda_aug = EasyDataAugmenter()

eda_aug.augment(text)
</code></pre>
<p>['I was billed twice for the service and this is the second time it has happen', 'I was billed twice for the one service and this is the second time it has happened', 'I billed twice for the service and this is the second time it has happened',<br>'I was billed twice for the this and service is the second time it has happened']</p>
<p>As you can see, each augmented text reflects a different method. In the first, the last word has changed from <strong>“happened”</strong> to <strong>“happen”</strong>; in the second, the word <strong>“one”</strong> has been inserted; in the third, <strong>“was”</strong> has been deleted; and in the fourth, <strong>“service”</strong> and <strong>“this”</strong> have swapped positions.</p>
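<p>The fourth output is the random-swap operation at work. Here is a minimal plain-Python sketch of that single operation; the function name is hypothetical, and this is not how <code>EasyDataAugmenter</code> is implemented internally.</p>
<pre><code class="lang-python">import random


def random_swap(text, n_swaps, rng):
    """Exchange the positions of two randomly chosen words, n_swaps times."""
    words = text.split()
    if len(words) >= 2:
        for _ in range(n_swaps):
            i, j = rng.sample(range(len(words)), 2)  # two distinct positions
            words[i], words[j] = words[j], words[i]
        return " ".join(words)
    return text  # fewer than two words: nothing to swap


text = "I was billed twice for the service and this is the second time it has happened"
print(random_swap(text, n_swaps=1, rng=random.Random(0)))
</code></pre>
<p>Because a swap only reorders words, the augmented sentence keeps the original vocabulary, which is why it can stay close to the original label even when the word order becomes ungrammatical.</p>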
<h3 id="heading-wordnetaugmenter-technique"><code>WordNetAugmenter</code> technique</h3>
<p>This technique augments the text by replacing words with synonyms from the WordNet thesaurus.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> textattack.augmentation <span class="hljs-keyword">import</span> WordNetAugmenter

text = <span class="hljs-string">"I was billed twice for the service and this is the second time it has happened"</span>

wordnet_aug = WordNetAugmenter()

wordnet_aug.augment(text)
</code></pre>
<p>['I was billed twice for the service and this is the second time it has pass']</p>
<p>This method has changed the word <strong>“happened”</strong> to <strong>“pass”</strong> in order to create a new augmented text.</p>
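<p>The same synonym-replacement idea can be sketched in plain Python with a small dictionary standing in for WordNet. The thesaurus below is hypothetical toy data; a real WordNet lookup needs an extra dependency such as NLTK and its WordNet corpus. Like the example output above, a simple swap ignores grammar, so a synonym in the wrong form can produce phrasing like “has pass”.</p>
<pre><code class="lang-python">import random

# Hypothetical toy thesaurus standing in for WordNet.
TOY_SYNONYMS = {
    "billed": ["charged", "invoiced"],
    "service": ["subscription", "plan"],
    "happened": ["occurred", "passed"],
}


def replace_with_synonym(text, synonyms, rng):
    """Replace one randomly chosen word that has an entry in `synonyms`."""
    words = text.split()
    # Positions of words we know a synonym for (case-insensitive lookup).
    replaceable = [i for i, w in enumerate(words) if w.lower() in synonyms]
    if not replaceable:
        return text
    i = rng.choice(replaceable)
    words[i] = rng.choice(synonyms[words[i].lower()])
    return " ".join(words)


text = "I was billed twice for the service and this is the second time it has happened"
print(replace_with_synonym(text, TOY_SYNONYMS, random.Random(3)))
</code></pre>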
<h3 id="heading-how-to-create-your-own-augmenter">How to Create Your Own Augmenter</h3>
<p>Importing transformations and constraints from <code>textattack.transformations</code> and <code>textattack.constraints</code> allows you to build your own augmenter from the ground up. </p>
<p>The following example uses the <code>WordSwapRandomCharacterDeletion</code> transformation to produce augmentations of a string:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> textattack.transformations <span class="hljs-keyword">import</span> WordSwapRandomCharacterDeletion
<span class="hljs-keyword">from</span> textattack.transformations <span class="hljs-keyword">import</span> CompositeTransformation
<span class="hljs-keyword">from</span> textattack.augmentation <span class="hljs-keyword">import</span> Augmenter

my_transformation = CompositeTransformation([WordSwapRandomCharacterDeletion()])
augmenter = Augmenter(transformation=my_transformation, transformations_per_example=<span class="hljs-number">3</span>)

text = <span class="hljs-string">'Siri became confused when we reused to follow her directions.'</span>

augmenter.augment(text)
</code></pre>
<p>['Siri became cnfused when we reused to follow her directions.', 'Siri became confused when e reused to follow her directions.', 'Siri became confused when we reused to follow hr directions.']</p>
<p>The output shows different augmented texts produced by the <code>WordSwapRandomCharacterDeletion</code> transformation. For example, in the first augmented text, the character “<strong>o</strong>” was randomly removed from the word “<strong>confused</strong>”.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this article, you have learned the significance of data augmentation for your Machine Learning projects. You've also learned how to execute data augmentation for textual data using the TextAttack library.</p>
<p>In my experience, these techniques are effective approaches for augmenting text data in your NLP project. Hopefully they’ll be of use to you in your work.</p>
<p>You can also try to use other available augmentation techniques from the TextAttack library such as:</p>
<ul>
<li>EmbeddingAugmenter</li>
<li>CheckListAugmenter</li>
<li>CLAREAugmenter</li>
</ul>
<p>If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post!</p>
<p>You can also find me on Twitter <a target="_blank" href="https://twitter.com/Davis_McDavid?ref=hackernoon.com">@Davis_McDavid</a>.</p>
<p>And you can read more articles like this <a target="_blank" href="https://hackernoon.com/u/davisdavid">here</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What Does a Data Analyst Do? Data Analyst Job Description ]]>
                </title>
                <description>
                    <![CDATA[ We live in a digital world, and with each passing day, we create 2.5 quintillion bytes of data. That amount of data is generated by activities like browsing the web, using social media sites and streaming platforms, and communicating via instant mess... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-does-a-data-analyst-do-data-analyst-job-description/</link>
                <guid isPermaLink="false">66b1e4ca96a9e0a75592bbdd</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Dionysia Lemonaki ]]>
                </dc:creator>
                <pubDate>Mon, 06 Jun 2022 20:44:16 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/06/carlos-muza-hpjSkU2UYSU-unsplash.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>We live in a digital world, and with each passing day, we create 2.5 <em>quintillion</em> bytes of data.</p>
<p>That amount of data is generated by activities like browsing the web, using social media sites and streaming platforms, and communicating via instant messaging applications with friends and family. </p>
<p>These are just a few of the online activities we take part in on a daily basis.</p>
<p>And that means data is everywhere.</p>
<p>Because of how present data is in our lives and how it affects and powers most of the things we do nowadays, companies have found ways and strategies to use data to process information, predict future trends, and accelerate their growth and profit.</p>
<p>For example, a company might use data to identify the right audience for its product or service.</p>
<p>Then, it can use data to target that audience with content and advertisements that drive sales.</p>
<p>Behind those strategies are data analysts. Their job is to solve problems using data.  </p>
<p>They gather the available data and clean it. Then they process and interpret it, identifying correlations and relationships between different data points.</p>
<p>Finally, they draw conclusions from the results of the analysis, which help decision-makers solve the problem at hand.</p>
<p>In this article, I will first explain what data analysis is. Then I will go over what data analysts do day to day, as well as the qualifications and technical skills you need to become a data analyst yourself.</p>
<p>Here is what we will cover:</p>
<ol>
<li><a class="post-section-overview" href="#heading-what-is-data-analysis">What is data analysis?</a></li>
<li><a class="post-section-overview" href="#heading-what-does-a-data-analyst-do-on-a-day-to-day-basis">What does a data analyst do on a day to day basis?</a><ol>
<li><a class="post-section-overview" href="#heading-understand-business-needs">Understand business needs</a></li>
<li><a class="post-section-overview" href="#heading-gather-data">Gather data</a></li>
<li><a class="post-section-overview" href="#heading-work-with-relational-databases">Work with relational databases</a></li>
<li><a class="post-section-overview" href="#heading-organize-and-clean-data">Organize and clean data</a></li>
<li><a class="post-section-overview" href="#heading-analyze-and-identify-patterns-in-data">Analyze and identify patterns in data</a></li>
<li><a class="post-section-overview" href="#heading-visualize-and-present-data">Visualize and Present Data</a></li>
<li><a class="post-section-overview" href="#heading-conduct-reports-on-data-analysis-results">Conduct Reports On Data Analysis Results</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#heading-what-skills-and-qualifications-do-you-need-to-become-a-data-analyst">What skills and qualifications do you need to become a data analyst?</a></li>
</ol>
<h2 id="heading-what-is-data-analysis">What Is Data Analysis? <a></a></h2>
<p>Data analysis is the process of analyzing and turning raw data into practical insights. By utilizing those insights, businesses and organizations can make more informed decisions.</p>
<p>In more detail, data analysis includes:</p>
<ul>
<li>Identifying a problem and the type of data needed to analyze it. This stage also involves forming hypotheses.</li>
<li>Extracting and gathering information from multiple sources.</li>
<li>Cleaning the extracted data. Data is usually raw and messy, and is not usable in that form. This stage involves fixing typos, correcting errors, filling in missing values, and standardizing the data.</li>
<li>Analyzing the data to test the initial hypotheses.</li>
<li>Interpreting the results to understand the problem at hand. This stage involves presenting the findings of the analysis clearly and concisely to business executives, key stakeholders, and decision-makers. Data analysts use the data to tell a compelling story that informs the business's next steps.</li>
</ul>
<p>To learn more about data analysis, the data analysis process, and why data analysis is beneficial for businesses, <a target="_blank" href="https://www.freecodecamp.org/news/what-is-data-analysis/">read this article which covers these topics in more detail</a>.</p>
<h2 id="heading-what-does-a-data-analyst-do-on-a-day-to-day-basis">What Does A Data Analyst Do On A Day to Day Basis? <a></a></h2>
<p>In the following sections, I've listed some of the daily tasks and responsibilities of a data analyst.</p>
<p>The exact tasks depend on the company (its size and sector) and the team the data analyst works on.</p>
<h3 id="heading-understand-business-needs">Understand Business Needs <a></a></h3>
<p>Data analysts need to understand how the business is currently performing and what the current business needs are.</p>
<p>They need to understand what problem the business is trying to solve and what direction it wants to head in the future.</p>
<p>Figuring out those needs involves working with management and different departments in the company.</p>
<p>It also involves asking why the analysis is being conducted in the first place, and turning the answers into hypotheses and actionable tasks.</p>
<h3 id="heading-gather-data">Gather Data <a></a></h3>
<p>Data analysts typically find data themselves and collect it from multiple sources.</p>
<p>How they gather data depends on the type of data they need, in particular whether it is quantitative (numerical) or qualitative (non-numerical).</p>
<p>The data that needs to be collected should be relevant to the problem at hand.</p>
<p>Some of the ways of gathering information are:</p>
<ul>
<li>Conducting surveys on user satisfaction</li>
<li>Viewing customer feedback</li>
<li>Tracking website visits and social media analytics</li>
<li>Querying for the most searched keywords</li>
<li>Checking which ads get clicked on the most</li>
</ul>
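<p>To illustrate the last item, here is a minimal Python sketch that counts which ads get clicked most. It uses only the standard library. The click log, the ad names, and the <code>top_ads</code> helper are all hypothetical; real click data would usually come from an ad platform or an analytics export.</p>

```python
from collections import Counter

# Hypothetical click log: one (ad_id, user_id) pair per recorded click.
click_log = [
    ("spring_sale", "u1"),
    ("free_shipping", "u2"),
    ("spring_sale", "u3"),
    ("new_arrivals", "u1"),
    ("spring_sale", "u2"),
    ("free_shipping", "u4"),
]


def top_ads(log, n=3):
    """Return the n most-clicked ads as (ad_id, clicks) pairs, most clicked first."""
    clicks = Counter(ad_id for ad_id, _user in log)
    return clicks.most_common(n)


for ad_id, clicks in top_ads(click_log):
    print(f"{ad_id}: {clicks} clicks")
```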
<p>A data analyst is also responsible for improving existing processes for gathering and collecting data, looking for ways to make them more streamlined and effective.</p>
<h3 id="heading-work-with-relational-databases">Work With Relational Databases <a></a></h3>
<p>When looking for, pulling, and storing information, data analysts mostly query and work with relational databases using SQL (Structured Query Language), the language used for communicating with relational databases.</p>
<p>In relational databases, data is stored in rows and columns, and all data points have defined and pre-established relationships with one another.</p>
<p>Data analysts may also help develop and improve the structure of a relational database by modeling the data and defining the schema: the blueprint that describes the database's tables, columns, data types, and the relationships between them.</p>
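<p>The sketch below makes rows, columns, relationships, and schemas more concrete. It uses Python's built-in <code>sqlite3</code> module with a throwaway in-memory database. The <code>customers</code> and <code>orders</code> tables, their columns, the sample rows, and the <code>revenue_by_country</code> helper are hypothetical. The schema links each order to a customer through a foreign key, and the SQL query joins the two tables to total revenue per country.</p>

```python
import sqlite3

# In-memory database, so the example runs without any setup.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

# Schema: each order row references one customer row through customer_id.
conn.executescript("""
CREATE TABLE customers (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    country TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL
);
""")

conn.executemany(
    "INSERT INTO customers (id, name, country) VALUES (?, ?, ?)",
    [(1, "Ana", "Greece"), (2, "Ben", "UK"), (3, "Chen", "Greece")],
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 35.0), (3, 2, 15.0), (4, 3, 50.0)],
)


def revenue_by_country(connection):
    """Join orders to customers and return (country, order count, revenue) rows."""
    query = """
        SELECT c.country, COUNT(o.id) AS num_orders, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.country
        ORDER BY revenue DESC
    """
    return connection.execute(query).fetchall()


for row in revenue_by_country(conn):
    print(row)
```

<p>Keeping customers and orders in separate, related tables means each customer's details are stored once, and the relationship is rebuilt at query time with a <code>JOIN</code>.</p>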
<h3 id="heading-organize-and-clean-data">Organize and Clean Data <a></a></h3>
<p>When data analysts first gather the required data, more often than not, it is not in a usable state.</p>
<p>Data analysts will improve the quality and format of the data by filling in missing values, removing duplicates, and identifying outliers and errors.</p>
<p>Cleaning is one of the most important tasks in a data analyst’s job, since the analysis results depend on it.</p>
<p>If data is not cleaned, there is a very high chance that the results of the analysis will be skewed.</p>
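<p>Here is a small sketch of those cleaning steps in plain Python. It uses no third-party libraries, although in practice many analysts use Pandas for this. The records and the <code>clean</code> helper are hypothetical. The sketch standardizes a text column, removes exact duplicates, fills a missing value with the median, and flags outliers with the common 1.5 * IQR (interquartile range) rule instead of deleting them.</p>

```python
import statistics

# Hypothetical raw records, as they might arrive from several sources.
raw = [
    {"customer": "Ana", "city": " athens", "amount": 20.0},
    {"customer": "Ben", "city": "London ", "amount": None},  # missing value
    {"customer": "Ana", "city": " athens", "amount": 20.0},  # exact duplicate
    {"customer": "Chen", "city": "ATHENS", "amount": 22.0},
    {"customer": "Dev", "city": "london", "amount": 25.0},
    {"customer": "Fay", "city": "Paris", "amount": 21.0},
    {"customer": "Gus", "city": "paris ", "amount": 24.0},
    {"customer": "Eli", "city": "Paris", "amount": 900.0},  # suspiciously large
]


def clean(records):
    """Return cleaned copies of the records; the input list is left untouched."""
    # 1. Standardize text so " athens" and "ATHENS" both become "Athens".
    rows = [{**r, "city": r["city"].strip().title()} for r in records]

    # 2. Remove exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Fill missing amounts with the median of the known amounts.
    known = [r["amount"] for r in unique if r["amount"] is not None]
    median = statistics.median(known)
    for r in unique:
        if r["amount"] is None:
            r["amount"] = median

    # 4. Flag (rather than delete) outliers with the 1.5 * IQR rule.
    #    method="inclusive" keeps the quartiles within the observed range.
    q1, _, q3 = statistics.quantiles(known, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    for r in unique:
        r["outlier"] = not (low <= r["amount"] <= high)
    return unique


for row in clean(raw):
    print(row)
```

<p>Flagging rather than deleting the outlier is a deliberate choice. A value like 900 might be an entry error or a genuine large order, and deciding which usually requires context from the business.</p>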
<h3 id="heading-analyze-and-identify-patterns-in-data">Analyze and Identify Patterns In Data <a></a></h3>
<p>A big part of a data analyst’s job is to interpret complex data and find patterns.</p>
<p>Data analysts connect the dots between a problem and the data available.</p>
<p>They may also forecast future trends, and they translate their findings into something useful, such as answers to business questions and actionable next steps for the company.</p>
<h3 id="heading-visualize-and-present-data">Visualize and Present Data <a></a></h3>
<p>Data analysts also visualize data with graphs and charts that answer the questions and problems defined in the early stages of the analysis.</p>
<p>To do this, they build and maintain dashboards using data visualization tools such as Tableau.</p>
<p>Visualizations communicate findings and valuable business insights to the company's stakeholders, leaders, and executives in a way that non-technical audiences can easily understand.</p>
<p>The presentation will inform business decisions and play a big part in the company's strategy.</p>
<h3 id="heading-conduct-reports-on-data-analysis-results">Conduct Reports On Data Analysis Results <a></a></h3>
<p>Once the data analysis process has come to an end, the data analyst needs to build and write reports that summarize the key findings and insights gathered.</p>
<h2 id="heading-what-skills-and-qualifications-do-you-need-to-become-a-data-analyst">What Skills and Qualifications Do You Need To Become A Data Analyst? <a></a></h2>
<p>One of the first questions you may have about starting a career in data analysis is whether you need a university degree to enter the field.</p>
<p>Although a degree is not strictly necessary, many companies list one as a minimum requirement for the job. That degree could be in computer science, math, statistics, finance, or business.</p>
<p>So, although you can learn the hard technical skills needed for a data analyst job on your own, having a university degree in one of those disciplines definitely wouldn't hurt.</p>
<p>The next question you may have is, what are the hard skills necessary for the job?</p>
<p>Some of them include:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-microsoft-excel/">Microsoft Excel skills</a> and <a target="_blank" href="https://www.freecodecamp.org/news/learn-google-sheets/">Google Sheets skills</a>. Excel and Google Sheets are two spreadsheet tools used for extracting insights and figuring out patterns from data.</li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-sql-in-10-minutes/">Knowledge of SQL</a>. As mentioned earlier, SQL is a database query language used for interacting with data stored in relational databases. As a data analyst, you need to know how to use SQL to perform CRUD (Create, Read, Update, Delete) operations on the data stored in relational databases.</li>
<li>Knowledge of a data visualization tool. A data visualization tool such as <a target="_blank" href="https://www.freecodecamp.org/news/tableau-for-data-science-and-data-visualization-crash-course/">Tableau</a> is used for presenting data insights by creating interactive dashboards.</li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/free-statistics-course/">Knowledge of statistics</a>. A big part of a data analyst's job is working with numbers, so you need at least a basic understanding of statistics, which an intro to statistics course can give you. Some <a target="_blank" href="https://www.freecodecamp.org/news/learn-algebra-to-improve-your-programming-skills/">foundational knowledge in math</a> can be of great help too.</li>
<li>Knowledge of a programming language. Knowing a programming language helps data analysts pull data from various sources, automate repetitive data analysis tasks, and analyze and visualize data to spot patterns and extract meaning. <a target="_blank" href="https://www.freecodecamp.org/learn/scientific-computing-with-python/">Python</a> is a beginner-friendly programming language and is popular for data analysis. <a target="_blank" href="https://www.freecodecamp.org/news/r-programming-course/">Another commonly used language is R</a>, which was specifically designed for statistical analysis. That said, it can have a steeper learning curve.</li>
</ul>
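<p>As a small example of the kind of repetitive task a programming language can automate, the sketch below combines several CSV exports and totals revenue per product. It uses only Python's standard library. The file names, their contents, and the <code>revenue_by_product</code> helper are hypothetical, and the file contents are inlined as strings so the example runs on its own.</p>

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of two regional CSV exports. In practice you would
# read real files, for example with open("north.csv", newline="").
exports = {
    "north.csv": "product,units,price\nmugs,10,4.50\nshirts,3,15.00\n",
    "south.csv": "product,units,price\nmugs,6,4.50\nposters,8,7.25\n",
}


def revenue_by_product(files):
    """Combine several CSV exports and total revenue (units * price) per product."""
    totals = defaultdict(float)
    for text in files.values():
        for row in csv.DictReader(io.StringIO(text)):
            totals[row["product"]] += int(row["units"]) * float(row["price"])
    return dict(sorted(totals.items()))


print(revenue_by_product(exports))
```

<p>Once a task like this is written as a script, rerunning it on next week's exports takes seconds, instead of copying and pasting between spreadsheets by hand.</p>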
<h2 id="heading-conclusion">Conclusion</h2>
<p>Hopefully, you found this guide helpful and got some insight into what the role of a data analyst entails and how you can get started with data analysis.</p>
<p>To learn more, check out freeCodeCamp's <a target="_blank" href="https://www.freecodecamp.org/learn/data-analysis-with-python/">data analysis with Python certification</a>. You will learn data analysis concepts using Python and Python libraries such as NumPy and Pandas. In the end, you will also build 5 projects to claim your certification.</p>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is Data Analysis? ]]>
                </title>
                <description>
                    <![CDATA[ Data are everywhere nowadays. And with each passing year, the amount of data we are producing will only continue to increase. There is a large amount of data available, but what do we do with all that data? How is it all used? And what does all that ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-is-data-analysis/</link>
                <guid isPermaLink="false">66b1e4e9ab763c1471e56798</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Dionysia Lemonaki ]]>
                </dc:creator>
                <pubDate>Tue, 31 May 2022 15:59:04 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/05/luke-chesser-JKUTrJ4vK00-unsplash.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Data are everywhere nowadays. And with each passing year, the amount of data we are producing will only continue to increase.</p>
<p>There is a large amount of data available, but what do we do with all that data? How is it all used? And what does all that data mean?</p>
<p>It’s not much use if we just collect and store data in a spreadsheet or database and don't look at it, explore it, or research it.</p>
<p>Data analysts use tools and processes to derive meaning from data. They collect, manipulate, investigate, and analyze data to gather insights and gain knowledge from it.</p>
<p>This is one of the reasons data analysts are in very high demand: they play an integral role in business and science.</p>
<p>In this article, I will first go over what data analysis means as a term and explain why it is so important.</p>
<p>I will also break down the data analysis process and list some of the necessary skills required for conducting data analysis.</p>
<p>Here is an overview of what we will cover:</p>
<ol>
<li><a class="post-section-overview" href="#heading-what-is-data-meaning-and-definition-of-data">What is data?</a></li>
<li><a class="post-section-overview" href="#heading-what-is-data-analysis-a-definition-for-beginners">What is data analysis?</a></li>
<li><a class="post-section-overview" href="#heading-why-is-data-analysis-important">Why is data analysis important?</a><ol>
<li><a class="post-section-overview" href="#heading-data-analysis-improves-customer-targeting">Effective customer targeting</a></li>
<li><a class="post-section-overview" href="#heading-data-analysis-measures-success-and-performance">Measure success and performance</a></li>
<li><a class="post-section-overview" href="#heading-data-analysis-can-aid-problem-solving">Problem solving</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#heading-an-overview-of-the-data-analysis-process">An overview of the data analysis process</a><ol>
<li><a class="post-section-overview" href="#heading-step-1-recognising-and-identifying-the-questions-that-need-answering">Step 1: recognising and identifying the questions that need answering</a></li>
<li><a class="post-section-overview" href="#heading-step-2-collecting-raw-data">Step 2: collecting raw data</a></li>
<li><a class="post-section-overview" href="#heading-step-3-cleaning-the-data">Step 3: cleaning the data</a></li>
<li><a class="post-section-overview" href="#heading-step-4-analyzing-the-data">Step 4: analyzing the data</a></li>
<li><a class="post-section-overview" href="#heading-step-5-sharing-the-results">Step 5: sharing the results</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#heading-what-skills-are-required-for-data-analysis">What skills are required for data analysis?</a><ol>
<li><a class="post-section-overview" href="#heading-a-good-grasp-of-maths-and-statistics">A good grasp of maths and statistics</a></li>
<li><a class="post-section-overview" href="#heading-knowledge-of-sql-and-relational-databases">Knowledge of SQL and Relational Databases</a></li>
<li><a class="post-section-overview" href="#heading-knowledge-of-a-programming-language">Knowledge of a programming language</a></li>
<li><a class="post-section-overview" href="#data-viz">Knowledge of data visualization tools</a></li>
<li><a class="post-section-overview" href="#excel">Knowledge of Excel</a></li>
</ol>
</li>
</ol>
<h2 id="heading-what-is-data-meaning-and-definition-of-data">What Is Data? Meaning and Definition of Data <a></a></h2>
<p>Data refers to collections of facts and individual pieces of information.</p>
<p>Data is vital for decision-making, planning, and even telling a story.</p>
<p>There are two broad and general types of data:</p>
<ul>
<li>Qualitative data</li>
<li>Quantitative data</li>
</ul>
<p><strong>Qualitative data</strong> is non-numerical data.</p>
<p>It is expressed as images, videos, text documents, or audio.</p>
<p>This type of data can’t be directly measured or counted.</p>
<p>It is often used to understand how people feel about something: their feelings, motivations, opinions, and perceptions. Because of this, it is subjective and can involve bias.</p>
<p>It is descriptive and aims to answer questions such as ‘Why’, ‘How’, and ‘What’.</p>
<p>Qualitative data is gathered from observations, surveys, or user interviews.</p>
<p><strong>Quantitative data</strong> is numerical data.</p>
<p>This type of data is countable, measurable, and comparable.</p>
<p>It deals with amounts and measurements, such as quantities, totals, and averages.</p>
<p>It aims to answer questions such as ‘How much’, ‘How many’, ‘How often’, and ‘How long’.</p>
<p>The act of collecting, analyzing, and interpreting quantitative data is known as performing statistical analysis. </p>
<p>Statistical analysis helps uncover underlying patterns and trends in data.</p>
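<p>To show the difference in practice, here is a minimal Python sketch using hypothetical survey responses that contain one quantitative field and one qualitative field. The quantitative field is summarized with statistics. The qualitative comments are simply counted. Real qualitative analysis usually involves grouping free-text responses into themes first, so counting identical strings is a simplification.</p>

```python
import statistics
from collections import Counter

# Hypothetical survey responses: one quantitative and one qualitative field each.
responses = [
    {"minutes_on_site": 12, "comment": "easy to use"},
    {"minutes_on_site": 5, "comment": "too slow"},
    {"minutes_on_site": 20, "comment": "easy to use"},
    {"minutes_on_site": 8, "comment": "confusing checkout"},
    {"minutes_on_site": 15, "comment": "easy to use"},
]

# Quantitative data can be measured, so it can be summarized numerically.
minutes = [r["minutes_on_site"] for r in responses]
print("mean minutes:", statistics.mean(minutes))
print("median minutes:", statistics.median(minutes))

# Qualitative data is categorized instead; here we count identical comments.
themes = Counter(r["comment"] for r in responses)
print("most common comment:", themes.most_common(1)[0])
```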
<h2 id="heading-what-is-data-analysis-a-definition-for-beginners">What Is Data Analysis? A Definition For Beginners <a></a></h2>
<p>Data analysis is the act of turning raw, messy data into useful insights by cleaning the data up, transforming it, manipulating it, and inspecting it. </p>
<p>The insights gathered from the data are then presented visually in the form of charts, graphs, or dashboards.</p>
<p>The insights discovered can support the company’s or organization’s growth. Decision-makers will be able to reach actionable conclusions and make better-informed business decisions.</p>
<p>Extracting knowledge from raw data will help the company/organization take steps towards achieving greater customer reach, improving performance, and increasing profit.</p>
<p>At its core, data analysis is about identifying and predicting trends and figuring out patterns, correlations, and relationships in the available data, and finding solutions to complex problems.</p>
<h2 id="heading-why-is-data-analysis-important">Why Is Data Analysis Important? <a></a></h2>
<p>Once analyzed, data becomes knowledge.</p>
<p>This is why data analysis is integral to so many businesses.</p>
<p>It can benefit nearly every department, whether it's administration, accounting, logistics, marketing, design, or engineering, to name a few.</p>
<p>Below, I will explain why exploring data and giving it context and meaning is so important.</p>
<h3 id="heading-data-analysis-improves-customer-targeting">Data Analysis Improves Customer Targeting <a></a></h3>
<p>By analyzing data, you can better understand your competitors and match your product or service to current market needs.</p>
<p>It also helps you determine the appropriate audience and demographic best suited to your product or service.</p>
<p>This way, you can come up with an effective pricing strategy that keeps your product or service profitable.</p>
<p>You will also be able to create more targeted campaigns and know what methods and forms of advertising and content to use to reach your audience directly and effectively. </p>
<p>Knowing the right audience for your product or service will transform your whole strategy. It will become more customer-oriented and customized to fit customers' needs.</p>
<p>Essentially, with the appropriate information and tools, you will be able to figure out how to make your product or service valuable and high-quality.</p>
<p>You'll also be able to make sure that your product or service helps solve a problem for your customers. </p>
<p>This is especially important in the product development phases since it cuts down on expenses and saves time.</p>
<h3 id="heading-data-analysis-measures-success-and-performance">Data Analysis Measures Success and Performance <a></a></h3>
<p>By analyzing data, you can measure how well your product/service performs in the market compared to others.</p>
<p>You are able to identify the stronger areas that have seen the most success and desired results. And you will be able to identify weaker areas that are facing problems.</p>
<p>Additionally, you can anticipate which areas are likely to face problems before those problems occur, so you can take action to prevent them.</p>
<p>Analyzing data will give you a better idea of what you should focus more on and what you should focus less on going forward.</p>
<p>By creating performance maps, you can then go on to set goals and identify potential opportunities.</p>
<h3 id="heading-data-analysis-can-aid-problem-solving">Data Analysis Can Aid Problem Solving <a></a></h3>
<p>By performing data analysis on relevant and accurate data, you will have a better understanding of the choices available to you and be able to make more informed decisions.</p>
<p>Data analysis means having better insights, which helps improve decision-making and leads to solving problems.</p>
<p>All the above will help a business grow.</p>
<p>Not analyzing data, or having insufficient data, could be one of the reasons why your business is not growing. </p>
<p>If that is the case, performing data analysis will help you come up with a more effective strategy for the future. </p>
<p>And if your business is growing, analyzing data will help it grow even further. </p>
<p>It can help the business reach its full potential and meet different goals, such as boosting customer retention, finding new customers, or providing a smoother and more pleasant customer experience.</p>
<h2 id="heading-an-overview-of-the-data-analysis-process">An Overview Of The Data Analysis Process <a></a></h2>
<h3 id="heading-step-1-recognising-and-identifying-the-questions-that-need-answering">Step 1: Recognising and Identifying The Questions That Need Answering <a></a></h3>
<p>The first step in the data analysis process is setting a clear objective.</p>
<p>Before setting out to gather a large amount of data, it is important to think of why you are actually performing the data analysis in the first place.</p>
<p>What problem are you trying to solve? </p>
<p>What is the purpose of this data analysis?</p>
<p>What are you trying to do?</p>
<p>What do you want to achieve? </p>
<p>What is the end goal? </p>
<p>What do you want to gain from the analysis? </p>
<p>Why do you even need data analysis?</p>
<p>At this stage, it is essential to understand your business goals.</p>
<p>Start by defining the right questions you want to answer and the immediate and long-term business goals. </p>
<p>Identify what is needed for the analysis, what kind of data you would need, what data you want to track and measure, and think of a specific problem you want to solve.</p>
<h3 id="heading-step-2-collecting-raw-data">Step 2: Collecting  Raw Data <a></a></h3>
<p>The next step is to identify what type of data you want to collect: whether it will be qualitative (non-numerical, descriptive) or quantitative (numerical).</p>
<p>The way you go about collecting the data and the sources you gather from will depend on whether it is qualitative or quantitative.</p>
<p>Some of the ways you could collect relevant and suitable data are:</p>
<ul>
<li>By viewing the results of user groups, surveys, forms, questionnaires, internal documents, and interviews that have already been conducted in the business.</li>
<li>By viewing customer reviews and feedback on customer satisfaction.</li>
<li>By viewing transactions and purchase history records, as well as sales and financial figure reports created by the finance or marketing department of the business.</li>
<li>By using a customer relationship management system (CRM) in the company.</li>
<li>By monitoring website activity and monthly visitors.</li>
<li>By monitoring social media engagement.</li>
<li>By tracking commonly searched keywords and search queries.</li>
<li>By checking which ads are regularly clicked on.</li>
<li>By checking customer conversion rates.</li>
<li>By checking email open rates.</li>
<li>By comparing the company’s data to competitors using third-party services.</li>
<li>By querying a database.</li>
<li>By downloading open datasets or by web scraping. <a target="_blank" href="https://www.freecodecamp.org/news/how-to-scrape-websites-with-python-2/">Web scraping</a> is the act of extracting and collecting data and content from websites.</li>
</ul>
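<p>Several of these sources, such as CRM systems and analytics tools, can export data as CSV files. The sketch below reads such an export with Python's built-in <code>csv</code> module and computes a conversion rate per channel. The column names, the figures, and the <code>conversion_rates</code> helper are assumptions for illustration, not the format of any particular tool.</p>

```python
import csv
import io

# Hypothetical export from an analytics or CRM tool (column names are assumptions).
EXPORT = """channel,visitors,purchases
email,400,36
social,1200,30
search,900,54
"""


def conversion_rates(csv_text):
    """Return {channel: purchases / visitors} for each row of the export."""
    rates = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rates[row["channel"]] = int(row["purchases"]) / int(row["visitors"])
    return rates


for channel, rate in conversion_rates(EXPORT).items():
    print(f"{channel}: {rate:.1%}")
```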
<h3 id="heading-step-3-cleaning-the-data">Step 3: Cleaning The Data <a></a></h3>
<p>Once you have gathered the data from multiple sources, it is important to understand the structure of that data.</p>
<p>It is also important to check if you have gathered all the data you needed and if any crucial data is missing.</p>
<p>If you used multiple sources for the data collection, your data will likely be inconsistent in format and structure.</p>
<p>Raw, inconsistent data is rarely usable as is, and not all data is necessarily good data.</p>
<p>Cleaning data is one of the most important parts of the data analysis process, and it is often where data analysts spend most of their time.</p>
<p>Data needs to be cleaned, which means correcting errors, polishing, and sorting through the data. </p>
<p>This could include:</p>
<ul>
<li>Looking for <a target="_blank" href="https://www.freecodecamp.org/news/what-is-an-outlier-definition-and-how-to-find-outliers-in-statistics/">outliers</a> (values that are unusually big or small).</li>
<li>Fixing typos.</li>
<li>Removing errors.</li>
<li>Removing duplicate data.</li>
<li>Managing inconsistencies in the format.</li>
<li>Checking for missing values or correcting incorrect data.</li>
<li>Checking for inconsistencies.</li>
<li>Getting rid of irrelevant data and data that is not useful or needed for the analysis.</li>
</ul>
<p>This step will ensure that you are focusing on and analyzing the correct and appropriate data and that your data is high-quality.</p>
<p>If you analyze irrelevant or incorrect data, it will affect the results of your analysis and have a negative impact overall.</p>
<p>So, the accuracy of your end analysis will depend on this step.</p>
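<p>As an illustration of managing format inconsistencies and duplicates, here is a minimal Python sketch that normalizes email addresses and dates written in different formats, then removes records that turn out to be duplicates. The sign-up records, the list of accepted date formats, and the helper names are hypothetical. Values that match no known format are set to <code>None</code> for manual review instead of being guessed.</p>

```python
from datetime import datetime

# Date formats assumed to appear in the raw data. Order matters for ambiguous
# values such as 05/06/2022: "%d/%m/%Y" treats them as day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")


def parse_date(value):
    """Return an ISO date string, or None if the value matches no known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave for manual review instead of guessing


# Hypothetical sign-up records collected from different systems.
raw_signups = [
    ("ana@example.com", "2022-05-31"),
    ("BEN@example.com ", "31/05/2022"),
    ("ana@example.com", "May 31, 2022"),  # same as the first row once the date is normalized
    ("chen@example.com", "not a date"),
]


def clean_signups(rows):
    """Normalize emails and dates, then drop exact duplicates (first one wins)."""
    cleaned, seen = [], set()
    for email, date in rows:
        record = (email.strip().lower(), parse_date(date))
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


print(clean_signups(raw_signups))
```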
<h3 id="heading-step-4-analyzing-the-data">Step 4: Analyzing The Data <a></a></h3>
<p>The next step is to analyze the data based on the questions and objectives from step 1.</p>
<p>There are four main types of data analysis, and which ones you use depends on the goals and aims of the business:</p>
<ul>
<li><strong>Descriptive Analysis</strong>: This is the initial and fundamental type of analysis. It provides a summary of the collected data and aims to answer the question: “<strong>What</strong> happened?”. It goes over the key points in the data and emphasizes what has already taken place.</li>
<li><strong>Diagnostic Analysis</strong>: This type of analysis uses the collected data to understand the cause behind the issue at hand and to identify patterns. It aims to answer the question: “<strong>Why</strong> has this happened?”.</li>
<li><strong>Predictive Analysis</strong>: This type of analysis is about detecting and predicting future trends, and it is important for the future growth of the business. It aims to answer the question: “<strong>What is likely to happen</strong> in the future?”</li>
<li><strong>Prescriptive Analysis:</strong> This type of analysis gathers the insights from the three previous types, makes recommendations for the future, and creates an actionable plan. It aims to answer the question: “<strong>What needs to be done?</strong>”</li>
</ul>
<h3 id="heading-step-5-sharing-the-results">Step 5: Sharing The Results <a></a></h3>
<p>The last step is to interpret your findings and share them.</p>
<p>This is usually done by creating reports, charts, graphs, or interactive dashboards using data visualization tools.</p>
<p>These visuals support the presentation of your findings and the results of your analysis to stakeholders, business executives, and decision-makers.</p>
<p>Data analysts are storytellers, which means having strong communication skills is important.</p>
<p>They need to showcase the findings and present the results in a clear, concise, and straightforward way by taking the data and creating a narrative. </p>
<p>This step will influence decision-making and the future steps of the business.</p>
<h2 id="heading-what-skills-are-required-for-data-analysis">What Skills Are Required For Data Analysis? <a></a></h2>
<h3 id="heading-a-good-grasp-of-maths-and-statistics">A Good Grasp Of Maths And Statistics <a></a></h3>
<p>The amount of maths you will use as a data analyst will vary depending on the job. Some jobs may require working with maths more than others. </p>
<p>You don’t need to be a math wizard, but having at least a fundamental understanding of math basics can be of great help.</p>
<p>Here are some math courses to get you started:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-algebra-to-improve-your-programming-skills/">College Algebra – Learn College Math Prerequisites with this Free 7-Hour Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/precalculus-learn-college-math-prerequisites-with-this-free-5-hour-course/">Precalculus – Learn College Math Prerequisites with this Free 5-Hour Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/maths-for-programmers/">Math for Programmers Course</a></li>
</ul>
<p>Data analysts need to have good knowledge of statistics and probability for gathering and analyzing data, figuring out patterns, and drawing conclusions from the data. </p>
<p>To get started, take an intro to statistics course, and then you can move on to more advanced topics:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/free-statistics-course/">Learn College-level Statistics in this free 8-hour course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/if-you-want-to-learn-data-science-take-a-few-of-these-statistics-classes-9bbabab098b9#.esdiw8wnk">If you want to learn Data Science, take a few of these statistics classes</a></li>
</ul>
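<p>As a taste of the basics those courses cover, the sketch below uses Python's built-in <code>statistics</code> module to summarize a small, hypothetical set of delivery times. It also computes z-scores, which measure how many standard deviations each value lies from the mean. The data is made up, and <code>stdev</code> here is the sample standard deviation.</p>

```python
import statistics

# Hypothetical delivery times in days for ten orders.
delivery_days = [2, 3, 3, 4, 2, 5, 3, 4, 3, 11]

mean = statistics.mean(delivery_days)
median = statistics.median(delivery_days)
stdev = statistics.stdev(delivery_days)  # sample standard deviation

# z-score: how many standard deviations a value lies from the mean.
z_scores = [(d - mean) / stdev for d in delivery_days]

print("mean:", mean)
print("median:", median)
print("stdev:", round(stdev, 2))
print("largest z-score:", round(max(z_scores), 2))
```

<p>The single 11-day delivery pulls the mean (4) above the median (3), and it has by far the largest z-score. This is one reason analysts often report both measures and check unusual values before drawing conclusions.</p>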
<h3 id="heading-knowledge-of-sql-and-relational-databases">Knowledge of SQL and Relational Databases <a></a></h3>
<p>Data analysts need to know how to interact with relational databases to extract data.</p>
<p>A database is an organized, electronic store of data that can be easily searched and retrieved.</p>
<p>A relational database stores data in tables made of rows and columns, and the data items stored have pre-defined relationships with each other.</p>
<p>SQL stands for <strong>S</strong>tructured <strong>Q</strong>uery <strong>L</strong>anguage and is the language used for querying and interacting with relational databases.</p>
<p>By writing SQL queries you can perform CRUD (Create, Read, Update, and Delete) operations on data.</p>
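<p>Here is a minimal sketch of all four CRUD operations. It uses Python's built-in <code>sqlite3</code> module and a throwaway in-memory database. The <code>products</code> table and its rows are hypothetical. The same SQL statements (<code>INSERT</code>, <code>SELECT</code>, <code>UPDATE</code>, <code>DELETE</code>) work with small variations in other relational databases.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Create: insert rows (placeholders avoid quoting mistakes and SQL injection).
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("notebook", 3.5), ("pen", 1.25), ("stapler", 8.0)],
)

# Read: query only the rows you need.
cheap = conn.execute(
    "SELECT name, price FROM products WHERE price < 5 ORDER BY price"
).fetchall()

# Update: change existing rows.
conn.execute("UPDATE products SET price = ? WHERE name = ?", (4.0, "notebook"))

# Delete: remove rows.
conn.execute("DELETE FROM products WHERE name = ?", ("stapler",))

conn.commit()
remaining = conn.execute("SELECT name, price FROM products ORDER BY id").fetchall()

print(cheap)
print(remaining)
```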
<p>To learn SQL, check out the following resources:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-sql-in-10-minutes/">SQL Commands Cheat Sheet – How to Learn SQL in 10 Minutes</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-sql-free-relational-database-courses-for-beginners/">Learn SQL – Free Relational Database Courses for Beginners</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/learn/relational-database/">Relational Database Certification</a></li>
</ul>
<h3 id="heading-knowledge-of-a-programming-language">Knowledge Of A Programming Language <a></a></h3>
<p>To further organize and manipulate data, data analysts benefit from knowing a programming language.</p>
<p>Two of the most popular ones used in the data analysis field are Python and R.</p>
<p>Python is a general-purpose programming language, and its readable, English-like syntax makes it very beginner-friendly. It is also one of the most widely used tools for data analysis.</p>
<p>Python offers a wealth of packages and libraries for data manipulation, such as Pandas and NumPy, as well as for data visualization, such as Matplotlib.</p>
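<p>To get a feel for the kind of work these libraries streamline, here is a small aggregation written in plain Python using only the standard library. The CSV data is a hypothetical sample held in memory; with a real file you would open it from disk instead.</p>
<pre><code class="lang-python">import csv
import io
import statistics
from collections import defaultdict

# Hypothetical CSV data; with a real file you would use open("sales.csv") instead.
raw = io.StringIO("region,amount\nnorth,120\nsouth,80\nnorth,50\n")

totals = defaultdict(float)  # running total per region
amounts = []                 # every amount, for overall statistics
for row in csv.DictReader(raw):
    amount = float(row["amount"])
    totals[row["region"]] += amount
    amounts.append(amount)

print(dict(totals))                        # {'north': 170.0, 'south': 80.0}
print(round(statistics.mean(amounts), 2))  # 83.33
</code></pre>
<p>With Pandas, the per-region total would typically shrink to a single call such as <code>df.groupby("region")["amount"].sum()</code>, which is a large part of why analysts reach for it.</p>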
<p>To get started, <a target="_blank" href="https://www.freecodecamp.org/news/how-to-learn-python/">first see how to go about learning Python as a complete beginner</a>.</p>
<p>Once you understand the fundamentals, you can move on to learning about Pandas, NumPy, and Matplotlib.</p>
<p>Here are some resources to get you started:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/python-pandas-functions/">How to Get Started with Pandas in Python – a Beginner's Guide</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/the-ultimate-guide-to-the-pandas-library-for-data-science-in-python/">The Ultimate Guide to the Pandas Library for Data Science in Python</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/the-ultimate-guide-to-the-numpy-scientific-computing-library-for-python/">The Ultimate Guide to the NumPy Package for Scientific Computing in Python</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/numpy-python-tutorial/">Learn NumPy and start doing scientific computing in Python</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-analyze-data-with-python-pandas/">How to Analyze Data with Python, Pandas &amp; Numpy - 10 Hour Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/matplotlib-course-learn-python-data-visualization/">Matplotlib Course – Learn Python Data Visualization</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/python-data-science-course-matplotlib-pandas-numpy/">Python Data Science – A Free 12-Hour Course for Beginners. Learn Pandas, NumPy, Matplotlib, and More.</a></li>
</ul>
<p>R is a language designed for statistical computing and data analysis. That said, it is generally considered less beginner-friendly than Python.</p>
<p>To get started learning it, check out the following courses:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/r-programming-language-explained/">R Programming Language Explained</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/r-programming-course/">Learn R programming language basics in just 2 hours with this free course on statistical programming</a></li>
</ul>
<h3 id="heading-knowledge-of-data-visualization-tools">Knowledge of Data Visualization Tools <a></a></h3>
<p>Data visualization is the presentation of data in graphical form.</p>
<p>This includes creating graphs, charts, interactive dashboards, or maps that can be easily shared with other team members and important stakeholders.</p>
<p>Data visualization tools are used to tell a story with data and to drive decision-making.</p>
<p>One of the most popular data visualization tools is Tableau.</p>
<p>To learn Tableau, check out the following course:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/tableau-for-data-science-and-data-visualization-crash-course/">Tableau for Data Science and Data Visualization - Crash Course</a></li>
</ul>
<h3 id="heading-knowledge-of-excel">Knowledge of Excel <a></a></h3>
<p>Excel is one of the most essential tools used in data analysis.</p>
<p>It is used for storing, structuring, and formatting data, performing calculations, summarizing data and identifying trends, sorting data into categories, and creating reports. </p>
<p>You can also use Excel to create charts and graphs.</p>
<p>To learn how to use Excel, check out the following courses:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-microsoft-excel/">Learn Microsoft Excel - Full Video Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/excel-classes-online-free-excel-training-courses/">Excel Classes Online – 11 Free Excel Training Courses</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/data-analysis-with-python-for-excel-users-course/">Data Analysis with Python for Excel Users Course</a></li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>This marks the end of the article – thank you so much for making it to the end!</p>
<p>I hope this guide was helpful and gave you some insight into what data analysis is, why it is important, and what skills you need to enter the field.</p>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use Object Storage for Data Parallelization and Experimentation ]]>
                </title>
                <description>
                    <![CDATA[ By using big data, companies can learn a lot about how their businesses are performing. Analytics on sales, churn rates, and other basic metrics are available in almost real time as data comes in.  Then there are more complex analyses that you'll nee... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-object-storage-for-parallelization-and-experimentation/</link>
                <guid isPermaLink="false">66bb5252b0a396d22e4116fa</guid>
                
                    <category>
                        <![CDATA[ big data ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ storage ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ry Vee ]]>
                </dc:creator>
                <pubDate>Mon, 27 Sep 2021 14:09:57 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/09/article-cover-pic.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By using big data, companies can learn a lot about how their businesses are performing. Analytics on sales, churn rates, and other basic metrics are available in almost real time as data comes in. </p>
<p>Then there are more complex analyses that you'll need to do. At times, relationships between two seemingly unrelated data sets can provide surprising insights and reveal important opportunities for the organization.</p>
<p>Data scientists and engineers keep improving how they break down and work with data. Much of this work is experimentation: discovering meaningful correlations among data points.</p>
<p>This means they also need some form of parallelization of the data and the resulting models. Here, parallelization simply means operating on the same data set in many different ways without damaging the integrity of the original data.</p>
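<p>As a rough illustration of the idea (not of LakeFS or any particular platform), the sketch below runs several independent "experiments" concurrently over one shared data set. The data and the three experiment functions are hypothetical. The data lives in an immutable tuple so that no experiment can alter the original, and threads stand in for what would usually be separate jobs or machines.</p>
<pre><code class="lang-python">import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source data set; a tuple is immutable, so no experiment can alter it.
data = (4.0, 8.0, 15.0, 16.0, 23.0, 42.0)


# Each "experiment" reads the shared data and returns a new result without modifying it.
def mean(values):
    return statistics.mean(values)


def median(values):
    return statistics.median(values)


def doubled_total(values):
    return sum(v * 2 for v in values)


experiments = [mean, median, doubled_total]

with ThreadPoolExecutor() as pool:
    # Run every experiment concurrently against the same input.
    futures = [pool.submit(fn, data) for fn in experiments]
    results = {fn.__name__: f.result() for fn, f in zip(experiments, futures)}

print(results)  # {'mean': 18.0, 'median': 15.5, 'doubled_total': 216.0}
</code></pre>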
<p>In this article, we'll look at how you can make this kind of experimentation and parallel processing efficient, so that it yields the most insight. We will cover concepts related to data storage and data versioning.</p>
<h1 id="heading-block-storage-vs-object-storage">Block Storage vs Object Storage</h1>
<p>First, let's understand the difference between block and object storage, and why object storage is the better option when dealing with data experimentation.</p>
<p><img src="https://lh4.googleusercontent.com/p8F4n7jqjmQtqquQasDGPEj1eRdxhNIsdMFxX9gIM03w6r6u-VRzU6rn2gMqdF1U3lrGOrjWEPwlBFzR-0cYVHWBWF7tigFiS4m_EtYjw0bU4tPATeWsZNYTFwpZTbyLBAzxqmbX=s0" alt="Image" width="1000" height="500" loading="lazy">
<em><a target="_blank" href="https://res.cloudinary.com/practicaldev/image/fetch/s--PYImgKrK--/c_imagga_scale,f_auto,fl_progressive,h_500,q_auto,w_1000/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4519hl0nf6aze73pyvsr.png">Image source</a></em></p>
<h2 id="heading-what-is-block-storage">What is Block Storage?</h2>
<p>Block storage (commonly provided over a storage area network, or <a target="_blank" href="https://www.snia.org/education/storage_networking_primer/san/what_san">SAN</a>) gets its name from the way it stores data: files are split into fixed-size blocks that are written to disks.</p>
<p>A familiar example of block storage is the hard drive or SSD in your personal computer, which your file system organizes into files and folders. For enterprise-level use cases, block storage is scaled out as a network of drives connected through fiber optic cables.</p>
<p>Block storage has a few disadvantages. First, if a sector (or block) becomes corrupted, the files stored in it can be damaged. Second, it is hard to scale, because expanding the network of drives and fiber optic cables is costly.</p>
<h2 id="heading-what-is-object-storage">What is Object Storage?</h2>
<p>In object storage, data is stored as objects. Each object contains the actual data (called the blob), a unique identifier (such as a UUID), and metadata, which describes the object (for example, its timestamp, version, and author).</p>
<p>Object storage makes it cost-effective to scale your data store, because you don’t need complex hardware to do so. It also makes data retrieval faster, as each object can be retrieved directly by its unique identifier.</p>
<p>This is in contrast to block storage, where each data location needs to be identified before the actual information can be retrieved.</p>
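<p>To make the idea of objects more concrete, here is a minimal in-memory sketch, assuming a plain dictionary as the backing store. The <code>ToyObjectStore</code> class and its <code>put</code>/<code>get</code> methods are hypothetical and do not mirror any real object storage API. The point is that each object bundles a blob, a generated UUID, and metadata, and is retrieved with a single lookup by ID.</p>
<pre><code class="lang-python">import uuid
from datetime import datetime, timezone


class ToyObjectStore:
    """Hypothetical in-memory object store: a flat map from IDs to objects."""

    def __init__(self):
        self._objects = {}

    def put(self, blob, **metadata):
        # Each object bundles the blob, a unique ID, and its metadata.
        object_id = str(uuid.uuid4())
        metadata.setdefault("created", datetime.now(timezone.utc).isoformat())
        self._objects[object_id] = {"blob": blob, "metadata": metadata}
        return object_id

    def get(self, object_id):
        # Retrieval is one lookup by ID; there are no block locations to resolve.
        return self._objects[object_id]


store = ToyObjectStore()
oid = store.put(b"churn report", author="analytics-team", version=1)
obj = store.get(oid)
print(obj["blob"], obj["metadata"]["author"])  # b'churn report' analytics-team
</code></pre>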
<p>One disadvantage of object storage is that objects can't be modified in place: once data is written, it can't be updated, only replaced by a new object or a new version. But this isn’t really a disadvantage, as we will see later in this article.</p>
<h2 id="heading-what-problems-does-object-storage-solve">What Problems Does Object Storage Solve?</h2>
<p>As we have already seen, data retrieval can be incredibly fast with object storage (no matter the size of the data store). But when it comes to data experimentation and data parallelization, object storage shines the brightest.</p>
<p>As mentioned before, you can't modify data already stored as an object in place. This protects object storage from unwanted (or unauthorized) destruction or modification of data, which is great to know if you do a lot of data processing where information could accidentally be corrupted.</p>
<p>Object storage also solves another problem: it doesn’t require data to be structured. Companies produce and consume tremendous amounts of information every moment, and much of it is unstructured data (such as PDFs, videos, and images) that is not easily processed into useful forms (such as analytics or dashboards).</p>
<p>With object storage, you can store this unstructured data as-is and use it, for example, to develop machine learning models.</p>
<p>With object storage, it’s also possible to keep different versions of the same blob (each with its own metadata). Just as Git provides version control for code, we can manage different versions of the same data in a similar way.</p>
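<p>Write-once behavior can be sketched the same way. The hypothetical <code>WriteOnceStore</code> below (not any real product's API) simply refuses a second write to a key that already exists, so an accidental overwrite fails loudly instead of silently destroying data.</p>
<pre><code class="lang-python">class ObjectExistsError(Exception):
    """Raised when a write targets a key that already holds data."""


class WriteOnceStore:
    """Hypothetical write-once store; not the API of any real object storage."""

    def __init__(self):
        self._objects = {}

    def put(self, key, blob):
        if key in self._objects:
            # Refuse to replace existing data, so accidental overwrites fail loudly.
            raise ObjectExistsError(f"{key!r} already exists and cannot be overwritten")
        self._objects[key] = bytes(blob)  # keep an immutable bytes copy

    def get(self, key):
        return self._objects[key]


store = WriteOnceStore()
store.put("raw/2021-09-27/events.json", b'{"clicks": 42}')

overwrite_rejected = False
try:
    store.put("raw/2021-09-27/events.json", b"{}")
except ObjectExistsError as err:
    overwrite_rejected = True
    print(err)

print(store.get("raw/2021-09-27/events.json"))  # b'{"clicks": 42}'
</code></pre>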
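<p>Here is a minimal sketch of how versioning could work, assuming each write to a key appends a new version instead of replacing the old one. The <code>VersionedStore</code> class is hypothetical; real object stores and tools such as LakeFS implement versioning differently and far more efficiently.</p>
<pre><code class="lang-python">class VersionedStore:
    """Hypothetical store that keeps every version of a key instead of overwriting it."""

    def __init__(self):
        self._versions = {}  # key -> list of (blob, metadata) tuples, oldest first

    def put(self, key, blob, **metadata):
        versions = self._versions.setdefault(key, [])
        metadata["version"] = len(versions) + 1  # versions are numbered from 1
        versions.append((blob, metadata))
        return metadata["version"]

    def get(self, key, version=None):
        versions = self._versions[key]
        if version is None:
            return versions[-1]  # latest version by default
        if version not in range(1, len(versions) + 1):
            raise IndexError(f"{key!r} has no version {version}")
        return versions[version - 1]  # older versions stay readable


store = VersionedStore()
store.put("features.csv", b"a,b\n1,2\n", author="ana")
store.put("features.csv", b"a,b\n1,3\n", author="ben")

latest_blob, latest_meta = store.get("features.csv")
first_blob, first_meta = store.get("features.csv", version=1)
print(latest_meta)  # {'author': 'ben', 'version': 2}
print(first_blob)   # b'a,b\n1,2\n'
</code></pre>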
<p>This brings us to the concept of data lakes.</p>
<h2 id="heading-what-are-data-lakes">What are Data Lakes?</h2>
<p>Data lakes are central repositories that store data regardless of its format.</p>
<p>Companies produce and consume tremendous amounts of data. Traditionally, this data sits in silos because it belongs to different departments or comes in different forms (for example, videos aren’t stored in the same place as the data in a MySQL database).</p>
<p>With data lakes, any department in the enterprise can store information without the need to pre-process it. Likewise, any data can be retrieved and analyzed by anybody from any department.</p>
<p>Data lakes are important because they make data analytics faster and more convenient: data from across the organization is available in one place.</p>
<h2 id="heading-how-data-experimentation-and-parallelization-work-with-object-storage">How Data Experimentation and Parallelization Work with Object Storage</h2>
<p>As with software development, working with data calls for tools that support our workflow. A powerful open source tool for experimenting with data and parallelizing work on it (that is, working on the same data to create different sets of machine learning models) is LakeFS.</p>
<p>LakeFS is an open source platform that provides Git-like capabilities when working with data. This means you can create branches (allowing you to experiment with data) and commit versions of data (and data models).</p>
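<p>To show what "committing versions of data" means, the sketch below models a heavily simplified Git-style history: each commit records a snapshot of the files, a message, and a pointer to its parent commit, and its ID is a SHA-256 hash of those contents. <code>ToyRepo</code> and its methods are hypothetical; this is not how LakeFS is implemented or called, and the LakeFS CLI used later in this article provides the real commands.</p>
<pre><code class="lang-python">import hashlib
import json


class ToyRepo:
    """Hypothetical, heavily simplified Git-style history for data files."""

    def __init__(self):
        self.commits = {}               # commit ID -> commit record
        self.branches = {"main": None}  # branch name -> ID of its latest commit
        self.staged = {"main": {}}      # uncommitted writes per branch

    def write(self, branch, path, content):
        self.staged[branch][path] = content

    def commit(self, branch, message):
        parent = self.branches[branch]
        # Start from the parent's snapshot (if any) and apply the staged writes.
        snapshot = dict(self.commits[parent]["snapshot"]) if parent else {}
        snapshot.update(self.staged[branch])
        # Derive the commit ID from the snapshot, message, and parent.
        payload = json.dumps([snapshot, message, parent], sort_keys=True).encode()
        commit_id = hashlib.sha256(payload).hexdigest()
        self.commits[commit_id] = {"snapshot": snapshot, "message": message, "parent": parent}
        self.branches[branch] = commit_id
        self.staged[branch] = {}
        return commit_id

    def log(self, branch):
        # Walk parent links from the branch head back to the first commit.
        messages, commit_id = [], self.branches[branch]
        while commit_id:
            messages.append(self.commits[commit_id]["message"])
            commit_id = self.commits[commit_id]["parent"]
        return messages


repo = ToyRepo()
repo.write("main", "foo.txt", "hello")
first = repo.commit("main", "added our first file!")
repo.write("main", "bar.txt", "world")
second = repo.commit("main", "added a second file")
print(repo.log("main"))  # ['added a second file', 'added our first file!']
</code></pre>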
<h3 id="heading-why-is-this-git-like-feature-important">Why is this Git-like feature important?</h3>
<p>First, it helps make your data lake <a target="_blank" href="https://mariadb.com/resources/blog/acid-compliance-what-it-means-and-why-you-should-care/">ACID</a> compliant. Changes to your data can happen in isolation (in branches), so the integrity of the data in the main branch is maintained until those changes are ready to be merged.</p>
<p>Another important feature of LakeFS is continuous integration of data (again, much like in software development). Enterprises need to incorporate new data quickly and without disruption, so the ability to have a <a target="_blank" href="https://www.infoworld.com/article/3271126/what-is-cicd-continuous-integration-and-continuous-delivery-explained.html">CI/CD</a> workflow for data is invaluable.</p>
<p>So, let’s see how to get started using LakeFS for experimentation and parallelization on object storage.</p>
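<p>The isolation that branches provide can be sketched in a few lines. In this hypothetical model (not the LakeFS API), a branch starts as a copy of <code>main</code>, writes to the branch leave <code>main</code> untouched, and the changes only become visible on <code>main</code> after a merge.</p>
<pre><code class="lang-python">class ToyBranchingLake:
    """Hypothetical model of branch isolation; not the LakeFS API or implementation."""

    def __init__(self):
        self.branches = {"main": {}}  # branch name -> {path: content}

    def create_branch(self, name, source="main"):
        # The new branch starts as a copy of its source branch.
        self.branches[name] = dict(self.branches[source])

    def write(self, branch, path, content):
        self.branches[branch][path] = content

    def merge(self, source, target="main"):
        # Only now do the branch's changes become visible on the target branch.
        self.branches[target].update(self.branches[source])


lake = ToyBranchingLake()
lake.write("main", "sales.csv", "v1")
lake.create_branch("experiment")
lake.write("experiment", "sales.csv", "v2-cleaned")

before_merge = lake.branches["main"]["sales.csv"]
print(before_merge)  # v1 (main is untouched by the experiment)
lake.merge("experiment")
print(lake.branches["main"]["sales.csv"])  # v2-cleaned
</code></pre>
<p>This toy merge simply copies the branch's files over <code>main</code>. It ignores deletions and conflicting changes, and it copies data when branching, which real systems avoid.</p>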
<h3 id="heading-how-to-install-lakefs">How to Install LakeFS</h3>
<p>To install LakeFS locally, run the following command in your terminal:*</p>
<p><img src="https://lh4.googleusercontent.com/pTYRbQlB2_Mp8j_XGxUOvBI0PLf5kuuT1tYV5AxcPmrnq8K5sjLCUBwQqp4klk4rnraQnK9OD5hrudEFUwBLNcvmyNGQqDPkLQ_DkVBoVgCUfITIFdS6d1RxtkTFG_T40ZV0ia0L=s0" alt="curl https://compose.lakefs.io | docker-compose -f - up" width="501" height="111" loading="lazy">
<em><a target="_blank" href="https://carbon.now.sh/?bg=rgba%28171%2C+184%2C+195%2C+1%29&amp;t=seti&amp;wt=none&amp;l=application%2Fx-sh&amp;ds=true&amp;dsyoff=20px&amp;dsblur=68px&amp;wc=true&amp;wa=true&amp;pv=56px&amp;ph=56px&amp;ln=false&amp;fl=1&amp;fm=Hack&amp;fs=14px&amp;lh=133%25&amp;si=false&amp;es=2x&amp;wm=false&amp;code=curl%2520https%253A%252F%252Fcompose.lakefs.io%2520%257C%2520docker-compose%2520-f%2520-%2520up%250A">Code source</a></em></p>
<p><em>*This assumes you have Docker and Docker Compose installed on your system. If you don’t, you can try one of the <a target="_blank" href="https://docs.lakefs.io/quickstart/more_quickstart_options.html">other installation methods</a>.</em></p>
<p>Now visit <a target="_blank" href="http://127.0.0.1:8000/setup">http://127.0.0.1:8000/setup</a> in your browser to verify you have installed it correctly.</p>
<h3 id="heading-how-to-create-a-repository-in-lakefs">How to Create a Repository in LakeFS</h3>
<p>Once you’ve verified that LakeFS is installed correctly, go ahead and create an admin user.</p>
<p><img src="https://lh5.googleusercontent.com/kRpsNjJe60f7fiIEFC0O5ZbY88F9g-F4X-GRtl8L8WiVJ_sDiKcnz-0jmprZc-bVkfq029fYhq4K-jdBXyBQttc012Nv4v6j2vbJvk4jnbs71BF9Wulo_5JwsvmSjRE1nkQ-ltRe=s0" alt="Image" width="1627" height="923" loading="lazy">
<em><a target="_blank" href="https://docs.lakefs.io/assets/img/setup.png">Image source</a></em></p>
<p><img src="https://lh3.googleusercontent.com/oez-1Q1JH6Q_cqUh0tKE1bW-IbEXg92UP4NVkTy_o-vVETELASw8R8CoPS5ogWDZNl4hH8W3cb68_PvEECO1os9U1sgfJFA2PMnc1J57wEjomp9SrN0ZZK-OXoOjJpZcF-LPZlhu=s0" alt="Image" width="1627" height="921" loading="lazy">
<em><a target="_blank" href="https://docs.lakefs.io/assets/img/setup_done.png">Image source</a></em></p>
<p>Click on the login link and log in as an administrator. </p>
<p>On the page you are redirected to, click Create Repository. A pop-up will appear:</p>
<p><img src="https://lh6.googleusercontent.com/2abxJeRjLk7IRzhohW7jlG3cKQKH4kRCjIyVbQkHe_Fa9qdcGPdrbcTsFRhW7lv3S5LQtfa4xBmnNu0wRqhFSvwi1hp5_ARB_fRJlcLgz1TmDa_a9DQ-apmcIiclMLwsgfuyoD9P=s0" alt="Image" width="1622" height="907" loading="lazy">
<em><a target="_blank" href="https://docs.lakefs.io/assets/img/create_repo_local.png">Image source</a></em></p>
<p>Congratulations! You now have your first repository. This is the main “bucket” in which you are going to store your data. </p>
<p>Next, we’ll start adding some data.</p>
<h3 id="heading-how-to-add-data-to-your-lakefs-repository">How to Add Data to your LakeFS Repository</h3>
<p>To add data, first <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html">install the AWS CLI</a>.</p>
<p>Using the credentials generated when you created the admin user, configure a new connection profile:</p>
<p><img src="https://lh5.googleusercontent.com/D9FDuc11VgqsUr5LfN2UE_zTQYSKinNHB_saQxvr0MJj2yurnDCTqEC0cWA-dvOj3TYGMxJq52Una4zpaG6hrImrAaOWA43V1nMsUg0NpI9XIj8lKF6THD3ZoC0BNMqd-uRUsS6p=s0" alt="aws configure --profile local" width="645" height="204" loading="lazy">
<em><a target="_blank" href="https://carbon.now.sh/?bg=rgba%28171%2C+184%2C+195%2C+1%29&amp;t=seti&amp;wt=none&amp;l=application%2Fx-sh&amp;ds=true&amp;dsyoff=20px&amp;dsblur=68px&amp;wc=true&amp;wa=true&amp;pv=56px&amp;ph=56px&amp;ln=false&amp;fl=1&amp;fm=Hack&amp;fs=14px&amp;lh=133%25&amp;si=false&amp;es=2x&amp;wm=false&amp;code=aws%2520configure%2520--profile%2520local%250A%2523%2520output%253A%250A%2523%2520AWS%2520Access%2520Key%2520ID%2520%255BNone%255D%253A%2520AKIAJVHTOKZWGCD2QQYQ%250A%2523%2520AWS%2520Secret%2520Access%2520Key%2520%255BNone%255D%253A%2520****************************************%250A%2523%2520Default%2520region%2520name%2520%255BNone%255D%253A%250A%2523%2520Default%2520output%2520format%2520%255BNone%255D%253A%250A">Code source</a></em></p>
<p>To test if the connection is working, run the following:</p>
<p><img src="https://lh5.googleusercontent.com/oP-iisEz7w9qQM-zQaAUcdhXj_YMRGamhV-AwwNfFsDVm_p4HcKlGsw0sVD0aJS-Q-3rCy3VlhtcvtBxJgFCrHQLXrPB7ZyHVril1iGeWKP_mqPPrxizpw8NNAGWdNc2ZF36mfX4=s0" alt="aws --endpoint-url=http://localhost:8000 --profile local s3 ls" width="560" height="148" loading="lazy">
<em><a target="_blank" href="https://carbon.now.sh/?bg=rgba%28171%2C+184%2C+195%2C+1%29&amp;t=seti&amp;wt=none&amp;l=application%2Fx-sh&amp;ds=true&amp;dsyoff=20px&amp;dsblur=68px&amp;wc=true&amp;wa=true&amp;pv=56px&amp;ph=56px&amp;ln=false&amp;fl=1&amp;fm=Hack&amp;fs=14px&amp;lh=133%25&amp;si=false&amp;es=2x&amp;wm=false&amp;code=aws%2520--endpoint-url%253Dhttp%253A%252F%252Flocalhost%253A8000%2520--profile%2520local%2520s3%2520ls%250A%2523%2520output%253A%250A%2523%25202021-06-15%252013%253A43%253A03%2520example-repo%250A">Code source</a></em></p>
<p>Now, to copy files into the main branch:</p>
<p><img src="https://lh5.googleusercontent.com/Z_3sbfX6IMJzPkYeejJ1O9ftjkO3c4kPk_rlCJ1iOP2FgTnJTZ03cB8C8Ml2u4bet4cvBS60rHt7Ns-xgLWix422-w3ZvpGQCyeGKgBDd0Oog-sV-E4XSpV4ARpoYeQhR2INZV_H=s0" alt="aws --endpoint-url=http://localhost:8000 --profile local s3 cp ./foo.txt s3://example-repo/main/" width="847" height="148" loading="lazy">
<em><a target="_blank" href="https://carbon.now.sh/?bg=rgba%28171%2C+184%2C+195%2C+1%29&amp;t=seti&amp;wt=none&amp;l=application%2Fx-sh&amp;ds=true&amp;dsyoff=20px&amp;dsblur=68px&amp;wc=true&amp;wa=true&amp;pv=56px&amp;ph=56px&amp;ln=false&amp;fl=1&amp;fm=Hack&amp;fs=14px&amp;lh=133%25&amp;si=false&amp;es=2x&amp;wm=false&amp;code=aws%2520--endpoint-url%253Dhttp%253A%252F%252Flocalhost%253A8000%2520--profile%2520local%2520s3%2520cp%2520.%252Ffoo.txt%2520s3%253A%252F%252Fexample-repo%252Fmain%252F%250A%2523%2520output%253A%250A%2523%2520upload%253A%2520.%252Ffoo.txt%2520to%2520s3%253A%252F%252Fexample-repo%252Fmain%252Ffoo.txt%250A">Code source</a></em></p>
<p>Note that the path must be prefixed with the name of the branch we want to write to (here, <code>main</code>).</p>
<p>The file we’ve added now appears in the UI:</p>
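<p>Following the path convention shown in the command above, where the bucket is the repository and the first path segment is the branch, a hypothetical helper like the one below could split such a path into its parts. <code>split_branch_path</code> is made up for illustration and is not part of LakeFS or the AWS CLI.</p>
<pre><code class="lang-python">from urllib.parse import urlparse


def split_branch_path(uri):
    """Hypothetical helper: split s3://REPO/BRANCH/KEY into (repo, branch, key)."""
    parsed = urlparse(uri)
    # The bucket name is the repository; the first path segment is the branch.
    branch, _, key = parsed.path.lstrip("/").partition("/")
    return parsed.netloc, branch, key


print(split_branch_path("s3://example-repo/main/foo.txt"))
# ('example-repo', 'main', 'foo.txt')
</code></pre>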
<p><img src="https://lh6.googleusercontent.com/F8UCd8s43wM0y4WgRhHWy04p2rzBQ1ccvUZhppCzl30fE0FJEpMQb7Y1X06x-WDx3J9I5LELQv4FtFKOYWJqU2E9dENB5MMqjsv-MYfLI-oCEXLekhWH9xTcazm1-_Fmo4NxgDb_=s0" alt="Image" width="1627" height="934" loading="lazy">
<em><a target="_blank" href="https://docs.lakefs.io/assets/img/object_added.png">Image source</a></em></p>
<p>Next, we need to know how to commit changes and create branches. To do that, we need to install the LakeFS CLI, <code>lakectl</code>.</p>
<h3 id="heading-how-to-install-the-lakefs-cli">How to Install the LakeFS CLI</h3>
<p>First, download the binary from the <a target="_blank" href="https://docs.lakefs.io/#downloads">LakeFS downloads page</a>.</p>
<p>Then configure it with the admin credentials created earlier:</p>
<p><img src="https://lh4.googleusercontent.com/KQntIwi6YaOyp2kKvKLxeYs4Il4czCGCv8fj2_PFhg2Bqy2RRGNNQtLsCxS8YT57DEH-Q63obz7emujS5tST4aoPx0qb4XLjJV3AeKEwRwQGATfJd6us3BA5Svo7Lz_i3k_Smy7N=s0" alt="lakectl config" width="552" height="204" loading="lazy">
<em><a target="_blank" href="https://carbon.now.sh/?bg=rgba%28171%2C+184%2C+195%2C+1%29&amp;t=seti&amp;wt=none&amp;l=application%2Fx-sh&amp;ds=true&amp;dsyoff=20px&amp;dsblur=68px&amp;wc=true&amp;wa=true&amp;pv=56px&amp;ph=56px&amp;ln=false&amp;fl=1&amp;fm=Hack&amp;fs=14px&amp;lh=133%25&amp;si=false&amp;es=2x&amp;wm=false&amp;code=lakectl%2520config%250A%2523%2520output%253A%250A%2523%2520Config%2520file%2520%252Fhome%252Fjanedoe%252F.lakectl.yaml%2520will%2520be%2520used%250A%2523%2520Access%2520key%2520ID%253A%2520AKIAJVHTOKZWGCD2QQYQ%250A%2523%2520Secret%2520access%2520key%253A%2520****************************************%250A%2523%2520Server%2520endpoint%2520URL%253A%2520http%253A%252F%252Flocalhost%253A8000%252Fapi%252Fv1%250A">Code source</a></em></p>
<p>Here are some of the commands we can run to try it out:</p>
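<p><img src="https://lh4.googleusercontent.com/4HuuBJwfpif6TzMS5spkzhkLQf_TC-rZ6WMjAiOOrsv3z8iF2vaTtKTjzicnm5qDjXmLq_aSGqXvAF7RE43BWd9hGB7gUSb76w1bt6ntyLJgAVFBMLwP7uYRPLFUd-1G27kVER7O=s0" alt="lakectl branch list lakefs://example-repo; lakectl commit lakefs://example-repo/main -m 'added our first file!'; lakectl log lakefs://example-repo/main" width="721" height="521" loading="lazy">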
<em><a target="_blank" href="https://carbon.now.sh/?bg=rgba%28171%2C+184%2C+195%2C+1%29&amp;t=seti&amp;wt=none&amp;l=application%2Fx-sh&amp;ds=true&amp;dsyoff=20px&amp;dsblur=68px&amp;wc=true&amp;wa=true&amp;pv=56px&amp;ph=56px&amp;ln=false&amp;fl=1&amp;fm=Hack&amp;fs=14px&amp;lh=133%25&amp;si=false&amp;es=2x&amp;wm=false&amp;code=lakectl%2520branch%2520list%2520lakefs%253A%252F%252Fexample-repo%250A%2523%2520output%253A%250A%2523%2520%252B----------%252B------------------------------------------------------------------%252B%250A%2523%2520%257C%2520REF%2520NAME%2520%257C%2520COMMIT%2520ID%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%2520%257C%250A%2523%2520%252B----------%252B------------------------------------------------------------------%252B%250A%2523%2520%257C%2520main%2520%2520%2520%2520%2520%257C%2520a91f56a7e11be1348fc405053e5234e4af7d6da01ed02f3d9a8ba7b1f71499c8%2520%257C%250A%2523%2520%252B----------%252B------------------------------------------------------------------%252B%250A%2520%2520%2520%2520%2520%250Alakectl%2520commit%2520lakefs%253A%252F%252Fexample-repo%252Fmain%2520-m%2520%27added%2520our%2520first%2520file%21%27%250A%2523%2520output%253A%250A%2523%2520Commit%2520for%2520branch%2520%2522main%2522%2520done.%250A%2523%2520%250A%2523%2520ID%253A%2520901f7b21e1508e761642b142aea0ccf28451675199655381f65101ea230ebb87%250A%2523%2520Timestamp%253A%25202021-06-15%252013%253A48%253A37%2520%252B0300%2520IDT%250A%2523%2520Parents%253A%2520a91f56a7e11be1348fc405053e5234e4af7d6da01ed02f3d9a8ba7b1f71499c8%250A%2520%2520%250Alakectl%2520log%2520lakefs%253A%252F%252Fexample-repo%252Fmain%250A%2523%2520output%253A%2520%2520%250A%2523%2520commit%2520901f7b21e1508e761642b142aea0ccf28451675199655381f65101ea230ebb87%250A%2523%2520Author%253
A%2520Example%2520User%2520%253Cuser%2540example.com%253E%250A%2523%2520Date%253A%25202021-06-15%252013%253A48%253A37%2520%252B0300%2520IDT%250A%2520%2520%2520%2520%2520%2520%2520%250A%2520%2520%2520%2520%2520%2520added%2520our%2520first%2520file%21%250A%2520%2520%2520%2520%2520%2520%2520">Code source</a></em></p>
<p>You can find the other commands, such as the one for creating branches, in the <a target="_blank" href="https://docs.lakefs.io/reference/commands.html">LakeFS command reference</a>.</p>
<p>There you have it! Now, you can work with your data any way you like. Experiment without guilt and create multiple versions of your data models.</p>
<h2 id="heading-in-closing">In Closing</h2>
<p>In this article, we covered a fair bit of ground. We learned about different kinds of data storage and why object storage has the edge when it comes to data experimentation and parallelization.</p>
<p>Next, we looked into data lakes and LakeFS, which is a powerful tool for working with data.</p>
<p>Working with data at this scale might seem daunting at first. But, as we’ve shown here, with the right set of tools and knowledge, there’s a lot you can accomplish.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Google BigQuery Beginner's Guide – How to Analyze Large Datasets ]]>
                </title>
                <description>
                    <![CDATA[ By Ambreen Khan Gone are the days of storing your data in a CSV file or an Excel spreadsheet. If you want to quickly analyze millions of data rows in seconds, BigQuery is the way to go. In this getting started guide, we'll learn about BigQuery and ho... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/google-bigquery-beginners-guide/</link>
                <guid isPermaLink="false">66d45d99aad1510d0766b5e3</guid>
                
                    <category>
                        <![CDATA[ bigquery ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Google ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 12 Jul 2021 11:26:39 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/07/web-1.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Ambreen Khan</p>
<p>Gone are the days of storing your data in a CSV file or an Excel spreadsheet. If you want to quickly analyze millions of data rows in seconds, BigQuery is the way to go.</p>
<p>In this getting started guide, we'll learn about BigQuery and how we can use it to query and analyze data.</p>
<h2 id="heading-what-is-bigquery">What is BigQuery?</h2>
<p>BigQuery is a fully managed, cloud-based enterprise data warehouse that many companies use to store and analyze their massive datasets.</p>
<p>BigQuery's serverless architecture lets you quickly run standard SQL queries and analyze millions of rows of data in seconds. You can store your data either in BigQuery storage or as files in Google Cloud Storage buckets.</p>
<p>BigQuery also integrates well with other GCP products, such as Dataflow and Data Studio, which makes it a great choice for data analytics tasks.</p>
<h2 id="heading-before-you-begin">Before You Begin</h2>
<p>To try out BigQuery, we are going to query tables in a public dataset that Google provides on Google Cloud Platform. This guide assumes that:</p>
<ul>
<li>You have access to <a target="_blank" href="https://cloud.google.com/free/?gclid=CjwKCAjw55-HBhAHEiwARMCsziVtllCq8mRIWlXVVztmn6HkzAlkuajtZeYMInLQmykNGfbEjz2tfRoCFs0QAvD_BwE&amp;gclsrc=aw.ds">Google Cloud Platform</a>.</li>
<li>You have already created a <a target="_blank" href="https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui#before-you-begin">Google Cloud project</a>.</li>
<li>The BigQuery sandbox environment is up and running.</li>
</ul>
<h2 id="heading-how-to-access-a-public-dataset">How to Access a Public Dataset</h2>
<p>A public dataset is available to the general public through the <a target="_blank" href="https://cloud.google.com/public-datasets">Google Cloud Public Dataset Program</a>. We'll use a Hacker News dataset that contains all stories and comments from Hacker News from its launch in 2006 to the present. Let's get started.</p>
<p>Navigate to the <a target="_blank" href="https://console.cloud.google.com/marketplace/product/y-combinator/hacker-news">Hacker News dataset</a> and click the VIEW DATASET button. This takes you to the Google Cloud Platform login screen. Log in to your account, and the BigQuery editor window opens with the dataset.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-51.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-the-bigquery-interface-is-organized">How the BigQuery Interface Is Organized</h2>
<p>BigQuery is structured as a hierarchy with 4 levels:</p>
<ul>
<li>Projects: top-level containers that store the data.</li>
<li>Datasets: within projects, datasets organize your data and hold one or more tables.</li>
<li>Tables: within datasets, tables hold the actual data.</li>
<li>Jobs: tasks performed on data, such as running queries, loading data, and exporting data.</li>
</ul>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-53.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><strong>Note:</strong> While working with tables, you may also notice that:</p>
<ul>
<li>Some tables are broken out by day (one table per day), meaning that you need to use a wildcard (<code>*</code>) in the table name to query a larger date range.</li>
<li>There may also be an “intraday” table that gives you data for the last 24 hours.</li>
</ul>
<h2 id="heading-how-to-check-the-table-schema">How to Check the Table Schema</h2>
<p>Click on the table name. This will allow you to see what columns are in the table, as well as some buttons to perform various operations on the table.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-55.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-preview-the-data">How to Preview the Data</h2>
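<p>Use the preview button to see a sample of rows in the table instead of querying it. <a target="_blank" href="https://cloud.google.com/bigquery/docs/best-practices-costs#avoid_select_">Avoid running <code>SELECT *</code> in BigQuery</a> just to look at the data: under on-demand pricing, BigQuery charges by the amount of data a query processes, so selecting every column can be costly, while previewing a table is free.</p>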
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-56.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-query-big-data">How to Query Big Data</h2>
<p>SQL statements are used to perform various database tasks, such as querying data, creating tables, and updating databases.</p>
<h3 id="heading-basic-queries">Basic Queries</h3>
<p>Basic queries contain the following components:</p>
<ul>
<li><code>SELECT</code> (required): identifies the columns to include in the query results</li>
<li><code>FROM</code> (required): the table that contains the columns in the <code>SELECT</code> statement</li>
<li><code>WHERE</code>: a condition for filtering records</li>
<li><code>GROUP BY</code>: how to aggregate data in the result set</li>
<li><code>ORDER BY</code>: how to sort the result set, in ascending or descending order</li>
</ul>
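<p>Here is a sketch that puts these clauses together in clause order, using Python's built-in <code>sqlite3</code> module with an in-memory database as a stand-in for BigQuery, so it runs without a Google Cloud account. The <code>stories</code> table and its rows are hypothetical. For a simple query like this one, the syntax is largely the same in BigQuery's standard SQL.</p>
<pre><code class="lang-python">import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stories table, loosely modeled on the Hacker News data used below.
conn.execute("CREATE TABLE stories (domain TEXT, score INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO stories VALUES (?, ?, ?)",
    [
        ("github.com", 120, 2021),
        ("github.com", 80, 2021),
        ("nytimes.com", 300, 2021),
        ("medium.com", 40, 2020),
    ],
)

query = """
SELECT domain, COUNT(*) AS story_count, SUM(score) AS points  -- columns to return
FROM stories                                                  -- table to read from
WHERE year = 2021                                             -- filter rows first
GROUP BY domain                                               -- then aggregate per domain
ORDER BY points DESC                                          -- finally sort the groups
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('nytimes.com', 1, 300), ('github.com', 2, 200)]
conn.close()
</code></pre>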
<h2 id="heading-how-to-compose-a-query-in-bigquery">How to Compose a Query in BigQuery</h2>
<p>For our first query, let’s find the top 5 domains shared on Hacker News in 2021 so far (the query was executed on July 9, 2021).</p>
<p>Click the <strong>Compose New query</strong> button. It will open the editor tab.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-41.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Write your first query as below:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> REGEXP_EXTRACT(<span class="hljs-keyword">url</span>, <span class="hljs-string">'//([^/]*)/?'</span>) <span class="hljs-keyword">domain</span>, <span class="hljs-keyword">COUNT</span>(*) total
<span class="hljs-keyword">FROM</span> <span class="hljs-string">`bigquery-public-data.hacker_news.full`</span>
<span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">url</span>!=<span class="hljs-string">''</span> <span class="hljs-keyword">AND</span> <span class="hljs-keyword">EXTRACT</span>(<span class="hljs-keyword">YEAR</span> <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">timestamp</span>)=<span class="hljs-number">2021</span>
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">domain</span> <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> total <span class="hljs-keyword">DESC</span> <span class="hljs-keyword">LIMIT</span> <span class="hljs-number">5</span>
</code></pre>
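<p>The following local Python approximation shows what <code>REGEXP_EXTRACT</code>, <code>GROUP BY</code>, and <code>ORDER BY ... LIMIT</code> compute in this query, using a few hypothetical URLs. Python's <code>re</code> module is not the regular expression engine BigQuery uses, but this simple pattern matches the same text in both.</p>
<pre><code class="lang-python">import re
from collections import Counter

# Same pattern as the query: capture everything after "//" up to the next "/".
# Like REGEXP_EXTRACT, we return the first capture group.
DOMAIN_PATTERN = re.compile(r"//([^/]*)/?")


def extract_domain(url):
    match = DOMAIN_PATTERN.search(url)
    return match.group(1) if match else None


# Hypothetical sample URLs, not real Hacker News data.
urls = [
    "https://github.com/python/cpython",
    "https://www.nytimes.com/2021/07/01/tech/story.html",
    "https://github.com/torvalds/linux",
    "",  # skipped below, like url != '' in the query
]

# GROUP BY domain + COUNT(*), then ORDER BY total DESC LIMIT 5
counts = Counter(extract_domain(u) for u in urls if u != "")
print(counts.most_common(5))  # [('github.com', 2), ('www.nytimes.com', 1)]
</code></pre>
<p>The pattern keeps subdomains, so <code>www.nytimes.com</code> and <code>nytimes.com</code> would be counted separately, both here and in the BigQuery query.</p>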
<p>BigQuery validates your query as you type. If the query is valid, a check mark appears along with the amount of data the query will process, which lets you estimate its cost before you run it.</p>
<p>If the query is invalid, then an exclamation point appears along with an error message.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-59.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>To run this query, click the <strong>Run</strong> button. In a few seconds, you should see the results:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-60.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Click the <strong>JSON</strong> tab if you want the results in JSON format. You'll also find interesting details under the <strong>Execution details</strong> tab.</p>
<h2 id="heading-how-to-query-multiple-tables-using-a-wildcard-table">How to Query Multiple Tables Using a Wildcard Table</h2>
<p>Wildcard tables enable you to query multiple tables using concise SQL statements. A wildcard table represents a union of all the tables that match the wildcard expression:</p>
<p><code>FROM `tablename.stories_*`</code></p>
<h3 id="heading-tablesuffix-pseudo-column">_TABLE_SUFFIX Pseudo Column</h3>
<p>Queries with wildcard tables support the <code>_TABLE_SUFFIX</code> pseudo column in the <code>WHERE</code> clause. To restrict a query so that it scans only a specified set of tables, use the <code>_TABLE_SUFFIX</code> pseudo column in a <code>WHERE</code> clause with a condition that is a constant expression.</p>
<p>Using <code>_TABLE_SUFFIX</code> can greatly reduce the number of bytes scanned, which helps reduce the cost of running your queries.</p>
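<p>Conceptually, a wildcard table combined with a <code>_TABLE_SUFFIX</code> filter picks a subset of tables before any rows are read. The toy Python model below shows that selection logic: match the prefix, treat whatever the <code>*</code> matched as the suffix, and keep only suffixes inside the range. The table names and the <code>wildcard_tables</code> helper are hypothetical, and this is not how BigQuery is implemented.</p>
<pre><code class="lang-python">from fnmatch import fnmatchcase

# Hypothetical daily tables in one dataset, named with a YYYYMMDD suffix.
tables = [
    "stories_20210705",
    "stories_20210706",
    "stories_20210707",
    "stories_20210708",
    "comments_20210706",
]


def wildcard_tables(names, pattern, suffix_from, suffix_to):
    """Toy model of FROM `dataset.stories_*` WHERE _TABLE_SUFFIX BETWEEN a AND b."""
    prefix = pattern[:-1]  # assumes the pattern ends in a single "*"
    selected = []
    for name in names:
        if not fnmatchcase(name, pattern):
            continue  # not part of this wildcard table at all
        suffix = name[len(prefix):]  # the text that "*" matched
        # _TABLE_SUFFIX is a string, so this is a string comparison.
        if suffix_to >= suffix >= suffix_from:
            selected.append(name)
    return selected


print(wildcard_tables(tables, "stories_*", "20210706", "20210707"))
# ['stories_20210706', 'stories_20210707']
</code></pre>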
<h3 id="heading-how-to-get-data-by-providing-a-date-range">How to Get Data by Providing a Date Range</h3>
<p>For example, the following condition limits a query over daily tables to the suffixes from 36 months ago through yesterday:</p>
<pre><code>WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL <span class="hljs-number">36</span> MONTH))
    AND
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL <span class="hljs-number">1</span> DAY))
</code></pre><h3 id="heading-how-to-use-unnest-to-flatten-the-date">How to Use UNNEST to Flatten the Data</h3>
<p>To convert an <code>ARRAY</code> into a set of rows, also known as "flattening," use the <a target="_blank" href="https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#unnest_operator"><code>UNNEST</code></a> operator. <code>UNNEST</code> takes an <code>ARRAY</code> and returns a table with a single row for each element in the <code>ARRAY</code>:</p>
<pre><code>SELECT * FROM UNNEST ([<span class="hljs-string">'Ambreen'</span>, <span class="hljs-string">'Abdul'</span>, <span class="hljs-string">'Adam'</span>, <span class="hljs-string">'David'</span>]) AS names;
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-45.png" alt="Image" width="600" height="400" loading="lazy"></p>
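<p>The same flattening can be pictured in plain Python. In this sketch, each hypothetical row carries an array column. The <code>unnest</code> helper (a name made up for illustration) emits one output row per array element and copies the row's other fields. That is roughly what <code>FROM table, UNNEST(table.array_column)</code> produces in BigQuery.</p>
<pre><code class="lang-python">def unnest(rows, array_key, element_key):
    """Return one row per array element, copying the row's other fields.

    Rows with an empty array produce no output, like a comma (CROSS) join
    with UNNEST; a LEFT JOIN UNNEST would keep them with a NULL element.
    """
    flat = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != array_key}
        for element in row[array_key]:
            flat.append({**base, element_key: element})
    return flat


# A standalone array, like SELECT * FROM UNNEST([...]) AS names
names = [{"names": n} for n in ["Ambreen", "Abdul", "Adam", "David"]]

# Hypothetical rows that each hold an array column
events = [
    {"event": "page_view", "params": ["page_title", "page_location"]},
    {"event": "click", "params": ["link_url"]},
    {"event": "scroll", "params": []},
]

for r in unnest(events, "params", "param"):
    print(r)
</code></pre>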
<h2 id="heading-how-to-save-and-share-queries">How to Save and Share Queries</h2>
<p>You can save your queries for later use. There are 3 types of saved queries:</p>
<ul>
<li><strong>Private:</strong> Private saved queries are visible only to the user who creates them.</li>
<li><strong>Project-level:</strong> Project-level saved queries are visible to members of the predefined BigQuery IAM roles with the required <a target="_blank" href="https://cloud.google.com/bigquery/docs/saving-sharing-queries#permissions">permissions</a>.</li>
<li><strong>Public:</strong> Public saved queries are visible to anyone with a link to the query.</li>
</ul>
<h2 id="heading-summary">Summary</h2>
<p>BigQuery can do much more than we explored in this simple tutorial. For example, you can export Firebase Analytics data to BigQuery, which lets you run sophisticated ad hoc queries against your analytics data.</p>
<p>And with BigQuery ML, you can create and execute machine learning models using standard SQL queries. </p>
<p>If you’re feeling excited and want to learn more about BigQuery, check out the links below.</p>
<h2 id="heading-resources">Resources:</h2>
<ul>
<li><a target="_blank" href="https://support.google.com/analytics/answer/4419694?hl=en#zippy=%2Cin-this-article">BigQuery cookbook</a> </li>
<li><a target="_blank" href="https://cloud.google.com/bigquery/docs/querying-wildcard-tables#filtering_selected_tables_using_table_suffix">Filtering selected tables using _TABLE_SUFFIX</a> </li>
<li><a target="_blank" href="https://firebase.googleblog.com/2017/03/bigquery-tip-unnest-function.html">BigQuery Tip: The UNNEST Function</a></li>
<li><a target="_blank" href="https://towardsdatascience.com/bigquery-unnest-how-to-work-with-nested-data-in-bigquery-f27006a64c3">BigQuery UNNEST: How to work with nested data in BigQuery</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Why You Should Learn Data Analytics ]]>
                </title>
                <description>
                    <![CDATA[ Ok. I'm not suggesting that everyone needs to become a fully-credentialed data scientist. But these days, learning core data skills may provide as much value as basic financial literacy or knowledge of a simple coding language. This isn't just about ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/why-you-should-learn-data-analytics/</link>
                <guid isPermaLink="false">66b9967817d9592471979c33</guid>
                
                    <category>
                        <![CDATA[ data ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ David Clinton ]]>
                </dc:creator>
                <pubDate>Mon, 28 Jun 2021 13:44:39 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/06/statistic-1820320_640.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Ok. I'm not suggesting that <em>everyone</em> needs to become a fully-credentialed data scientist. But these days, learning <em>core</em> data skills may provide as much value as basic financial literacy or knowledge of a simple coding language.</p>
<p>This isn't just about your career: data has grown far beyond the world of IT professionals. It's getting harder to imagine <em>any</em> significant life events that analytics can't help secure and enhance.</p>
<p>And the good news is that acquiring those basic data skills doesn't have to cost you any money or take much time. There's a lot of sophisticated number crunching going on using plain old spreadsheets, for instance.</p>
<p>But even getting up to speed with the far more versatile world of Python and Jupyter Notebooks can happen faster than you might think. </p>
<p><a target="_blank" href="https://www.youtube.com/watch?v=jcTj6FgWOpo">freeCodeCamp's YouTube channel is host to a quick and dirty course of my own</a> that'll get you started. The <a target="_blank" href="https://stories.thedataproject.net/">rest of the course curriculum that's available (for free) on my website</a> can finish the job.</p>
<p>But why should you bother?</p>
<h2 id="heading-who-needs-data-analytics">Who needs data analytics?</h2>
<p>Operations and business decision making of all kinds depend heavily on understanding data. Professionals in the security, administration, business intelligence, scientific research, insurance, and engineering fields would accomplish precious little without their data insights.</p>
<p>There's a constant torrent of data being automatically generated daily by billions of events executed through millions of devices. Whether the data comes from credit card transactions, environmental sensors installed in cars, or system activity on your laptop, there's no end to the stories it could be telling us.</p>
<p>It's the job of an analyst to capture and manipulate data until it yields useful interpretations. That might mean:</p>
<ul>
<li>Identifying and blocking suspicious behavior on distributed IT system infrastructure</li>
<li>Understanding and fixing performance choke points within complex multi-level business processes</li>
<li>Discovering drug interactions with biological pathogens</li>
<li>Detecting patterns and predicting events within financial markets</li>
<li>Integrating online data with real-world operations (through technologies like augmented reality)</li>
</ul>
<p>...Or countless other applications, most of which haven't yet been imagined.</p>
<h2 id="heading-how-will-data-fit-into-your-life">How will data fit into your life?</h2>
<p>Smart use of publicly-accessible data can help you with important decisions. Searching for the right career? Try crunching government data, like that of the US Bureau of Labor Statistics – something I partially demonstrate <a target="_blank" href="https://stories.thedataproject.net/docs/1-cpi_content/">here</a>.</p>
<p>Facing a salary negotiation with your boss? You'd be surprised how much helpful geo-specific salary information is available.</p>
<p>Not sure which way housing availability and prices in your city are likely to move in the next couple of years? You guessed it, there's almost certainly a public data source to answer all your questions.</p>
<p>Sometimes, of course, the data you're after won't exist in exactly the format you need. Rather than give up, you can crowdsource the problem.</p>
<p>That, for example, is what's behind my <a target="_blank" href="https://thedataproject.net/index.php/take-survey-fix-world/">online "Consumer Product Durability" survey</a>. If I'm successful in that particular project, I'll be able to convert the actual experiences of thousands of living, breathing consumers into practical guidance, to help people know when to pay the big bucks for name brands, and when "no-name" is just as good.</p>
<p>The bottom line is that endless streams of data are sitting there, waiting for you to capitalize on them. Whether your goals are personal or professional, if you don't use your data, rest assured, the competition (other people fighting for your job or other organizations fighting for your company's profits) will.</p>
<p>One more thought: data analytics isn't an all-or-nothing thing. Perhaps because basic analytics tools are so accessible, learning to use some of them can be a powerful <em>addition</em> to your technology portfolio. Think how much more you'll be worth to your employer or business once you can tame their data.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Teach Yourself Data Analytics in 30 Days ]]>
                </title>
                <description>
                    <![CDATA[ You can learn the basics of Data Analytics with 30 days of practice.  We just released a Data Analytics course on the freeCodeCamp.org YouTube channel. The course includes a 40-minute video, as well as a website and Jupyter notebooks. If you follow t... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/teach-yourself-data-analytics-in-30-days/</link>
                <guid isPermaLink="false">66b2068df31aa965000e587b</guid>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Wed, 16 Jun 2021 17:19:31 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/06/dataanalytics.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>You can learn the basics of Data Analytics with 30 days of practice. </p>
<p>We just released a Data Analytics course on the freeCodeCamp.org YouTube channel. The course includes a 40-minute video, as well as a website and Jupyter notebooks. If you follow the plan laid out in these course resources, you can learn data analytics in 30 days.</p>
<p>David Clinton developed this course. David has written many popular technical books and created many helpful video courses.</p>
<p>The course aims to be a quick and dirty introduction to Python-based data analytics. The goal is to get users with some basic understanding of the workings of Python to the point where they can confidently find and manipulate data sources and use a Jupyter environment to derive insights from their data. The course will demonstrate effective analytics methods, but does not try to be exhaustive.</p>
<p>The only prerequisite for the course is a basic understanding of Python programming, or at least how programming works in general. </p>
<p>Here are the main topics covered in this course:</p>
<ul>
<li>Installing Python and Jupyter </li>
<li>Working with the Jupyter environment</li>
<li>Finding data sources and using APIs </li>
<li>Working with data </li>
<li>Plotting data </li>
<li>Understanding data </li>
</ul>
<p>You can watch the full course below or <a target="_blank" href="https://www.youtube.com/watch?v=jcTj6FgWOpo">on the freeCodeCamp.org YouTube channel</a>.</p>
<p>Make sure to also check out the accompanying website: <a target="_blank" href="https://stories.thedataproject.net/">https://stories.thedataproject.net/</a></p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/jcTj6FgWOpo" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<h3 id="heading-full-transcript">Full Transcript</h3>
<p>(note: autogenerated)</p>
<p>David Clinton has written and created many popular technical books and video courses.</p>
<p>This data analytics course, along with the accompanying website, and Jupyter Notebooks will help you learn data analytics in 30 days.</p>
<p>Welcome to my course, I'm really glad to have you here.</p>
<p>And I'm even happier that you've decided to join the data analytics party.</p>
<p>Who am I? I'm the author of more than a dozen books on Linux and AWS administration and digital security, and of dozens of courses on Pluralsight.</p>
<p>I've also got a fistful of articles right here on the freeCodeCamp News site.</p>
<p>But I just write about stuff.</p>
<p>Hopefully, when you're done with this content, you'll be out using data to change the world.</p>
<p>Since you've already seen my claim that this will only take you 30 days, I should explain what this actually is. It's going to show you the tools you'll need to find and manipulate raw data, and to use various graphing tools to help you understand and interpret it.</p>
<p>But don't expect us to cover a full data science curriculum here complete with single and multivariable calculus, algorithmic problem solving, or even machine learning.</p>
<p>That would require a whole lot more time and effort.</p>
<p>If that's what you're after, check out the new data science content that freeCodeCamp is in the process of bringing online. There's something else you won't get from these videos: experience.</p>
<p>Once you've watched the entire course, you probably still won't be able to do much on your own.</p>
<p>The value of actual hands-on experience is in the mistakes you make: mistyping syntax, not properly understanding what your code is doing, or failing to account for environment-specific restrictions.</p>
<p>Diagnosing and working through those mistakes is where you'll really begin to take charge and accomplish great things.</p>
<p>So where will you get that experience? If you're ambitious, and you've got exciting project ideas of your own, by all means, dive in and try it out.</p>
<p>But if you think you still need some guidance, then I've got everything you should need on my stories.thedataproject.net site. You can work through the exercises in each of the eight data stories you'll find over there.</p>
<p>If there's a specific skill you're looking for, the learning objectives index down here will point you directly to the chapter where you'll find it. All of that is available to anyone, for free.</p>
<p>If you happen to prefer working with a real book, you can purchase the same content in that format.</p>
<p>But don't think anything's missing from the free website. Right now, though, let's talk about data analytics tools.</p>
<p>There are many ways to consume data. The one you choose will reflect your specific needs and your comfort with various skills.</p>
<p>Spreadsheets, as you probably already know, are much more than just fancy calculators or places to keep your household budget numbers.</p>
<p>They also come with powerful functions, external integrations and graphing capabilities.</p>
<p>Enterprise-strength tools like Tableau, Splunk, or Microsoft's Power BI are also great for crunching numbers and visualizing insights, which you can then share with your team members.</p>
<p>So then what's the big deal with Python? Well, the Python ecosystem is much, much broader than those purpose-built tools.</p>
<p>And the Python community makes all kinds of useful data-specific libraries and modules available.</p>
<p>When you let Python loose against your data, you've got all the resources of a full-bore, industrial-strength programming language at your fingertips.</p>
<p>The challenge isn't finding what you can do with it. It's finding something that you can't do.</p>
<p>OK, but what about Jupyter? Jupyter is an open source platform within which you can load your data and execute your Python code.</p>
<p>It's a lot like a programming IDE such as Microsoft Visual Studio.</p>
<p>And while Jupyter Notebooks can be used with a growing number of languages, and for as many tasks as you can imagine, it's best known and loved as a host for Python data heroics.</p>
<p>Once upon a time, the lines of code you'd write for your programs would be saved to a single text file whose name ended with a <code>.py</code> suffix.</p>
<p>When you wanted to run your code to see how things went, you'd do it from the command line or from a powerful and complicated source code editor like Visual Studio.</p>
<p>But for anything to work, it all had to work.</p>
<p>That would make it harder to troubleshoot when something didn't go according to spec.</p>
<p>But it would also make it a lot harder to play around with specific details just to see what happens.</p>
<p>And it also made it tough to share live versions of your code across the internet.</p>
<p>As we'll soon see, Jupyter Notebooks let you run your code one cell at a time, or all together.</p>
<p>That flexibility makes it easier to understand your code and, when things go wrong, to troubleshoot it.</p>
<p>Notebooks, by the way, are JSON-based files that effectively bring the working environment for just about any data-oriented programming code from your server or workstation into your web browser.</p>
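<p>Because a notebook is just JSON, you can inspect one with nothing but the standard library. The sketch below builds a minimal notebook by hand. It assumes the nbformat version 4 layout: a top-level <code>cells</code> list whose entries each have a <code>cell_type</code> and a <code>source</code>. Notebooks saved by Jupyter carry more metadata, and the official <code>nbformat</code> package is the safer tool for editing real ones.</p>
<pre><code class="lang-python">import json

# A minimal, hand-written notebook in the nbformat 4 layout (illustrative only).
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Wages vs. CPI"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["print(2 + 2)"],
            "outputs": [],
        },
    ],
})

# Since the file is plain JSON, json.loads is enough to read it back.
nb = json.loads(notebook_text)
for cell in nb["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
</code></pre>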
<p>You can download Jupyter to your PC or a private server and access the interface through any browser with network access.</p>
<p>Or you can run notebooks on third-party hosting services like Google's Colaboratory or, for a cost, on cloud providers' services like Amazon SageMaker Studio notebooks or Microsoft's Azure Notebooks.</p>
<p>Jupyter comes in three flavors.</p>
<p>The two you're most likely to encounter are classic notebooks and the newer JupyterLab.</p>
<p>Both run nicely within your browser.</p>
<p>But JupyterLab comes with more extensions and lets you work with multiple notebook files and terminal access within a single browser tab.</p>
<p>I'll be using the classic notebook environment for the demos in this course, but there's usually no problem transferring notebooks between versions.</p>
<p>The third flavor, just to be complete, is JupyterHub.</p>
<p>It's a server version built to provide authenticated notebook access to multiple users. You can serve up to 100 or so users from a single cloud server using The Littlest JupyterHub.</p>
<p>For larger deployments involving clusters of servers, you'd probably be better off with a Kubernetes version known as Zero to JupyterHub with Kubernetes.</p>
<p>But all that's way beyond the scope of this course.</p>
<p>Our next job is to build our work environment.</p>
<p>Assuming you've decided to host Jupyter on your own machine, you'll need Python installed.</p>
<p>The good news is that most operating systems come with Python preinstalled. You can confirm that you've got an up-to-date version of Python by opening a command prompt and typing <code>python --version</code> (or sometimes <code>python3 --version</code>). If Python's installed, you'll probably see something like this.</p>
<p>Just make sure you've got Python three installed and not the deprecated and insecure Python two.</p>
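<p>You can run the same check from inside Python, for example in a notebook cell, using the standard <code>sys</code> module. The <code>meets_minimum</code> helper is a hypothetical name, and the minimum version passed to it is only an example. Use whatever your project's libraries actually require.</p>
<pre><code class="lang-python">import platform
import sys


def meets_minimum(minimum):
    """True if this interpreter is Python 3 and at least `minimum`, e.g. (3, 9)."""
    return sys.version_info.major == 3 and sys.version_info[:2] >= minimum


print("Running Python", platform.python_version())
print("Python 3.6 or newer:", meets_minimum((3, 6)))
</code></pre>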
<p>If you do need to install Python manually, you're best off following Python's official documentation. That will be the most complete and up-to-date source available, and it covers whatever operating system you're on.</p>
<p>It's important to note that not all Python versions, even those from 3.x, will necessarily behave quite the way you expect.</p>
<p>You may, for instance, find that you need a library written for version 3.9, but that there's no way to get it working on your 3.7 system.</p>
<p>Upgrading your system version to 3.9 might work out well for you.</p>
<p>But it could also cause some unexpected and unpleasant consequences.</p>
<p>It's hard to know when a particular Python library might also be needed by your core operating system.</p>
<p>If you remove the original version of the library, you might end up disabling the OS itself.</p>
<p>And don't think it won't happen.</p>
<p>I crippled my own OS that way just a few months ago.</p>
<p>One solution is to run Python for your project within a special virtual environment that's isolated from your larger OS.</p>
<p>That way, you can install all the libraries and versions you like without having to worry about damaging your work system.</p>
<p>You can do this using a full-scale virtual container running a Docker or (as I prefer) LXD image, or on a standalone AWS cloud instance.</p>
<p>But you can also use Python's own <code>venv</code> module. You'll want to read the official documentation for the virtual environment instructions specific to your host OS.</p>
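<p>Here's a minimal sketch of the <code>venv</code> route driven from Python itself. It is equivalent to running <code>python3 -m venv data-env</code> at a prompt. The environment name and the temporary directory are arbitrary choices for the example. <code>with_pip=False</code> just keeps the example fast, so leave it out when you want pip available inside the environment.</p>
<pre><code class="lang-python">import tempfile
import venv
from pathlib import Path

# Create the environment in a throwaway directory; "data-env" is an arbitrary name.
env_dir = Path(tempfile.mkdtemp()) / "data-env"
venv.create(env_dir, with_pip=False)

# Every virtual environment gets a pyvenv.cfg file at its root, plus its own
# bin (Linux, macOS) or Scripts (Windows) directory holding its interpreter.
print((env_dir / "pyvenv.cfg").read_text())
</code></pre>
<p>To use an environment from a shell, activate it with <code>source data-env/bin/activate</code> on Linux or macOS, or <code>data-env\Scripts\activate</code> on Windows. After that, <code>pip install</code> commands install into the environment instead of the system Python.</p>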
<p>Whichever version of Jupyter you choose, if you decide to install and run it locally, the Jupyter project officially recommends doing it through the Anaconda Python distribution and its binary package manager, conda.</p>
<p>Various guides to doing that are available for various OS hosts, but this official page is a good place to start.</p>
<p>As you can see, though, the Python pip package manager is also an option.</p>
<p>Once all that's done, you should be able to open a notebook right in your browser and get right down to work.</p>
<p>For me, a notebook's most powerful feature is the way you can run subsets of your code within individual cells.</p>
<p>This makes it easier to break down long and complex programs into easily readable and executable snippets.</p>
<p>With a cell selected, clicking the Run button will execute just that cell's code.</p>
<p>Note how the box on the left gets a number representing the sequential position of the execution.</p>
<p>As you become more familiar with Jupyter, you'll probably get in the habit of executing cells using Ctrl+Enter rather than a mouse click. You can insert a new cell right after the one that's currently selected by clicking the plus button.</p>
<p>The up and down arrows move cells, as you might expect, up and down.</p>
<p>Cells are, by default, formatted to handle code (Python 3 in my case), but they can also be set for Markdown, which can be handy for documenting your notebooks or making new sections easier to find.</p>
<p>A single hash symbol (<code>#</code>) in Markdown, for instance, represents a top-level section title. Executing the cell will render the text to match your formatting instruction.</p>
<p>The precise locations and appearance of the buttons you'll use to get this stuff done will vary between the different Jupyter versions, but all the basic functions are universally available.</p>
<p>Whatever values your code creates will remain in the kernel's memory until the kernel is restarted.</p>
<p>This lets you rerun previous or subsequent cells to see what impact the change might have.</p>
<p>It also means resetting your environment for a complete start over is as easy as selecting restart kernel and clear all outputs.</p>
<p>Not all Python functionality will be available out of the box.</p>
<p>Sometimes, as you saw just before, you'll need to tell Python to load one or more modules through <code>import</code> statements within your code.</p>
<p>But some modules need to be installed manually from the host command line before they can even be imported.</p>
<p>For such cases, Python recommends its installer program, pip, or in some cases, the conda tool from the Anaconda distribution.</p>
<p>You can read more about using pip for the proper care and feeding of your Python system on the helpful Python documentation site.</p>
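<p>A notebook can also check whether a module is installed before trying to import it. This sketch uses the standard library's <code>importlib.util.find_spec</code>, which looks a module up without importing it. The <code>import_if_installed</code> helper is a hypothetical name. Keep in mind that the name you import is not always the name you pass to pip; scikit-learn, for example, is imported as <code>sklearn</code>.</p>
<pre><code class="lang-python">import importlib
import importlib.util


def import_if_installed(name):
    """Import and return a module, or return None if it isn't installed.

    find_spec only locates the module, so a missing package yields None
    instead of raising ImportError.
    """
    if importlib.util.find_spec(name) is None:
        print(f"{name} is not installed; try: pip install {name}")
        return None
    return importlib.import_module(name)


json_module = import_if_installed("json")  # standard library, always present
</code></pre>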
<p>Okay, here's where we get down to real work.</p>
<p>We're going to head out to the internet to find reliable data that will help us answer a real world question.</p>
<p>And we'll use a public API to get the data.</p>
<p>Then we'll examine the data to get a feel for its current formatting, and what it'll take to fix it.</p>
<p>After applying the necessary formatting so our code can happily read it, we'll merge multiple data sets together so we can look for correlations, and then experiment with graphing tools to find the one that represents our data in the most intelligible way.</p>
<p>What's the problem we're trying to solve? I'm curious to see whether wages paid to US workers over the past 20 years have, on average, gone up. Assuming they have increased, I'd also like to know whether the extra money in their pockets has also increased their actual standard of living. To answer those questions, we're going to access two data sets collected and maintained by the US government's Bureau of Labor Statistics.</p>
<p>One of the many nice things about the Bureau of Labor Statistics, usually referred to as BLS, is that they provide an API for access from within our Python scripts.</p>
<p>To make that work, you'll need the BLS endpoint address matching the specific data series you need, the Python code to initiate the request, and, for higher-volume requests, a BLS API key.</p>
<p>Getting the series endpoint addresses you need may take some digging around in the BLS website.</p>
<p>However, the most popular data sets are accessible through a single page.</p>
<p>This shows you what that looks like, including endpoint codes like LNS11 followed by a string of zeros for the civilian labor force set. You can also search for data sets on this page.</p>
<p>Searching for "computer", for instance, will take you to a list that includes the deeply tempting average hourly wage for level 11 computer and mathematical occupations in Austin-Round Rock, Texas.</p>
<p>The information you'll discover by expanding that selection will include its series ID, which is the endpoint.</p>
<p>Because I know you can barely contain your curiosity, I'll click through and show you that level 11 computer and mathematical professionals in Austin-Round Rock, Texas, could expect to earn $51.76 an hour back in 2019.</p>
<p>So how do you turn those series IDs into Python-friendly data? Manually writing GET and POST requests can be very picky, and it'll take a lot of tries before you get it exactly right.</p>
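<p>To get a sense of what a library like bls saves you from, here's a sketch that builds, but does not send, a raw request using only the standard library. The endpoint URL and the <code>seriesid</code>, <code>startyear</code>, <code>endyear</code>, and <code>registrationkey</code> fields are assumed from the BLS Public Data API version 2 documentation, so verify them there before relying on this. The series ID and the <code>BLS_API_KEY</code> environment variable name are also assumptions for the example, and <code>build_bls_request</code> is a hypothetical helper.</p>
<pre><code class="lang-python">import json
import os
import urllib.request

# Assumed BLS API v2 endpoint; check the official docs before use.
BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def build_bls_request(series_ids, start_year, end_year, api_key=None):
    """Build (but do not send) a JSON POST request for one or more series."""
    payload = {
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
    }
    if api_key:
        payload["registrationkey"] = api_key
    return urllib.request.Request(
        BLS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# CUUR0000SA0 is assumed to be the CPI-U all-items series; confirm on the BLS site.
req = build_bls_request(["CUUR0000SA0"], 2002, 2020, os.environ.get("BLS_API_KEY"))
print(req.get_method(), req.full_url)
# Sending it would mean urllib.request.urlopen(req) and parsing the JSON reply.
</code></pre>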
<p>To avoid all that, I decided to go with a third-party Python library called bls.</p>
<p>It's available through Oliver Sherouse's GitHub repo. You install the library on your host machine using <code>pip install bls</code>.</p>
<p>That's all it'll take.</p>
<p>While we're here, we might as well activate our BLS API key.</p>
<p>You register for the API from this page, and they'll send you an email with your key and a validation URL that you'll need to click.</p>
<p>Once you've got your key, you export it to your system environment. On Linux or macOS, that would mean running something like this, with your key substituted for the fake one I'm using here.</p>
<p>I'm going to use the API to request US consumer price index (CPI) and wage and salary statistics between 2002 and 2020.</p>
<p>The CPI is a measure of the price of a basket of essential consumer goods.</p>
<p>It's an important proxy for changes in the cost of living, which in turn, is an indicator of the general health of the economy.</p>
<p>Our wages data will come from the BLS employment cost index, covering wages and salaries for private industry workers in all industries and occupations.</p>
<p>A growing employment index would, at first glance, suggest that things are getting better for most people.</p>
<p>However, seeing the average employment wage trends in isolation isn't all that useful.</p>
<p>After all, the highest salary won't do you much good if your basic expenses are higher still.</p>
<p>So the goal is to pull both the CPI and wages data sets, and then correlate them looking for patterns.</p>
<p>This will show us how wages have been changing in relation to costs.</p>
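<p>One common way to compare the two series is to rebase each to 100 at a shared start date and then divide wages by prices. This is a generic technique, not necessarily the one the course uses later. A result above 100 means wages grew faster than the CPI. The numbers below are made up purely to exercise the arithmetic, and the helper names are hypothetical.</p>
<pre><code class="lang-python">def rebase(values, base_index=0):
    """Rescale a series so the value at base_index becomes 100."""
    base = values[base_index]
    return [100 * v / base for v in values]


def real_index(wage_index, cpi):
    """Deflate a wage index by CPI: above 100 means wages outpaced prices."""
    wages = rebase(wage_index)
    prices = rebase(cpi)
    return [100 * w / p for w, p in zip(wages, prices)]


# Made-up numbers for illustration, not BLS data.
wage_index = [100.0, 103.0, 106.0]
cpi = [200.0, 204.0, 212.0]
print([round(x, 2) for x in real_index(wage_index, cpi)])  # [100.0, 100.98, 100.0]
</code></pre>
<p>In this toy case, wages pull ahead of prices in the second period and are exactly matched by them in the third. That is the kind of pattern the side-by-side comparison is meant to surface.</p>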
<p>Now, let me show you how it actually works.</p>
<p>With valid endpoints for the two data sets we are going to be using, we're all set to start digging for CPI and employment gold.</p>
<p>Importing these four libraries, including bls, will give us all the tools we'll need.</p>
<p>pandas stands for Python for data analysis, which is a library for working with data as data frames.</p>
<p>Data frames are perhaps the most important structure you'll use as you learn to process large data sets.</p>
<p>NumPy is a library for executing mathematical functions against large arrays of data.</p>
<p>And map plot live is a library for plotting data in visual graphs of various kinds.</p>
<p>when importing a library, you assign it the name you'll use to invoke it.</p>
<p>You can choose just about anything but pandas is often represented by PD NumPy as NP and matplotlib as PLT, I'm also importing the BLS library that we installed a bit earlier, that will be invoked using its actual name, BLS.</p>
<p>I'll execute that cell.</p>
<p>Now I pass the BLS endpoint for the CPI data series to the get_series function from the bls library.</p>
<p>The endpoint code itself was, of course, copied from the Popular Data Sets page on the BLS website. I'll assign the data series that comes back to a data frame using the variable CPI, and then save the data frame to a local CSV file.</p>
<p>This isn't necessary, but you might find it easier to work with the data when it's saved locally.</p>
<p>Next, I'll load the data from the new CSV file by running the pandas pd.read_csv function against the file name.</p>
<p>I'll assign the variable name CPI data to the new data frame that comes out the other end.</p>
<p>Running just CPI data will print out the first and last five lines of the data frame.</p>
<p>The date column contains month and year values.</p>
<p>And the second column contains our actual data.</p>
<p>I'd like to simplify the headers to make them easier to work with.</p>
<p>So I'll use the pandas columns attribute to rename them. I definitely prefer it this way.</p>
<p>However, we'll also need to see the wages data to know whether the format it uses is compatible with our CPI set.</p>
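<p>Here's a minimal sketch of the save, reload, and rename steps. Because it can't call the BLS API, it builds a small hypothetical data frame in place of what get_series returns; the file name cpi_data.csv, the original header name value, and the numbers are all placeholders.</p>

```python
import pandas as pd

# Hypothetical stand-in for the downloaded CPI series (placeholder values).
# The real headers in the BLS download may differ.
cpi = pd.DataFrame(
    {"date": ["2002-01", "2002-02", "2002-03"], "value": [177.1, 177.5, 178.0]}
)

# Save a local copy, then read it back into a new data frame.
cpi.to_csv("cpi_data.csv", index=False)
cpi_data = pd.read_csv("cpi_data.csv")

# Replace the headers with short, easy-to-type names via the columns attribute.
cpi_data.columns = ["date", "cpi"]
print(cpi_data.head())
```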
<p>So I'll pull the wages data series using the BLS library and assign it to the wages data frame.</p>
<p>Once again, I'll save it to a local CSV file and read that data into a new data frame called df.</p>
<p>I'll clean up my column headers and use head to print only the first five lines of data. You should notice two things.</p>
<p>First, this data isn't delivered in monthly increments but quarterly, with only one entry for every three months.</p>
<p>Second, the date format is different.</p>
<p>Instead of a month number, there's Q1 or Q2.</p>
<p>If we want Python to sync our two data sets, we'll need to do some editing.</p>
<p>I'll do that by replacing every March data point (meaning any date entry containing the string -03 within its date value) with the string Q1. June, that is, any value including -06, will get Q2, and so on.</p>
<p>As you can see now when I print just the date column, some values have been updated to the new format.</p>
<p>But the rest of them are, from this point on, unnecessary and will cause us trouble, so we'll have to get rid of them altogether.</p>
<p>I'll do that by creating a new data frame called new CPI and reading into it the contents of the old CPI data data frame.</p>
<p>But I'll use the pandas str.contains method to identify all the rows in the data frame whose date contains a dash.</p>
<p>By keeping only the rows where that test is False, we drop those rows and are left with only properly formatted quarterly data points.</p>
<p>As before, I'll save this data frame to a CSV file. Notice how we've dropped from 232 rows to just 77.</p>
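<p>The relabeling and filtering steps can be sketched like this, assuming dates arrive as YYYY-MM strings in a column named date. The data is a hypothetical year of placeholder values, and the quarter_ends mapping is an illustrative helper rather than the exact code from the video.</p>

```python
import pandas as pd

# Hypothetical monthly rows in YYYY-MM format (placeholder values).
cpi_data = pd.DataFrame({
    "date": ["2002-01", "2002-02", "2002-03", "2002-04", "2002-05", "2002-06",
             "2002-07", "2002-08", "2002-09", "2002-10", "2002-11", "2002-12"],
    "cpi": [float(v) for v in range(100, 112)],
})

# Relabel the last month of each quarter to match the wages data (2002Q1, ...).
quarter_ends = {"-03": "Q1", "-06": "Q2", "-09": "Q3", "-12": "Q4"}
for month, quarter in quarter_ends.items():
    cpi_data["date"] = cpi_data["date"].str.replace(month, quarter, regex=False)

# Any date still containing a dash is not a quarter-end month, so drop it.
keep = ~cpi_data["date"].str.contains("-", regex=False)
new_cpi = cpi_data[keep].reset_index(drop=True)
print(new_cpi)
```

<p>Passing regex=False makes both str.replace and str.contains treat the dash and month patterns as plain text, which is all this step needs.</p>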
<p>Just because I'm paranoid, I'll create a new data frame called new df, so the old df data frame will still be available to me should I accidentally make a mess with what we're about to do next.</p>
<p>With our data all neatened up, we're ready to begin our analysis.</p>
<p>We've got a big problem here: the data in the CPI set comes in absolute point values, while the wages are reported as percentages measuring growth.</p>
<p>As is, there's no way to accurately compare them.</p>
<p>For one thing, each row of our wages data is the percentage by which wages would grow over a year if that quarter's rate continued for a full 12 months.</p>
<p>So not only do those values not correspond to the absolute CPI price data, they're not even technically true of their own timeframe.</p>
<p>So when we're told that the rate for the first quarter of 2002 was 3.5%, that means that if wages continued to rise at the first-quarter rate for a full 12 months, the annual growth would have been 3.5%, not 14% (3.5% in each of four quarters).</p>
<p>So the numbers we're going to work with will have to be adjusted.</p>
<p>That's because the actual growth during, say, the three months of Q1 2002 wasn't 3.5% but only a quarter of that, or 0.875%.</p>
<p>If I don't make this adjustment and instead map quarterly growth numbers directly to quarterly CPI prices, the calculated output would lead us to think that wages are growing so fast that they've become detached from reality.</p>
<p>Now, I should warn you that solving this compatibility problem will require some fake math. I'm going to divide each quarterly growth rate by four. In other words, I'll pretend that the real change to wages during those three months was exactly one quarter of the reported year-over-year rate.</p>
<p>That's almost certainly not true.</p>
<p>And it's a gross simplification.</p>
<p>However, for the big historical picture I'm trying to draw here, it's probably close enough.</p>
<p>Now, that will still leave us with a number that's a percentage, but the corresponding CPI number we're comparing it to is, again, a point figure.</p>
<p>To solve this problem, I'll apply one more piece of fakery.</p>
<p>To convert those percentages to match the CPI values, I'm going to create a function. I'll feed the function the starting 2002 first-quarter CPI value of 177.1.</p>
<p>That'll be my baseline.</p>
<p>I'll give that variable the name new_num.</p>
<p>On each iteration through the rows of my wages data series, the function will divide the current wage value x by 400.</p>
<p>Where'd I get that number? Dividing by 100 converts the percentage (3.5, etc.) to a decimal (0.035).</p>
<p>And dividing by four reduces the annual, or 12-month, rate to a quarterly rate covering three months. To convert that to a usable number, I multiply it by the current value of new_num and then add new_num to the product.</p>
<p>That should give us an approximation of the original CPI value adjusted by the related wage growth percentage.</p>
<p>But of course, this won't be a number that has any direct equivalent in the real world.</p>
<p>Instead, it is, as I said, an arbitrary approximation of what that number might have been.</p>
<p>But again, I think it'll be close enough for our purposes.</p>
<p>Take a moment to read through the function.</p>
<p>The line global new_num declares the variable as global.</p>
<p>This makes it possible for me to replace the original value of new_num with the function's output, so the percentage in the next row will be adjusted by the updated value.</p>
<p>Note also how any strings will be ignored.</p>
<p>And finally, note how the updated data series will populate the new wages data variable.</p>
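<p>Here's a reconstruction of the conversion function as described above, not the exact code from the video. The column names and the three growth rates are hypothetical; only the 177.1 baseline and the divide-by-400 step come from the walkthrough.</p>

```python
import pandas as pd

# Hypothetical quarterly wage growth rates (12-month percent changes).
new_df = pd.DataFrame({"date": ["2002Q1", "2002Q2", "2002Q3"],
                       "wages": [3.5, 3.4, 3.2]})

new_num = 177.1  # baseline: the Q1 2002 CPI value used in the walkthrough


def wages_to_points(x):
    """Grow the running baseline by one quarter of the annual rate x (a percent)."""
    global new_num          # carry the updated value into the next row
    if isinstance(x, str):  # ignore any stray strings
        return x
    new_num = (x / 400) * new_num + new_num
    return new_num


new_wages = new_df["wages"].apply(wages_to_points)
print(new_wages.round(3).tolist())

# The same arithmetic without a global: a cumulative product of growth factors.
vectorized = 177.1 * (1 + new_df["wages"] / 400).cumprod()
```

<p>The last line shows an alternative that avoids the global variable: because each row multiplies the running value by (1 + x/400), the whole series is just the baseline times a cumulative product of those factors. It assumes the column holds only numbers.</p>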
<p>Let's check that. The new data looks great.</p>
<p>Our next task will be to merge our two data frames and then plot their data.</p>
<p>Don't go away.</p>
<p>What's left? We need to merge our two data series so Python can compare them.</p>
<p>But since we've already done all the cleaning up and manipulation, this will go smoothly.</p>
<p>I'll create a new data frame called merge data and feed it with the output of the pd.merge function. I simply supply the names of my two data frames and specify that the date column should be the index.</p>
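<p>A minimal sketch of the merge, assuming both cleaned-up frames carry a date column with matching quarterly labels. The frames and their values are hypothetical placeholders.</p>

```python
import pandas as pd

# Hypothetical cleaned-up frames sharing the same quarterly date labels.
new_cpi = pd.DataFrame({"date": ["2002Q1", "2002Q2", "2002Q3"],
                        "cpi": [178.0, 179.0, 180.0]})
new_wages = pd.DataFrame({"date": ["2002Q1", "2002Q2", "2002Q3"],
                          "wages": [178.65, 180.17, 181.61]})

# Join the two frames on their shared date column, then make date the index
# so each row holds one quarter's CPI and wages values.
merged_data = pd.merge(new_cpi, new_wages, on="date").set_index("date")
print(merged_data)
```

<p>pd.merge performs an inner join by default, so a quarter missing from either frame is dropped without warning; comparing row counts before and after is a cheap sanity check.</p>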
<p>That wasn't hard.</p>
<p>Let's take a look.</p>
<p>Our data is all there. We could visually scan through the CPI and wages columns and look for any unusual relationships.</p>
<p>But that defeats the point.</p>
<p>Python data analytics is all about letting our code do that for us.</p>
<p>Let's plot the thing.</p>
<p>Here, we'll tell plot to take our merge data data frame and create a bar chart.</p>
<p>Because there's an awful lot of data here, I'll extend the size of the chart with a manual figsize value. I set the x-axis to use values in the date column.</p>
<p>And again, because there are so many of them, I'll rotate the labels by 45 degrees to make them more readable.</p>
<p>Finally, I'll set the labels for the x and y axes.</p>
<p>This is what comes out the other end. Because of the crowding, it's not especially easy to read.</p>
<p>But you can see that the orange wages bars are for the most part higher than the blue CPI bars.</p>
<p>That means wages are experiencing a higher growth rate than the CPI. We'll have a stab at analyzing some of this a bit later.</p>
<p>Is there an easier way to display all this data? You bet there is. I can change the value of kind from bar to line, and things will instantly improve.</p>
<p>Here's how the new code looks as a line plot, with a grid added.</p>
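<p>The bar and line charts could be produced with something like the following. The merged_data frame, its column names, the figure size, and the axis labels are all assumptions, and the values are placeholders.</p>

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; leave this line out in Jupyter
import pandas as pd

# Hypothetical merged data: one row per quarter, placeholder values only.
merged_data = pd.DataFrame({
    "date": ["2002Q1", "2002Q2", "2002Q3", "2002Q4"],
    "cpi": [178.0, 179.0, 180.0, 180.5],
    "wages": [178.6, 180.2, 181.6, 183.0],
})

# Bar chart: a bigger figure and rotated x-axis labels to ease the crowding.
ax = merged_data.plot(x="date", kind="bar", figsize=(20, 10), rot=45)
ax.set_xlabel("Quarter")
ax.set_ylabel("Index value")

# Same data as a line chart with a grid, which is usually easier to read.
ax_line = merged_data.plot(x="date", kind="line", figsize=(20, 10), rot=45, grid=True)
ax_line.set_xlabel("Quarter")
ax_line.set_ylabel("Index value")
```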
<p>Python, along with its associated libraries, gives us the ability to use a much wider variety of plotting tools than just bar and line graphs.</p>
<p>We're going to explore just two of them here, scatter plots, and histograms.</p>
<p>We'll also talk a bit about how regression lines work, and what kinds of insights they can show us.</p>
<p>We'll begin with scatter plots.</p>
<p>This code is from the Property Rights and Economic Development chapter on my Teach Yourself Data Analytics website. You can catch up on the background over there.</p>
<p>But the code you're looking at comes from two data sources: the World Bank's measure of per capita gross domestic product by country and the Index of Economic Freedom data from the heritage.org site.</p>
<p>I merged data from the two data frames into this one called merge data, and now I'll create a simple scatter plot.</p>
<p>With this one-line command, we can clearly see a pattern.</p>
<p>The further to the right a country's dot falls on the x-axis, meaning the higher its economic freedom score, the higher it tends to sit on the y-axis, meaning the higher its per capita gross domestic product, or the more economic activity it generates per person.</p>
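<p>The one-line pandas scatter plot might look like this. The Country, Score, and Value column names follow the description in this chapter, but the five rows are placeholder data, not World Bank or Heritage Foundation figures.</p>

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; leave this line out in Jupyter
import pandas as pd

# Hypothetical merged frame: economic freedom score and per capita GDP (USD).
merged_data = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Score": [45.0, 55.0, 62.0, 70.0, 78.0],
    "Value": [1500.0, 4200.0, 9000.0, 21000.0, 48000.0],
})

# One line: freedom score on the x-axis, GDP per capita on the y-axis.
ax = merged_data.plot.scatter(x="Score", y="Value")
```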
<p>Of course, there are anomalies in our data.</p>
<p>There are countries whose position appears way out of range of all the others. It'd be nice if we could somehow see which countries those are, and it would also be nice if we could quantify the precise statistical relationship between our two values, rather than having to guess visually.</p>
<p>We'll begin by visualizing those anomalies in our data.</p>
<p>To make this happen, I'll import another couple of libraries that are part of the Plotly family of tools. You may need to manually install them on your host using pip install plotly before they'll work.</p>
<p>From there, we can run px.scatter and point it to our merged data data frame, associating the score column with the x-axis and value with the y-axis.</p>
<p>So that we'll be able to hover over a dot and see the data it represents, we'll add the hover_data argument and tell it to include country and score data.</p>
<p>This time when you run the code, you get the same nice plot.</p>
<p>But if you hover your mouse over any dot, you'll also see its data values.</p>
<p>In this example, you can see that the tiny but rich country of Luxembourg has an economic freedom score of 75.9 and a per capita GDP of more than $121,000.</p>
<p>You can similarly pick out other countries at either end of the chart.</p>
<p>We can learn more about the statistical relationship between our values by adding a regression line and measuring the data's R-squared value.</p>
<p>We already saw how our plot showed a visible trend up and to the right.</p>
<p>But we also saw there were outliers.</p>
<p>Can we be confident that the outliers are the exceptions, and that the overall relationship between our two data sources is sound?</p>
<p>There's only so much we can assume based on visually viewing the graph. At some point, we'll need hard numbers to describe what we're looking at.</p>
<p>A simple linear regression analysis can give us a measure of how well a model fits the data, that is, how much of the variation in the dependent variable the model explains.</p>
<p>R-squared is a number between 0 and 1, often expressed as 0% to 100%, where 100% would indicate a perfect fit.</p>
<p>Of course, in the real world, a 100% fit is next to impossible.</p>
<p>You'll judge the accuracy of your model or assumption within the context of the data you're working with.</p>
<p>So how can you add a regression line to a chart like this? There are, as always, many ways to go about it. I like simple, and the OLS trendline approach is about as simple as it gets: just add a trendline argument to the code we've already been using.</p>
<p>That's it.</p>
<p>OLS, by the way, stands for ordinary least squares, which is a type of linear regression.</p>
<p>And here's how it looked with our regression line.</p>
<p>When I hover over the line, it shows an R-squared value of 0.550451, or in other words, around 55%.</p>
<p>For our purposes, I consider that a pretty strong correlation.</p>
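<p>To see where a number like that comes from, here's a minimal sketch that fits an ordinary least squares line with NumPy and computes R-squared by hand. It's an illustration, not what Plotly runs internally (Plotly's trendline="ols" option relies on the statsmodels package), and ols_r_squared is a hypothetical helper applied to placeholder data.</p>

```python
import numpy as np


def ols_r_squared(x, y):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 least-squares fit
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)       # variation the line misses
    ss_tot = np.sum((y - y.mean()) ** 2)        # total variation in y
    return slope, intercept, 1 - ss_res / ss_tot


# Placeholder data, not the freedom and GDP figures.
scores = [45.0, 55.0, 62.0, 70.0, 78.0]
gdp = [1500.0, 4200.0, 9000.0, 21000.0, 48000.0]
slope, intercept, r2 = ols_r_squared(scores, gdp)
print(f"R-squared: {r2:.3f}")
```

<p>For a straight-line fit with an intercept, R-squared equals the square of the Pearson correlation coefficient, so np.corrcoef offers a quick cross-check.</p>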
<p>A histogram is a plotting tool that breaks your data down into bins. A bin is actually an approximation of a statistically appropriate interval between sets of your data. Bins attempt to approximate the probability density function (PDF) that will best represent the values you're actually using.</p>
<p>But they may not display exactly the way you think, especially when you use a default value.</p>
<p>I'll illustrate how this works (or, actually, how it doesn't work) using data from the "Do Birthdays Make Elite Athletes?" chapter on the website.</p>
<p>As you can see over there, I'd scraped the semi-official NHL API for the birth dates of around 1,100 current NHL players.</p>
<p>My goal was to visualize the distribution of their birth dates across all 12 months to see if their births were concentrated within a specific yearly season.</p>
<p>When I displayed the data using a histogram, we didn't see the pattern we'd expected.</p>
<p>In fact, the pattern wasn't truly representative of the real world.</p>
<p>That's because histograms are great for showing frequency distributions by grouping data points together into bins.</p>
<p>This can help us quickly visualize the state of a very large data set where granular precision would get in the way, but it can be misleading for use cases like ours, since we were looking for a literal mapping of events to calendar dates.</p>
<p>Even setting the number of bins to 12 to match the number of months won't help, because a histogram's bin edges won't necessarily line up with the month boundaries.</p>
<p>What we really need here is a plain old bar graph that incorporates our value_counts numbers. I'll pipe the results of value_counts to a data frame called df1, and then plot that as a simple bar graph.</p>
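<p>Here's a small sketch of the difference, using a hypothetical handful of birth months rather than the scraped NHL data. np.histogram shows where automatic bins actually fall, while value_counts produces one exact count per calendar month.</p>

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; leave this line out in Jupyter
import numpy as np
import pandas as pd

# Hypothetical birth months (1 = January through 12 = December).
months = pd.Series([1, 1, 2, 3, 3, 3, 5, 7, 8, 10, 11, 12])

# With the default 10 bins, 12 months must share bins, so some bars
# combine two months.
counts_default, edges_default = np.histogram(months)

# With bins=12, the edges are spaced 11/12 of a month apart, starting at the
# minimum value, rather than sitting on month boundaries.
counts_12, edges_12 = np.histogram(months, bins=12)
print(edges_12.round(2))

# value_counts gives an exact count per month; reindex adds missing months as 0.
df1 = months.value_counts().reindex(range(1, 13), fill_value=0).to_frame("players")
ax = df1.plot(kind="bar", legend=False)  # one bar per calendar month
```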
<p>In the next module, we're going to talk about understanding our data visualizations and integrating what we see in our Jupyter Notebooks with stuff that happens out there in the real world.</p>
<p>Stay tuned.</p>
<p>We're supposed to be doing data analytics here, so staring at pretty graphs probably isn't the whole point.</p>
<p>The CPI and wages data sets we plotted in the previous chapter, for instance, showed us a clear general correlation.</p>
<p>But there were some visually recognizable anomalies.</p>
<p>Unless we can connect those anomalies with historical events, and explain them in a historical context, we won't be getting the full value from our data.</p>
<p>But even before going there, we should confirm that our plots actually make sense in the context of their data sources.</p>
<p>Working with our BLS examples, let's look at graphs to compare CPI and wages data from both before and after our manipulation.</p>
<p>That way, we can be sure that our math and particularly our fake math didn't skew things too badly.</p>
<p>Here's what our CPI data look like when plotted using the raw data.</p>
<p>It's certainly a busy graph, but you can clearly see the gentle upward slope punctuated by a handful of sudden jumps.</p>
<p>Next, we'll see that same data after removing two out of every three months' data points. The same ups and downs are still visible.</p>
<p>Given our overall goals, I'd categorize our transformation as a success.</p>
<p>Now, how about the wages data? Here, because we moved from percentages to index values, the transformation was more intrusive, and the risk of misrepresentation was greater.</p>
<p>We'll also need to take into account the way a percentage will display differently from an absolute value.</p>
<p>Here's the original data.</p>
<p>Note how there's no consistent curve, either upwards or downwards.</p>
<p>That's because we're measuring the rate of growth as it took place within each individual quarter, not the growth itself.</p>
<p>Now compare that with this line graph of the wage data, now converted to index-based values.</p>
<p>The gentle curve you see makes some sense (it's about real growth, after all, not growth rates), but it's also possible to recognize a few spots where the curve steepens and others where it smooths out a bit more.</p>
<p>Why are the slopes so smooth in comparison with the percentage-based data? Look at the y-axis labels.</p>
<p>The index graph is measured in points between 180 and 280, while the percentage graph goes from zero to 3.5.</p>
<p>It's the scale that's different.</p>
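<p>A before-and-after comparison like this can be drawn with two side-by-side subplots, each keeping its own y-axis scale. The growth rates below are hypothetical, and the conversion reuses the quarter-of-the-rate approximation and the 177.1 baseline from earlier.</p>

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; leave this line out in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical quarterly 12-month wage growth rates, in percent.
rates = pd.Series([3.5, 3.4, 3.2, 3.0],
                  index=["2002Q1", "2002Q2", "2002Q3", "2002Q4"])
# Same quarter-of-the-rate conversion, starting from the 177.1 baseline.
points = 177.1 * (1 + rates / 400).cumprod()

# Two panels side by side, each with its own y-axis scale.
fig, (ax_rate, ax_points) = plt.subplots(1, 2, figsize=(12, 4))
rates.plot(ax=ax_rate, title="Original: growth rate (%)")
points.plot(ax=ax_points, title="Converted: index points")
fig.tight_layout()
```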
<p>All in all, I believe we're safe concluding that what we produced is a good match with our source data.</p>
<p>Establishing some kind of historical context for your data will require looking for anomalies and associating them with known historical events.</p>
<p>That's something I do at length in the wages and CPI Reality Check chapter on the website.</p>
<p>If you're interested, I'm sure you'll work through that material on your own.</p>
<p>But I think you've seen enough here to get a picture of how plotting the right visualization can be helpful.  </p>
<p>That brings us to the end of this particular course. As I've mentioned a number of times already, the full curriculum is available on my thedataproject.net site, and you're more than welcome to join all the cool kids over there and be in touch if you've got something to add to the conversation.</p>
<p>The main thing is to realize that the end of this course isn't anywhere near the end of your data analytics education.</p>
<p>Watching me calmly execute nice, clean code samples isn't really learning.</p>
<p>Unless you're a very special breed of genius, you won't begin to understand how all this really works until you dive in and work things through yourself.</p>
<p>I say "work things through," but what I really mean is "fail to work things through," because it's mistakes and frustration that are the best teachers.</p>
<p>Don't imagine that my Python code just came to exist on a quiet afternoon while I was sipping nice hot coffee.</p>
<p>First of all, I don't drink coffee.</p>
<p>But more to the point, there was nothing quiet about it.</p>
<p>There were humiliating failures, reformulations, start-overs, and countless trips to Stack Overflow before things began to take shape.</p>
<p>But the more problems I faced and overcame, the deeper the process sank into my mind, and the better I got at it.</p>
<p>And so will you. Just be prepared for tough times ahead.</p>
<p>Before you all run off and get on with your day, let's spend a moment or two reviewing everything we saw here.</p>
<p>We spoke about the many ways you can work with Jupyter notebooks, including online platforms like Google's Colaboratory, and locally hosting either JupyterLab or classic notebooks. We then introduced ourselves to the Jupyter environment, learning about cells, kernels, and the operating environment.</p>
<p>We saw how we can find data through public APIs and how to integrate API credentials into our Python environment.</p>
<p>Python libraries and modules were our next focus, including how to import appropriate libraries to allow us to effectively clean and manipulate our data.</p>
<p>And finally, turning to some actual data analytics, we learned some basics of plotting, including working with scatter plots, regression lines, and histograms.</p>
<p>And we closed out the course with a quick discussion of how to use our data visualizations to integrate our insights with the real world.</p>
<p>I hope this has been helpful for you and I invite you to check out some of my other content on my main website.</p>
<p>Take care.  </p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
