You can learn the basics of Data Analytics with 30 days of practice.

We just released a Data Analytics course on the freeCodeCamp.org YouTube channel. The course includes a 40-minute video, as well as a website and Jupyter notebooks. If you follow the plan laid out in these course resources, you can learn data analytics in 30 days.

David Clinton developed this course. David has written many popular technical books and created many helpful video courses.

The course aims to be a quick and dirty introduction to Python-based data analytics. The goal is to get users with some basic understanding of the workings of Python to the point where they can confidently find and manipulate data sources and use a Jupyter environment to derive insights from their data. The course will demonstrate effective analytics methods, but does not try to be exhaustive.

The only prerequisite for the course is a basic understanding of Python programming, or at least how programming works in general.

Here are the main topics covered in this course:

  • Installing Python and Jupyter
  • Working with the Jupyter environment
  • Finding data sources and using APIs
  • Working with data
  • Plotting data
  • Understanding data

You can watch the full course below or on the freeCodeCamp.org YouTube channel.

Make sure to also check out the accompanying website: https://stories.thedataproject.net/

Full Transcript

(note: autogenerated)

David Clinton has written many popular technical books and created many helpful video courses.

This data analytics course, along with the accompanying website and Jupyter notebooks, will help you learn data analytics in 30 days.

Welcome to my course, I'm really glad to have you here.

And I'm even happier that you've decided to join the data analytics party.

Who am I? I'm the author of more than a dozen books on Linux and AWS administration and digital security, and dozens of courses on Pluralsight.

I've also got a fistful of articles right here on the freeCodeCamp News site.

But I just write about stuff.

Hopefully, when you're done with this content, you'll be out using data to change the world.

Since you've already seen my claim that this will only take you 30 days, I should explain what this actually is: it's going to show you the tools you'll need to find and manipulate raw data, and use various graphing tools to help you understand and interpret it.

But don't expect us to cover a full data science curriculum here complete with single and multivariable calculus, algorithmic problem solving, or even machine learning.

That would require a whole lot more time and effort.

If that's what you're after, check out the new data science content that freeCodeCamp is in the process of bringing online. There's something else you won't get from these videos: experience.

Once you've watched the entire course, you probably still won't be able to do much on your own.

The value of actual hands-on experience is the mistakes you make: mistyping syntax, not properly understanding what your code is doing, or failing to account for environment-specific restrictions.

Diagnosing and working through those mistakes is where you'll really begin to take charge and accomplish great things.

So where will you get that experience? If you're ambitious, and you've got exciting project ideas of your own, by all means, dive in and try it out.

But if you think you still need some guidance, then I've got everything you should need on my stories.thedataproject.net site. You can work through the exercises in each of the eight data stories you'll find over there.

If there's a specific skill you're looking for, the learning objectives index will point you directly to the chapter where you'll find it. All of that is available to anyone, for free.

If you happen to prefer working with a real book, you can purchase the same content in that format.

But don't think anything's missing from the free website. Right now, though, let's talk about data analytics tools.

There are many ways to consume data, and the one you choose will reflect your specific needs and your comfort with various skills.

Spreadsheets, as you probably already know, are much more than just fancy calculators or places to keep your household budget numbers.

They also come with powerful functions, external integrations and graphing capabilities.

Enterprise-strength tools like Tableau, Splunk, or Microsoft's Power BI are also great for crunching numbers and visualizing insights, which you can then share with your team members.

So then what's the big deal with Python? Well, the Python ecosystem is much, much broader than those purpose-built tools.

And the Python community makes all kinds of useful data specific libraries and modules available.

When you let Python loose against your data, you've got all the resources of a full-bore, industrial-strength programming language at your fingertips.

It's not what you can do with it that's the challenge; it's finding something that you can't do.

OK, but what about Jupyter? Jupyter is an open source platform within which you can load your data and execute your Python code.

It's a lot like a programming IDE, such as Microsoft Visual Studio.

And while Jupyter Notebooks can be used with a growing number of languages, and for as many tasks as you can imagine, it's best known and loved as a host for Python data heroics.

Once upon a time, the lines of code you'd write for your programs would be saved to a single text file whose name ended with a .py suffix.

When you wanted to run your code to see how things went, you'd do it from the command line or a powerful and complicated source code editor like Visual Studio.

But for anything to work, it all had to work.

That would make it harder to troubleshoot when something didn't go according to spec.

And it would also make it a lot harder to play around with specific details just to see what happens.

And it also made it tough to share live versions of your code across the internet.

As we'll soon see, Jupyter Notebooks let you run your code a single line at a time, or all together.

That flexibility makes it easier to understand your code and when things go wrong to troubleshoot it.

Notebooks, by the way, are JSON-based files that effectively move the processing environment for just about any data-oriented programming code from your server or workstation to your web browser.

You can download Jupyter to your PC or a private server and access the interface through any browser with network access.

Or you can run notebooks on third-party hosting services like Google's Colaboratory or, for a cost, on cloud providers like Amazon's SageMaker Studio notebooks or Microsoft's Azure Notebooks.

Jupyter comes in three flavors.

The two you're most likely to encounter are classic notebooks and the newer JupyterLab.

Both run nicely within your browser.

But JupyterLab comes with more extensions and lets you work with multiple notebook files and terminal access within a single browser tab.

I'll be using the classic notebook environment for the demos in this course, but there's usually no problem transferring notebooks between versions.

The third flavor, just to be complete, is JupyterHub.

It's a server version built to provide authenticated notebook access to multiple users; you can serve up to 100 or so users from a single cloud server using The Littlest JupyterHub.

For larger deployments involving clusters of servers, you'd probably be better off with the Kubernetes version, known as Zero to JupyterHub with Kubernetes.

But all that's way beyond the scope of this course.

Our next job is to build our work environment.

Assuming you've decided to host Jupyter on your own machine, you'll need Python installed.

The good news is that most operating systems come with Python pre-installed. You can confirm that you've got an up-to-date version of Python by opening a command prompt and typing python --version, or sometimes python3 --version. If Python's installed, you'll probably see something like this.

Just make sure you've got Python 3 installed and not the deprecated and insecure Python 2.

If you do need to install Python manually, you're best off using Python's official documentation; that will be the most complete and up-to-date source available and will work with whatever operating system you're on.
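If you'd rather check from inside Python itself, here's a minimal sketch (the version string in the comment is just an example):

```python
# Quick version check from within Python -- an alternative to running
# `python --version` at the command prompt.
import sys

print(sys.version)  # e.g. "3.9.7 (main, ...)" -- your output will differ
if sys.version_info < (3,):
    print("This looks like Python 2 -- install Python 3 before continuing.")
```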

It's important to note that not all Python versions, even those from the 3.x line, will necessarily behave quite the way you expect.

You may, for instance, find that you need a library written for version 3.9, but that there's no way to get it working on your 3.7 system.

Upgrading your system version to 3.9 might work out well for you.

But it could also cause some unexpected and unpleasant consequences.

It's hard to know when a particular Python library might also be needed by your core operating system.

If you pull the original version of the library, you might end up disabling the OS itself.

And don't think it won't happen.

I crippled my own OS that way just a few months ago.

One solution is to run Python for your project within a special virtual environment that's isolated from your larger OS.

That way, you can install all the libraries and versions you like without having to worry about damaging your work system.

You can do this using a full-scale virtual container running a Docker or, as I prefer, LXD image, or on a standalone AWS cloud instance.

But you can also use Python's own venv module; you'll want to read the official documentation for the virtual environment instructions specific to your host OS.
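As a loose illustration, here's one way a virtual environment can be created with the standard-library venv module (the directory name is made up, and the usual route is simply running python3 -m venv from the shell):

```python
# A minimal sketch of creating an isolated environment with Python's built-in
# venv module; most people just run `python3 -m venv my-analytics-env` instead.
import venv

venv.create("my-analytics-env", with_pip=True)  # hypothetical directory name
# Activate it from the shell afterwards (e.g. `source my-analytics-env/bin/activate`
# on Linux or macOS), then install libraries inside it without touching the system Python.
```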

Whichever version of Jupyter you choose, if you decide to install and run it locally, the Jupyter project officially recommends doing it through the Python Anaconda distribution and its binary package manager, conda.

Various guides to doing that are available for various OS hosts, but this official page is a good place to start.

As you can see, though, the Python pip package manager is also an option.

Once all that's done, you should be able to open a notebook right in your browser and get right down to work.

For me, a notebook's most powerful feature is the way you can run subsets of your code within individual cells.

This makes it easier to break down long and complex programs into easily readable and executable snippets.

With a cell selected, clicking the Run button will execute just that cell's code.

Note how the box on the left gets a number representing the sequential position of the execution.

As you become more familiar with Jupyter, you'll probably get in the habit of executing cells using Ctrl+Enter rather than a mouse click. You can insert a new cell right after the one that's currently selected by clicking the plus button.

The up and down arrows move cells, as you might expect, up and down.

Cells, by default, are formatted to handle code (Python 3 in my case), but they can also be set for Markdown, which can be handy for documenting your notebooks or making new sections easier to find.

A single hash character in Markdown, for instance, represents a top-level section title; executing the cell will render the text to match your formatting instruction.

The precise locations and appearance of the buttons you'll use to get this stuff done will vary between the different Jupyter versions, but all the basic functions are universally available.

Whatever values your code creates will remain in the kernel's memory until the output for a particular cell, or for the entire kernel, is cleared.

This lets you rerun previous or subsequent cells to see what impact the change might have.

It also means resetting your environment for a complete start over is as easy as selecting restart kernel and clear all outputs.

Not all Python functionality will be available out of the box.

Sometimes, as you saw just before, you'll need to tell Python to load one or more modules through declarations within your code.

But some modules need to be installed manually from the host command line before they can even be imported.

For such cases, Python recommends its installer program pip or, in some cases, the conda tool from the Anaconda distribution.

You can read more about using pip for the proper care and feeding of your Python system within the helpful Python documentation site.

Okay, here's where we get down to real work.

We're going to head out to the internet to find reliable data that will help us answer a real world question.

And we'll use a public API to get the data.

Then we'll examine the data to get a feel for its current formatting, and what it'll take to fix it.

After applying the necessary formatting so our code can happily read it, we'll merge multiple data sets together so we can look for correlations, and then experiment with graphing tools to find the one that represents our data in the most intelligible way.

What's the problem we're trying to solve? I'm curious to see whether wages paid to US workers over the past 20 years have, on average, gone up. Assuming they have increased, I'd also like to know whether the extra money in their pockets has also increased their actual standard of living. To answer those questions, we're going to access two data sets collected and maintained by the US government's Bureau of Labor Statistics.

One of the many nice things about the Bureau of Labor Statistics, usually referred to as the BLS, is that it provides an API for access from within our Python scripts.

To make that work, you'll need to know the BLS endpoint address matching the specific data series you need, the Python code to initiate the request, and, for higher-volume requests, a BLS API key.

Getting the series endpoint addresses you need may take some digging around in the BLS website.

However, the most popular data sets are accessible through a single page.

This shows you what that looks like, including endpoint codes like LNS11 followed by a whole lot of zeros for the civilian labor force set. You can also search for data sets on this page.

Searching for "computer", for instance, will take you to a list that includes the deeply tempting average hourly wage for Level 11 computer and mathematical occupations in Austin-Round Rock, Texas.

The information you'll discover by expanding that selection will include its series ID, which is the endpoint. Because I know you can barely contain your curiosity, I'll click through and show you: it turns out that Level 11 computer and mathematical professions in Austin-Round Rock, Texas, could expect to earn $51.76 an hour back in 2019.

So how do you turn those series IDs into Python-friendly data? Manually writing GET and PUT requests can be very picky, and it'll take a lot of tries before you get it exactly right.

To avoid all that, I decided to go with a third-party Python library called bls.

It's available through Oliver Sherouse's GitHub repo; you install the library on your host machine using pip install bls.

That's all it'll take.

While we're here, we might as well activate our BLS API key.

You register for the API from this page, and they'll send you an email with your key and a validation URL that you'll need to click.

Once you've got your key, you export it to your system environment. On Linux or macOS, that would mean running something like this, with your key substituted for the fake one I'm using here. I'm going to use the API to request US consumer price index (CPI) and wage and salary statistics between 2002 and 2020.
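As a hedged sketch of that key-export step: the environment variable name shown here (BLS_API_KEY) is an assumption, so check the bls library's README for the exact name it expects.

```python
# In the shell (Linux/macOS) you'd typically export the key first, for example:
#   export BLS_API_KEY="your-real-key-goes-here"
# From Python, you can confirm the key is visible to your session:
import os

print("API key found" if os.environ.get("BLS_API_KEY") else "No API key set")
```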

The CPI is a measure of the price of a basket of essential consumer goods.

It's an important proxy for changes in the cost of living, which in turn, is an indicator of the general health of the economy.

Our wages data will come from the BLS employment cost index, covering wages and salaries for private industry workers in all industries and occupations.

A growing employment cost index would, at first glance, suggest that things are getting better for most people.

However, seeing the average employment wage trends in isolation isn't all that useful.

After all, the highest salary won't do you much good if your basic expenses are higher still.

So the goal is to pull both the CPI and wages data sets, and then correlate them looking for patterns.

This will show us how wages have been changing in relation to costs.

Now, let me show you how it actually works.

With valid endpoints for the two data sets we're going to be using, we're all set to start digging for CPI and employment gold.

Importing these four libraries, including bls, will give us all the tools we'll need.

Pandas stands for Python data analysis; it's a library for working with data as data frames.

Data frames are perhaps the most important structure you'll use as you learn to process large data sets.

NumPy is a library for executing mathematical functions against large arrays of data.

And Matplotlib is a library for plotting data in visual graphs of various kinds.

When importing a library, you assign it the name you'll use to invoke it.

You can choose just about anything, but pandas is conventionally represented by pd, NumPy by np, and Matplotlib's pyplot module by plt. I'm also importing the bls library that we installed a bit earlier; it will be invoked using its actual name, bls.
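Here's a minimal sketch of those imports (the pyplot alias is an assumption about how the plotting calls are made later):

```python
# The four libraries used throughout the rest of the course.
import pandas as pd               # data frames
import numpy as np                # numerical arrays and math functions
import matplotlib.pyplot as plt   # plotting
import bls                        # third-party BLS API wrapper (pip install bls)
```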

I'll execute that cell.

Now I pass the BLS endpoint for the CPI data series to the get_series command from the bls library.

The endpoint code itself was, of course, copied from the popular data sets page on the BLS website. I'll assign the data series that comes back to a data frame using the variable cpi, and then save the data frame to a local CSV file.

This isn't necessary, but you might find it easier to work with the data when it's saved locally.

Next, I'll load the data from the new CSV file using the pandas pd.read_csv command against the file name.

I'll assign the variable name cpi_data to the new data frame that comes out the other end.

Running just cpi_data will print out the first and last five lines of the data frame.
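A hedged sketch of that fetch-save-reload pattern; the series ID and file name below are placeholders standing in for the endpoint copied from the BLS site:

```python
# Fetch the CPI series, cache it locally, and reload it as a data frame.
cpi = bls.get_series("CUUR0000SA0")   # placeholder CPI series ID -- use the one you copied
cpi.to_csv("cpi_data.csv")            # optional local cache

cpi_data = pd.read_csv("cpi_data.csv")
cpi_data                              # in a notebook, shows the first and last five rows
```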

The date column contains month and year values.

And the second column contains our actual data.

I'd like to simplify the headers to make them easier to work with.

So I'll use the pandas columns attribute (see the sketch below). I definitely prefer it this way.
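Something like this, where the new column names are my own choice rather than anything mandated by the data:

```python
# Rename the columns to short, friendly headers (names are just examples).
cpi_data.columns = ["Date", "CPI"]
```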

However, we'll also need to see the wages data to know whether the format it uses is compatible with our CPI set.

So I'll pull the wages data series using the BLS library and assign it to the wages data frame.

Once again, I'll save it to a local CSV file and read that data into a new data frame called df.

I'll clean up my column headers and use head() to print only the first five lines of data.
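Here's a hedged sketch of that step; the wages series ID is a placeholder for the employment cost index endpoint, and the column names are again my own:

```python
# Fetch the wages series, cache it, reload it, tidy the headers, and peek at it.
wages = bls.get_series("CIU2020000000000A")   # placeholder wages/ECI series ID
wages.to_csv("wages_data.csv")

df = pd.read_csv("wages_data.csv")
df.columns = ["Date", "Wages"]
df.head()                                     # just the first five rows
```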

You should notice two things. First, this data isn't delivered in monthly increments but quarterly, with only one entry for every three months.

And second, the date format is different.

Instead of a month number, there's Q1 or Q2.

If we want Python to sync our two data sets, we'll need to do some editing.

I'll do that by replacing every March data point, meaning any date entry containing the string "-03" within its date value, with the string "Q1". June, that is, a string including "-06", will get "Q2", and so on.
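A hedged sketch of that relabelling, assuming the Date column holds strings like "2002-03" (regex=False just treats the patterns as plain text):

```python
# Map month codes onto quarter labels so the CPI dates line up with the wages dates.
for month_code, quarter in [("-03", "Q1"), ("-06", "Q2"), ("-09", "Q3"), ("-12", "Q4")]:
    cpi_data["Date"] = cpi_data["Date"].str.replace(month_code, quarter, regex=False)
```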

As you can see now when I print just the date column, some values have been updated to the new format.

But the rest of them are, from this point on, unnecessary and will cause us trouble, so we'll have to get rid of them altogether.

I'll do that by creating a new data frame called new_cpi and reading into it the contents of the old cpi_data data frame.

But I'll use the pandas str.contains function to identify all the rows in the data frame that contain a dash, and by keeping only the rows where that test comes back False, in other words dropping them, we'll be left with only properly formatted quarterly data points.

As I said, I'll save this data frame to a CSV file too. Notice how we've dropped from 232 rows to just 77.
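Here's roughly what that filtering looks like; the tilde negates the boolean mask (the "specifying False" step could equally be written as == False):

```python
# Keep only rows whose Date no longer contains a dash -- the quarterly labels.
new_cpi = cpi_data[~cpi_data["Date"].str.contains("-")]
new_cpi.to_csv("new_cpi.csv")
len(new_cpi)   # in the video's data this drops from 232 monthly rows to 77 quarterly rows
```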

Just because I'm paranoid, I'll create a new data frame called new_df.

That way the old df data frame will still be available to me should I accidentally make a mess of what we're about to do next. With our data all neatened up, we're ready to begin our analysis.

We've got a big problem here: the data in the CPI set comes in absolute point values, while the wages are reported in percentages measuring growth. As is, there's no way to accurately compare them.

For one thing, each row of our wages data is the percentage by which wages would have grown had the rate current in that quarter continued for a full 12 months.

So not only do those values not correspond to the absolute CPI price data, they're not even technically true of their own timeframe.

So when we're told that the rate for the first quarter of 2002 was 3.5%, that means that if wages had continued to rise at that first-quarter rate for a full 12 months, the annual average growth would have been 3.5%, not 14%. Which means the numbers we're going to work with will have to be adjusted.

That's because the actual growth during, say, the three months of 2002 Q1 wasn't 3.5%, but only a quarter of that, or 0.875%.

If I don't make this adjustment but continue to map quarterly growth numbers to quarterly CPI prices, our calculated output would lead us to think that wages are growing so fast that they've become detached from reality.

Now, I should warn you that solving this compatibility problem will require some fake math. I'm going to divide each quarterly growth rate by four; in other words, I'll pretend that the real changes to wages during those three months were exactly one quarter of the reported year-over-year rate.

I'm sure that's almost certainly not true.

And it's a gross simplification.

However, for the big historical picture I'm trying to draw here, it's probably close enough.

Now that will still leave us with a number that's a percentage, but the corresponding CPI number we're comparing it to is again a point figure.

To solve this problem, I'll apply one more piece of fakery.

To convert those percentages to match the CPI values, I'm going to create a function. I'll feed the function the starting 2002 first-quarter CPI value of 177.1.

That'll be my baseline.

I'll give that variable the name new_num.

For each iteration the function makes through the rows of my wages data series, I'll divide the current wage value x by 400.

Where did I get that number? The 100 simply converts the percentage (3.5, and so on) to a decimal (0.035).

And the 4 reduces the annual, or 12-month, rate to a quarterly rate covering three months. To convert that to a usable number, I multiply it by the current value of new_num, and then add new_num to the product.

That should give us an approximation of the original CPI value adjusted by the related wage growth percentage.

But of course, this won't be a number that has any direct equivalent in the real world.

Instead, it is, as I said, an arbitrary approximation of what that number might have been.

But again, I think it'll be close enough for our purposes.

Take a moment to read through the function.

The global new_num line declares the variable as global.

This makes it possible for me to replace the original value of new_num with the function's output, so the percentage in the next row will be adjusted by the updated value.

Note also how any strings will be ignored.

And finally, note how the updated data series will populate the new_wages_data variable.
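Here's a hedged reconstruction of the function being described; the data frame and column names (new_df, "Wages") are assumptions, but the arithmetic follows the narration:

```python
# Convert each quarterly growth percentage into a cumulative, CPI-scale index value.
new_num = 177.1   # 2002 Q1 CPI value, used as the baseline

def growth_to_index(x):
    global new_num                 # so each row builds on the previous result
    if isinstance(x, str):         # ignore any stray strings in the column
        return x
    new_num = new_num + (x / 400) * new_num   # /100 for percent, /4 for one quarter
    return new_num

new_wages_data = new_df["Wages"].apply(growth_to_index)
```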

Let's check that new data. Looks great.

Our next task will be to merge our two data frames and then plot their data.

Don't go away.

What's left? We need to merge our two data series so Python can compare them.

But since we've already done all the cleaning up and manipulation, this will go smoothly.

I'll create a new data frame called merged_data and feed it with the output of the pd.merge function. I simply supply the names of my two data frames and specify that the date column should be the index.
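A hedged sketch of that merge, assuming the two cleaned-up frames are new_cpi and new_df and that both now share a quarterly Date column:

```python
# Merge the CPI and wages frames on their shared Date column and use it as the index.
merged_data = pd.merge(new_cpi, new_df, on="Date").set_index("Date")
merged_data
```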

That wasn't hard.

Let's take a look.

Our data is all there; we can visually scan through the CPI and wages columns and look for any unusual relationships.

But that defeats the point.

Python data analytics is all about letting our code do that for us.

Let's plot the thing.

Here, we'll tell plot to take our merged data frame, merged_data, and create a bar chart.

Because there's an awful lot of data here, I'll extend the size of the chart with a manual figsize value. I'll set the x-axis to use values in the date column.

And again, because there are so many of them, I'll rotate the labels by 45 degrees to make them more readable.

Finally, I'll set the labels for the x and y axes.
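Put together, the plotting step might look roughly like this; the figure size and the exact label text are assumptions, while the 45-degree rotation comes from the narration:

```python
# Bar chart of the merged CPI and wages data, widened and with readable labels.
merged_data.plot(kind="bar", figsize=(20, 8))
plt.xticks(rotation=45)      # tilt the quarter labels so they don't overlap
plt.xlabel("Quarter")
plt.ylabel("Index value")
plt.show()
```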

This is what comes out the other end. Because of the crowding, it's not especially easy to read.

But you can see that the orange wages bars are for the most part higher than the blue CPI bars.

That means that wages are experiencing a higher growth rate than the CPI. We'll have a stab at analyzing some of this a bit later. Is there an easier way to display all this data? You bet there is: I can change the value of kind from bar to line, and things will instantly improve.

Here's how the new code looks as a line plot, and with a grid.
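That change amounts to a single keyword swap (the grid argument is my guess at how the grid was switched on):

```python
# Same data as a line plot with a grid -- easier to read than the crowded bars.
merged_data.plot(kind="line", figsize=(20, 8), grid=True)
plt.show()
```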

Python, along with its associated libraries, gives us the ability to use a much wider variety of plotting tools than just bar and line graphs.

We're going to explore just two of them here: scatter plots and histograms.

We'll also talk a bit about how regression lines work, and what kinds of insights they can show us.

We'll begin with scatter plots.

This code is from the property rights and economic development chapter on my Teach Yourself Data Analytics website; you can catch up on the background over there.

But the code you're looking at draws on two data sources: the World Bank's measure of per capita gross domestic product by country, and the Index of Economic Freedom data from the heritage.org site. I merged data from the two into a single data frame called merged_data, and I'll create a simple scatterplot with a one-line command.
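A hedged one-liner along these lines, where the column names "Value" (GDP per capita) and "Score" (economic freedom) are assumptions about the merged frame's headers:

```python
# Scatter plot: GDP per capita on the x-axis, economic freedom score on the y-axis.
merged_data.plot(kind="scatter", x="Value", y="Score")
plt.show()
```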

We can clearly see a pattern: the higher the per capita gross domestic product, meaning the more economic activity a country generates, the further to the right on the x-axis its dot is likely to fall.

And the further to the right a dot sits, the higher its economic freedom score tends to be.

Of course, there are anomalies in our data.

There are countries whose positions appear way out of range of all the others. It'd be nice if we could somehow see which countries those are, and it would also be nice if we could quantify the precise statistical relationship between our two values rather than having to guess visually.

We'll begin by visualizing those anomalies in our data.

To make this happen, I'll import another couple of libraries that are part of the Plotly family of tools. You may need to manually install them on your host using pip install plotly before they will work.

From there, we can run px.scatter and point it to our merged_data data frame, associating the score column with the x-axis and the value column with the y-axis.

So we'll be able to hover over a dot and see the data it represents.

We'll add the hover_data argument and tell it to include country and score data.
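Here's a hedged sketch of that interactive version; the column names ("Score", "Value", "Country") are assumptions standing in for the real headers in the merged frame:

```python
# Interactive scatter plot with hover labels, using Plotly Express.
import plotly.express as px

fig = px.scatter(
    merged_data,
    x="Score",                        # economic freedom score
    y="Value",                        # GDP per capita
    hover_data=["Country", "Score"],  # shown when you hover over a dot
)
fig.show()
```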

This time when you run the code, you get the same nice plot.

But if you hover your mouse over any dot, you'll also see its data values.

In this example, you can see that the tiny but rich country of Luxembourg has an economic freedom score of 75.9 and a per capita GDP of more than $121,000.

You can similarly pick out other countries at either end of the chart. We can learn more about the statistical relationship between our values by adding a regression line and a measure of the data's R-squared value.

We already saw how our plot showed a visible trend up and to the right.

But we also saw there were outliers.

Can we be confident that the outliers are the exceptions, and that the overall relationship between our two data sources is sound?

There's only so much we can assume based on visually viewing the graph; at some point, we'll need hard numbers to describe what we're looking at.

A simple linear regression analysis can give us a measure of the strength of the relationship between a dependent variable and the data model.

R-squared is a number between 0 and 100%, where 100% would indicate a perfect fit.

Of course, in the real world, a 100% fit is next to impossible.

You'll judge the accuracy of your model or assumption within the context of the data you're working with. How can you add a regression line to a pandas chart? There are, as always, many ways to go about it. I like simple, and the OLS trendline approach is about as simple as it gets: just add a trendline argument to the code we've already been using.

That's it.

OLS, by the way, stands for ordinary least squares, which is a type of linear regression.
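In code, that one extra argument looks something like this (the trendline option needs the statsmodels package installed alongside Plotly; column names remain assumptions):

```python
# Same scatter plot, now with an ordinary least squares (OLS) regression line.
fig = px.scatter(
    merged_data,
    x="Score",
    y="Value",
    hover_data=["Country", "Score"],
    trendline="ols",   # hover over the line to see the R-squared value
)
fig.show()
```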

And here's how it looked with our regression line.

When I hover over the line, I'm shown an R-squared value of 0.550451, or in other words around 55%.

For our purposes, I consider that a pretty strong correlation.

A histogram is a plotting tool that breaks your data down into bins. A bin is actually an approximation of a statistically appropriate interval between sets of your data; bins attempt to guess at the probability density function (PDF) that will best represent the values you're actually using.

But they may not display exactly the way you think, especially when you use a default value.

I'll illustrate how this works, or actually how it doesn't work, using data from the "Do birthdays make elite athletes?" chapter on the website.

As you can see over there, I'd scraped the semi-official NHL API for the birth dates of around 1,100 current NHL players.

My goal was to visualize the distribution of their birth dates across all 12 months to see if their births were concentrated within a specific yearly season.

When I displayed the data using a histogram, we didn't see the pattern we'd expected.

In fact, the pattern wasn't truly representative of the real world.

That's because histograms are great for showing frequency distributions by grouping data points together into bins.

This can help us quickly visualize the state of a very large data set where granular precision would get in the way, but it can be misleading for use cases like ours, since we were looking for a literal mapping of events to calendar dates.

Even setting the bin amount to 12 to match the number of months won't help, because a histogram won't necessarily stick to those exact borders.

What we really need here is a plain old bar graph that incorporates our value_counts numbers. I'll pipe the results of value_counts into a data frame called df1, and then plot that as a simple bar graph.
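A hedged sketch of that fix, assuming a hypothetical players data frame with a birth-month column called "Month":

```python
# Histogram version -- even bins=12 won't guarantee one bar per calendar month:
# players["Month"].plot(kind="hist", bins=12)

# Bar-graph version: count the players born in each month and plot those counts directly.
df1 = players["Month"].value_counts().sort_index().to_frame()
df1.plot(kind="bar")
plt.show()
```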

In the next module, we're going to talk about understanding our data visualizations and integrating what we see in our Jupyter notebooks with stuff that happens out there in the real world.

Stay tuned.

We're supposed to be doing data analytics here, so just staring at pretty graphs probably isn't the whole point.

The CPI and wages data sets we plotted in the previous chapter, for instance, showed us a clear general correlation.

But there were some visually recognizable anomalies.

Unless we can connect those anomalies with historical events, and explain them in a historical context, we won't be getting the full value from our data.

But even before going there, we should confirm that our plots actually make sense in the context of their data sources.

Working with our BLS examples, let's look at graphs to compare CPI and wages data from both before and after our manipulation.

That way, we can be sure that our math and particularly our fake math didn't skew things too badly.

Here's what our CPI data look like when plotted using the raw data.

It's certainly a busy graph, but you can clearly see the gentle upward slope punctuated by a handful of sudden jumps.

Next, we'll see that same data after removing three out of every four months' data points; the same ups and downs are still visible.

Given our overall goals, I'd categorize our transformation as a success.

Now, how about the wages data? Here, because we moved from percentages to currency-based values, the transformation was more intrusive and the risks of misrepresentation were greater.

We'll also need to take into account the way a percentage will display differently from an absolute value.

Here's the original data.

Note how there's no consistent curve, either upwards or downwards.

That's because we're measuring the rate of growth as it took place within each individual quarter, not the growth itself.

Now compare that with this line graph of the wage data converted to currency-based values.

The gentle curve you see makes some sense; it's about real growth, after all, not growth rates. But it's also possible to recognize a few spots where the curve steepens and others where it smooths out a bit more.

Why are the slopes so smooth in comparison with the percentage based data? Look at the y axis labels.

The index graph is measured in points between 180 and 280, while the percentage graph goes from zero to 3.5.

It's the scale that's different.

All in all, I believe we're safe concluding that what we produced is a good match with our source data.

Establishing some kind of historical context for your data will require looking for anomalies and associating them with known historical events.

That's something I do at length in the wages and CPI Reality Check chapter on the website.

If you're interested, I'm sure you'll work through that material on your own.

But I think you've seen enough here to get a picture of how plotting the right visualization can be helpful.

And that brings us to the end of this particular course. As I've mentioned a number of times already, the full curriculum is available on my thedataproject.net site, and you're more than welcome to join all the cool kids over there and be in touch if you've got something to add to the conversation.

The main thing is to realize that the end of this course isn't anywhere near the end of your data analytics education.

Watching me calmly execute nice, clean code samples isn't really learning.

Unless you're a very special breed of genius, you won't begin to understand how all this really works until you dive in and work things through yourself.

I say "work things through," but what I really mean is struggle through them, because it's mistakes and frustration that are the best teachers.

Don't imagine that my Python code just came to exist on a quiet afternoon while I was sipping a nice hot coffee.

First of all, I don't drink coffee.

But more to the point, there was nothing quiet about it.

There were humiliating failures, reformulations, start-overs, and countless trips to Stack Overflow before things began to take shape.

But the more problems I faced and overcame, the deeper the process sank into my mind, and the better I got at it.

And so will you. Just be prepared for tough times ahead.

Before you all run off and get on with your day, let's spend a moment or two reviewing everything we saw here.

We spoke about the many ways you can work with Jupyter notebooks, including online platforms like Google's Colaboratory, and locally hosting either JupyterLab or classic notebooks. We then introduced ourselves to the Jupyter environment, learning about cells, kernels, and the operating environment.

We saw how we can find data through public APIs and how to integrate API credentials into our Python environment.

Python libraries and modules were our next focus, including how to import appropriate libraries to allow us to effectively clean and manipulate our data.

And finally, turning to some actual data analytics, we learned some basics of plotting, including working with scatter plots, regression lines, and histograms.

And we closed out the course with a quick discussion of how to use our data visualizations to integrate our insights with the real world.

I hope this has been helpful for you and I invite you to check out some of my other content on my main website.

Take care.