After 80+ hours of watching course videos, doing quizzes and assignments, and reading reviews on various aggregators and forums, I’ve narrowed down the best data science courses available to the list below.
The best data science courses:
- Data Science Specialization — JHU @ Coursera
- Introduction to Data Science — Metis
- Applied Data Science with Python Specialization — UMich @ Coursera
- Data Scientist in Python Path @ Dataquest
- Statistics and Data Science MicroMasters — MIT @ edX
- CS109 Data Science — Harvard
- Python for Data Science and Machine Learning Bootcamp — Udemy
The selections here are geared more towards individuals getting started in data science, so I’ve filtered courses based on the following criteria:
- The course goes over the entire data science process
- The course uses popular open-source programming tools and libraries
- The instructors cover the basic, most popular machine learning algorithms
- The course has a good combination of theory and application
- The course is either on-demand or offered every month or so
- There are hands-on assignments and projects
- The instructors are engaging and personable
- The course has excellent ratings — generally, greater than or equal to 4.5/5
There are a lot more data science courses than when I first started this page four years ago, so a substantial filter is now needed to determine which courses are the best. I hope you feel confident that the courses below are truly worth your time and effort, because it will take several months (or more) of learning and practice to become a data science practitioner.
In addition to the top general data science course picks, I’ve included a separate section for more specific data science interests, like Deep Learning, SQL, and other relevant topics. These courses take a more specialized approach and don’t cover the whole data science process, but they are still the top choices for their topic. These extra picks are good for supplementing before, during, and after the main courses.
Resources you should use when learning
When learning data science online it’s important to not only get an intuitive understanding of what you’re actually doing, but also to get sufficient practice using data science on unique problems.
In addition to the courses listed below, I would suggest reading two books:
- Introduction to Statistical Learning — available for free — one of the most widely recommended books for beginners in data science. It explains the fundamentals of machine learning and how everything works behind the scenes
- Applied Predictive Modeling — a breakdown of the entire modeling process on real-world datasets with incredibly useful tips each step of the way
These two textbooks are incredibly valuable and provide a much better foundation than taking courses alone. The first book is especially effective at teaching the intuition behind much of the data science process, and if you can understand almost everything in it, you’re better off than most entry-level data scientists.
Use Video Speed Controller for Chrome to speed up any video. I usually choose between 1.5x and 2.5x speed depending on the content, and use the “s” (slow down) and “d” (speed up) keyboard shortcuts that come with the extension.
Now to an overview and review of each course.
1. Data Science Specialization — JHU @ Coursera
This course series is one of the most-enrolled and highest-rated course collections on this list. JHU did an incredible job balancing breadth and depth in the curriculum. One thing included in this series that’s missing from many data science courses is a complete section on statistics, which is the backbone of data science.
Overall, the Data Science Specialization is an ideal mix of theory and application using the R programming language. As for prerequisites, you should have some programming experience (it doesn’t have to be R) and a good understanding of algebra. Previous knowledge of linear algebra and/or calculus isn’t necessary, but it is helpful.
Price — Free or $49/month for certificate and graded materials
Provider — Johns Hopkins University
- The Data Scientist’s Toolbox
- R Programming
- Getting and Cleaning Data
- Exploratory Data Analysis
- Reproducible Research
- Statistical Inference
- Regression Models
- Practical Machine Learning
- Developing Data Products
- Data Science Capstone
If you’re rusty with statistics and/or want to learn more R first, check out the Statistics with R Specialization as well.
2. Introduction to Data Science — Metis
An extremely highly rated course (4.9/5 on SwitchUp and 4.8/5 on CourseReport) taught live by a data scientist from a top company. This six-week data science course covers the entire data science process, and it’s the only live online course on this list. Furthermore, not only will you get a certificate upon completion, but since this course is also accredited, you’ll also receive continuing education units.
Two nights per week, you’ll join the instructor and other students to learn data science as if it were an online college course. Not only are you able to ask questions, but the instructor also holds extra office hours to further help students who might be struggling.
Price — $750
- Computer Science, Statistics, Linear Algebra Short Course
- Exploratory Data Analysis and Visualization
- Data Modeling: Supervised/Unsupervised Learning and Model Evaluation
- Data Modeling: Feature Selection, Engineering, and Data Pipelines
- Data Modeling: Advanced Supervised/Unsupervised Learning
- Data Modeling: Advanced Model Evaluation and Data Pipelines | Presentations
For prerequisites, you’ll need to know Python, some linear algebra, and some basic statistics. If you need to work on any of these areas, Metis also offers Beginner Python and Math for Data Science, a separate live online course dedicated to the Python, statistics, probability, linear algebra, and calculus used in data science.
3. Applied Data Science with Python Specialization — UMich @ Coursera
The University of Michigan, which also launched an online data science Master’s degree, produces this fantastic specialization focused on the applied side of data science. This means you’ll get a strong introduction to commonly used data science Python libraries, like matplotlib, pandas, nltk, scikit-learn, and networkx, and learn how to use them on real data.
This series doesn’t include the statistics needed for data science or the derivations of various machine learning algorithms, but it does provide a comprehensive breakdown of how to use and evaluate those algorithms in Python. Because of this, I think it would be more appropriate for someone who already knows R and/or is learning the statistical concepts elsewhere.
If you’re rusty with statistics, consider the Statistics with Python Specialization first. You’ll learn many of the most important statistical skills needed for data science.
Price — Free or $49/month for certificate and graded materials
Provider — University of Michigan
- Introduction to Data Science in Python
- Applied Plotting, Charting & Data Representation in Python
- Applied Machine Learning in Python
- Applied Text Mining in Python
- Applied Social Network Analysis in Python
To take these courses, you’ll need to know some Python or programming in general, and there are actually a couple of great lectures in the first course dealing with some of the more advanced Python features you’ll need to process data effectively.
4. Data Scientist in Python Path @ Dataquest
Dataquest is a fantastic resource on its own, but even if you take other courses on this list, Dataquest serves as a superb complement to your online learning.
Dataquest foregoes video lessons and instead teaches through an interactive textbook of sorts. Every topic in the data science track is accompanied by several in-browser, interactive coding steps that guide you through applying the exact topic you’re learning.
Video-based learning is more “passive” — it’s very easy to think you understand a concept after watching a 2-hour long video, only to freeze up when you actually have to put what you’ve learned in action. — Dataquest FAQ
To me, Dataquest stands out from the rest of the interactive platforms because the curriculum is very well organized, you get to learn by working on full-fledged data science projects, and there’s a super active and helpful Slack community where you can ask questions.
The platform has one main data science learning curriculum for Python:
Data Scientist In Python Path
This track currently contains 31 courses, which cover everything from the very basics of Python, to Statistics, to the math for Machine Learning, to Deep Learning, and more. The curriculum is constantly being improved and updated for a better learning experience.
Price — 1/3 of content is Free, $29/month for Basic, $49/month for Premium
Here’s a condensed version of the curriculum:
- Python — Basic to Advanced
- Python data science libraries — Pandas, NumPy, Matplotlib, and more
- Visualization and Storytelling
- Effective data cleaning and exploratory data analysis
- Command line and Git for data science
- SQL — Basic to Advanced
- APIs and Web Scraping
- Probability and Statistics — Basic to Intermediate
- Math for Machine Learning — Linear Algebra and Calculus
- Machine Learning with Python — Regression, K-Means, Decision Trees, Deep Learning and more
- Natural Language Processing
- Spark and Map-Reduce
Additionally, there are entire data science projects scattered throughout the curriculum. Each project’s goal is to get you to apply everything you’ve learned up to that point and to get you familiar with what it’s like to work on an end-to-end data science project.
Lastly, if you’re more interested in learning data science with R, then definitely check out Dataquest’s new Data Analyst in R path. The Dataquest subscription gives you access to all paths on their platform, so you can learn R or Python (or both!).
5. Statistics and Data Science MicroMasters — MIT @ edX
MicroMasters from edX are advanced, graduate-level courses that carry real credits you can apply to a select number of graduate degrees. The inclusion of probability and statistics courses makes this series from MIT a very well-rounded curriculum for being able to understand data intuitively.
Due to its advanced nature, you should have experience with single and multivariate calculus, as well as Python programming. There isn’t any introduction to Python or R like in some of the other courses in this list, so before starting the ML portion, they recommend taking Introduction to Computer Science and Programming Using Python to get familiar with Python.
Price — Free or $1,350 for credential and graded materials
Provider — Massachusetts Institute of Technology
- Probability — The Science of Uncertainty and Data
- Data Analysis in Social Science — Assessing Your Knowledge
- Fundamentals of Statistics
- Machine Learning with Python: from Linear Models to Deep Learning
- Capstone Exam in Statistics and Data Science
The ML course has several interesting projects you’ll work on, and at the end of the whole series you’ll focus on one exam to wrap everything up.
6. CS109 Data Science — Harvard
With a great mix of theory and application, this course from Harvard is one of the best for getting started as a beginner. It’s not on an interactive platform, like Coursera or edX, and doesn’t offer any sort of certification, but it’s definitely worth your time and it’s totally free.
- Web Scraping, Regular Expressions, Data Reshaping, Data Cleanup, Pandas
- Exploratory Data Analysis
- Pandas, SQL and the Grammar of Data
- Statistical Models
- Storytelling and Effective Communication
- Bias and Regression
- Classification, kNN, Cross Validation, Dimensionality Reduction, PCA, MDS
- SVM, Evaluation, Decision Trees and Random Forests, Ensemble Methods, Best Practices
- Recommendations, MapReduce, Spark
- Bayes Theorem, Bayesian Methods, Text Data
- Effective Presentations
- Experimental Design
- Deep Networks
- Building Data Science
Python is used in this course, and there are many lectures going through the intricacies of the various data science libraries to work through real-world, interesting problems. This is one of the few data science courses around that actually touches on every part of the data science process.
7. Python for Data Science and Machine Learning Bootcamp — Udemy
A very reasonably priced course for the value. The instructor does an outstanding job explaining the Python, visualization, and statistical learning concepts needed for all data science projects. A huge benefit of this course over other Udemy courses is the assignments. Throughout the course you’ll break away and work in Jupyter notebooks to solidify your understanding, then the instructor follows up with a solutions video to thoroughly explain each part.
- Python Crash Course
- Python for Data Analysis — Numpy, Pandas
- Python for Data Visualization — Matplotlib, Seaborn, Plotly, Cufflinks, Geographic plotting
- Data Capstone Project
- Machine learning — Regression, kNN, Trees and Forests, SVM, K-Means, PCA
- Recommender Systems
- Natural Language Processing
- Big Data and Spark
- Neural Nets and Deep Learning
This course focuses more on the applied side, and one thing missing is a section on statistics. If you plan on taking this course it would be a good idea to pair it with a separate statistics and probability course as well.
An honorary mention goes out to another Udemy course: Data Science A-Z. I do like Data Science A-Z quite a bit due to its complete coverage, but since it uses other tools outside of the Python/R ecosystem, I don’t think it fits the criteria as well as Python for Data Science and Machine Learning Bootcamp.
Other top data science courses for specific skills
Deep Learning Specialization — Coursera
Created by Andrew Ng, maker of the famous Stanford Machine Learning course, this is one of the highest rated data science courses on the internet. This course series is for those interested in understanding and working with neural networks in Python.
Mathematics for Machine Learning — Coursera
This is one of the most highly rated courses dedicated to the specific mathematics used in ML. Take this course if you’re uncomfortable with the linear algebra and calculus required for machine learning, and you’ll save some time over other, more generic math courses.
How to Win a Data Science Competition — Coursera
One of the courses in the Advanced Machine Learning Specialization. Even if you’re not looking to participate in data science competitions, this is still an excellent course for bringing together everything you’ve learned up to this point. This is more of an advanced course that teaches you the intuition behind why you should pick certain ML algorithms, and even goes over many of the algorithms that have been winning competitions lately.
Bayesian Statistics: From Concept to Data Analysis — Coursera
Bayesian, as opposed to Frequentist, statistics is an important subject to learn for data science. Many of us learned Frequentist statistics in college without even knowing it, and this course does a great job comparing and contrasting the two to make it easier to understand the Bayesian approach to data analysis.
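To make the contrast concrete, here is a minimal sketch (not taken from the course) that estimates a coin’s probability of heads both ways. The frequentist maximum-likelihood estimate is just the observed proportion; the Bayesian approach starts from a Beta prior and updates it with the data. The Beta(2, 2) prior and the 7-heads-in-10-flips data are illustrative assumptions.

```python
from fractions import Fraction


def frequentist_estimate(heads: int, flips: int) -> Fraction:
    """Maximum-likelihood estimate: the observed proportion of heads."""
    return Fraction(heads, flips)


def bayesian_posterior(heads: int, flips: int, prior_a: int = 2, prior_b: int = 2):
    """Beta-Binomial conjugate update.

    A Beta(a, b) prior combined with `heads` successes in `flips` trials
    yields a Beta(a + heads, b + flips - heads) posterior.
    Returns the posterior parameters and the posterior mean.
    """
    post_a = prior_a + heads
    post_b = prior_b + (flips - heads)
    posterior_mean = Fraction(post_a, post_a + post_b)
    return post_a, post_b, posterior_mean


if __name__ == "__main__":
    heads, flips = 7, 10  # hypothetical data
    print("Frequentist (MLE):", frequentist_estimate(heads, flips))  # 7/10
    a, b, mean = bayesian_posterior(heads, flips)
    print(f"Bayesian posterior: Beta({a}, {b}), mean = {mean}")  # Beta(9, 5), mean = 9/14
```

The prior pulls the Bayesian estimate toward 0.5 (9/14, roughly 0.64, versus 0.7); as the number of flips grows, the data dominates and the two estimates converge.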
Spark and Python for Big Data with PySpark — Udemy
From the same instructor as the Python for Data Science and Machine Learning Bootcamp in the list above, this course teaches you how to leverage Spark and Python to perform data analysis and machine learning on an AWS cluster. The instructor makes this course really fun and engaging by giving you mock consulting projects to work on, then going through a complete walkthrough of the solution.
How to actually learn data science
When joining any of these courses you should make the same commitment to learning as you would towards a college course. One goal for learning data science online is to maximize mental discomfort. It’s easy to get caught in the habit of signing in to watch a few videos and feel like you’re learning, but you’re not really learning much unless it hurts your brain.
Vik Paruchuri (from Dataquest) produced a helpful video on how to approach learning data science effectively.
Essentially, it comes down to doing what you’re learning, i.e. when you take a course and learn a skill, apply it to a real project immediately. Working through real-world projects that you are genuinely interested in helps solidify your understanding and provides you with proof that you know what you’re doing.
One of the most uncomfortable things about learning data science online is that you never really know when you’ve learned enough. Unlike in a formal school environment, when learning online you don’t have many good barometers for success, like passing or failing tests or entire courses. Projects help remedy this by first showing you what you don’t know, and then serving as a record of your knowledge once they’re done.
All in all, the project should be the main focus, and courses and books should supplement that.
When I first started learning data science and machine learning, I began (as many do) by trying to predict stocks. I found courses, books, and papers that taught the things I wanted to know, and then I applied them to my project as I was learning. I learned so much in such a short period of time that it would seem like an improbable feat if laid out as a curriculum.
Working on something I was passionate about turned out to be extremely powerful. It was easy to work hard and learn nonstop because predicting the market was something I really wanted to accomplish.
Essential knowledge and skills
There’s a base skill set and level of knowledge that all data scientists must possess, regardless of what industry they’re in. For hard skills, you not only need to be proficient with the mathematics of data science, but you also need the skills and intuition to understand data.
The mathematics you should be comfortable with:
- Statistics (Frequentist and Bayesian)
- Linear Algebra
- Basic calculus
Furthermore, these are the basic programming skills you should be comfortable with:
- Python or R
- Extracting data from various sources, like SQL databases, JSON, CSV, XML, and text files
- Cleaning and transforming unstructured, messy data
- Effective data visualization
- Machine learning — Regression, Clustering, kNN, SVM, Trees and Forests, Ensembles, Naive Bayes
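As a small taste of the extraction and cleaning skills above, the sketch below uses only Python’s standard library (`csv`, `json`, `sqlite3`) to pull the same kind of record from CSV text, JSON text, and an in-memory SQL table, then normalizes the messy fields. The sample records and the `name`/`age` fields are made up for illustration; in practice you would often reach for pandas instead.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data: the same kind of record arriving in three formats.
CSV_TEXT = "name,age\n Alice ,34\nBob,\n"
JSON_TEXT = '[{"name": "carol", "age": "29"}]'


def from_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def from_json(text):
    """Parse a JSON array of objects."""
    return json.loads(text)


def from_sql():
    """Query an in-memory SQLite table and return rows as dicts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    conn.execute("INSERT INTO people VALUES (?, ?)", ("Dave", 41))
    cur = conn.execute("SELECT name, age FROM people")
    columns = [desc[0] for desc in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    conn.close()
    return rows


def clean(record):
    """Trim and title-case names; convert age to int, or None if missing."""
    name = str(record.get("name", "")).strip().title()
    raw_age = record.get("age")
    if raw_age is None or str(raw_age).strip() == "":
        age = None
    else:
        age = int(str(raw_age).strip())
    return {"name": name, "age": age}


def load_all():
    """Combine all three sources into one cleaned list of records."""
    records = from_csv(CSV_TEXT) + from_json(JSON_TEXT) + from_sql()
    return [clean(r) for r in records]


if __name__ == "__main__":
    for person in load_all():
        print(person)
```

Each source needs its own loader, but once everything is a list of dicts, a single `clean` function can handle stray whitespace, inconsistent capitalization, and missing values in one place.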
Lastly, it’s not all about the hard skills; there are also many soft skills that are extremely important, and many of them aren’t taught in courses. These are:
- Curiosity and creativity
- Communication skills — speaking and presenting in front of groups, and being able to explain complex topics to non-technical team members
- Problem solving — coming up with analytical solutions for business problems
Python vs. R
After going through the list you might have noticed that each course is dedicated to one language: Python or R. So which one should you learn?
Short answer: just learn Python, or learn both.
Python is an incredibly versatile language, and it has a huge amount of support in data science, machine learning, and statistics. Not only that, but you can also do things like build web apps, automate tasks, scrape the web, create GUIs, build a blockchain, and create games.
Because Python can do so many things, I think it should be the language you choose. Ultimately, it doesn’t matter that much which language you choose for data science since you’ll find many jobs looking for either. So why not pick the language that can do almost anything?
In the long run, though, I think learning R is also very useful since many statistics/ML textbooks use R for examples and exercises. In fact, both books I mentioned at the beginning use R, and unless someone translates everything to Python and posts it to GitHub, you won’t get the full benefit of the book. Once you learn Python, you’ll be able to learn R pretty easily.
Check out this StackExchange answer for a great breakdown of how the two languages differ in machine learning.
Are certificates worth it?
One big difference between Udemy and other platforms, like edX, Coursera, and Metis, is that the latter offer certificates upon completion and are usually taught by instructors from universities.
Some certificates, like those from edX and Metis, even carry continuing education credits. Other than that, many of the real benefits, like accessing graded homework and tests, are only accessible if you upgrade. If you need to stay motivated to complete the entire course, committing to a certificate also puts money on the line so you’ll be less likely to quit. I think there’s definitely personal value in certificates, but, unfortunately, not many employers value them that much.
Coursera and edX vs. Udemy
Udemy does not currently offer accredited certificates, so I generally find Udemy courses to be good for more applied learning material, whereas Coursera and edX are usually better for theory and foundational material.
Whenever I’m looking for a course about a specific tool, whether it be Spark, Hadoop, Postgres, or Flask web apps, I tend to search Udemy first since the courses favor an actionable, applied approach. Conversely, when I need an intuitive understanding of a subject, like NLP, Deep Learning, or Bayesian Statistics, I’ll search edX and Coursera first.
Data science is a vast, interesting, and rewarding field to study and be a part of. You’ll need many skills, a wide range of knowledge, and a passion for data to become an effective data scientist that companies want to hire, and it’ll take longer than the hyped-up YouTube videos claim.
If you’re more interested in the machine learning side of data science, check out the Top 5 Machine Learning Courses for 2019 as a supplement to this article.
If you have any questions or suggestions, feel free to leave them in the comments below.
Thanks for reading and have fun learning!
Originally published at learndatasci.com.