by David Venturi
If you want to learn Data Science, start with one of these programming classes
A year ago, I was a numbers geek with no coding background. After trying an online programming course, I was so inspired that I enrolled in one of the best computer science programs in Canada.
Two weeks later, I realized that I could learn everything I needed through edX, Coursera, and Udacity instead. So I dropped out.
The decision was not difficult. I could learn the content I wanted faster, more efficiently, and for a fraction of the cost.
I already had a university degree and, perhaps more importantly, I already had the university experience. Paying $30K+ to go back to school seemed irresponsible.
Shortly afterwards, I realized that data science was a better fit for me than computer science, and I started creating my own data science master’s degree using online courses. I scoured the introduction to programming landscape. I’ve already taken several courses and audited portions of many others. I know the options, and I know what skills are needed if you’re targeting a data analyst or data scientist role.
For this guide, I spent 20+ hours trying to find every single online introduction to programming course offered as of August 2016, extracting key bits of information from their syllabi and reviews, and compiling their ratings. For this task, I turned to none other than the open source Class Central community and its database of thousands of course ratings and reviews.
Since 2011, Class Central founder Dhawal Shah has kept a closer eye on online courses than arguably anyone else in the world. Dhawal personally helped me assemble this list of resources.
How we picked courses to consider
Each course had to fit four criteria:
- It introduces programming and, optionally, computer science. See “A note on Programming vs. Computer Science” below.
- The language of instruction is Python or R. These are by far the two most popular programming languages used in data science.
- It must be an interactive online course, so no books or text-based tutorials. Regarding the latter, Codecademy’s video-less and text editor-based courses would qualify, but strict text tutorials like the ones from R tutorial would not. Though books are viable ways to learn programming, Python, and R, this guide focuses on courses.
- It must be a decent length: at least ten hours in total for estimated completion.
How we evaluated courses
We believe we covered every notable course that fits the above criteria. Since there are seemingly hundreds of courses on Udemy in Python and R, we chose to consider only the most reviewed and highest rated ones. There is a chance we missed something, however. Please let us know if you think that is the case.
We compiled average rating and number of reviews from Class Central and other review sites. We calculated a weighted average rating for each course. If a series had multiple courses (like Rice University’s Part 1 and Part 2), we calculated the weighted average rating across all courses. We also read text reviews and used this feedback to supplement the numerical ratings.
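To make the weighting concrete, here is a minimal Python sketch of how ratings from several review sites (or several courses in a series) can be pooled. It assumes each source is weighted by its number of reviews, which is one common reading of “weighted average” and not a formula spelled out in this guide. The function name `weighted_average_rating` and the example numbers are hypothetical.

```python
def weighted_average_rating(sources):
    """Pool (rating, number_of_reviews) pairs into one rating.

    Each rating is weighted by its review count (an assumption),
    so a source with many reviews counts for more than one with few.
    Returns (weighted_rating, total_reviews).
    """
    total_reviews = sum(count for _, count in sources)
    if total_reviews == 0:
        raise ValueError("cannot average ratings with zero reviews")
    weighted_sum = sum(rating * count for rating, count in sources)
    return weighted_sum / total_reviews, total_reviews


# Hypothetical numbers for illustration only, not real course data.
example_sources = [(4.5, 300), (4.0, 100)]
rating, reviews = weighted_average_rating(example_sources)
print(f"{rating:.3f} stars over {reviews} reviews")
```

Under this scheme, a course with a low rating on a site with few reviews can still end up with a high overall rating if most of its reviews come from a site where it scores well.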
We made subjective syllabus judgment calls based on three factors:
- Coverage of the fundamentals of programming.
- Coverage of more advanced, but useful, topics in programming. (For example, several courses do not cover object-oriented programming. We believe this is a key topic, though not a deal-breaker, so these courses were docked marks rather than excluded from consideration.)
- How much of the syllabus is relevant to data science?
A note on Programming vs. Computer Science
Programming is not computer science and vice versa. There is a difference of which beginners may not be acutely aware. Borrowing this answer from Programmers Stack Exchange:
Computer science is the study of what computers [can] do; programming is the practice of making computers do things.
The course we are looking for introduces programming and optionally touches on relevant aspects of computer science that would benefit a new programmer in terms of awareness. Many of the courses considered, you’ll notice, do indeed have a computer science portion.
None of the courses, however, are strictly computer science courses, which is why something like Harvard’s CS50x on edX is excluded.
Our pick for the best programming course for data scientists is…
University of Toronto’s “Learn to Program” series on Coursera. LTP1: The Fundamentals and LTP2: Crafting Quality Code have a near-perfect weighted average rating of 4.71 out of 5 stars over 284 reviews. They also have a great mix of content difficulty and scope for the beginner data scientist.
This free, Python-based introduction to programming sets itself apart from the other 20+ courses we considered.
Jennifer Campbell and Paul Gries, two associate professors in the University of Toronto’s department of computer science (which is regarded as one of the best in the world) teach the series. The self-paced, self-contained Coursera courses match the material in their book, “Practical Programming: An Introduction to Computer Science Using Python 3.” LTP1 covers 40–50% of the book and LTP2 covers another 40%. The 10–20% not covered is not particularly useful for data science, which helped their case for being our pick.
The professors kindly and promptly sent me detailed course syllabi upon request, which were difficult to find online prior to the course’s official restart in September 2016.
Learn to Program: The Fundamentals (LTP1)
Timeline: 7 weeks
Estimated time commitment: 6–8 hours per week
This course provides an introduction to computer programming intended for people with no programming experience. It covers the basics of programming in Python including elementary data types (numeric types, strings, lists, dictionaries, and files), control flow, functions, objects, methods, fields, and mutability.
- Installing Python, IDLE, mathematical expressions, variables, assignment statement, calling and defining functions, syntax, and semantic errors.
- Strings, input/output, function reuse, function design recipe, and docstrings.
- Booleans, import, namespaces, and if statements.
- For loops and fancy string manipulation.
- While loops, lists, and mutability.
- For loops over indices, parallel lists and strings, and files.
- Tuples and dictionaries.
Learn to Program: Crafting Quality Code (LTP2)
Timeline: 5 weeks
Estimated time commitment: 6–8 hours per week
You know the basics of programming in Python: elementary data types (numeric types, strings, lists, dictionaries, and files), control flow, functions, objects, methods, fields, and mutability. You need to be good at these in order to succeed in this course.
LTP: Crafting Quality Code covers the next steps: designing larger programs, testing your code so that you know it works, reading code in order to understand how efficient it is, and creating your own types.
- Designing algorithms: how do you decide what to do in a function body? How do you figure out what functions to write in the first place?
- Automated testing: doctest and unittest.
- Analyzing code for speed — details of searching and sorting.
- Creating new types: classes in Python.
- Functions as arguments, default parameter values, and exceptions.
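To give a flavor of a few of these topics, here is a small sketch that combines a docstring, a default parameter value, an exception, and doctest. It is our own illustration and is not taken from the course materials; the function `average` is hypothetical.

```python
import doctest


def average(values, default=None):
    """Return the mean of values, or default if values is empty.

    >>> average([2, 4, 6])
    4.0
    >>> average([], default=0.0)
    0.0
    >>> average([])
    Traceback (most recent call last):
        ...
    ValueError: average() of an empty list with no default
    """
    if not values:
        if default is None:
            raise ValueError("average() of an empty list with no default")
        return default
    return sum(values) / len(values)


if __name__ == "__main__":
    # Run every example in the docstring above and report any mismatches.
    print(doctest.testmod())
```

The examples in the docstring double as documentation and as tests: doctest runs each line that starts with `>>>` and compares the result with the text underneath it.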
Associate professor Gries also provided the following commentary on the course structure: “Each module has between about 45 minutes to a bit more than an hour of video. There are in-video quiz questions, which will bring the total time spent studying the videos to perhaps 2 hours.”
These videos are generally shorter than ten minutes each.
He continued: “In addition, we have one exercise (a dozen or two or so multiple choice and short-answer questions) per module, which should take an hour or two. There are three programming assignments in LTP1, each of which might take four to eight hours of work. There are two programming assignments in LTP2 of similar size.”
He emphasized that the estimate of 6–8 hours per week is a rough guess: “Estimating time spent is incredibly student-dependent, so please take my estimates in that context. For example, someone who knows a bit of programming, perhaps in another programming language, might take half the time of someone completely new to programming. Sometimes someone will get stuck on a concept for a couple of hours, while they might breeze through on other concepts … That’s one of the reasons the self-paced format is so appealing to us.”
In total, the University of Toronto’s Learn to Program series runs an estimated 12 weeks at 6–8 hours per week, which is about standard for most online courses created by universities. If you prefer to binge-study your MOOCs, that’s 72–96 hours, which could feasibly be completed in two to three weeks, especially if you have a bit of programming experience.
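The arithmetic behind that estimate can be checked with a few lines of Python. The week counts and weekly hours come from the course timelines above. The 14-day and 21-day schedules are an assumption used to translate “two to three weeks” into hours per day.

```python
# Timelines and weekly hours as listed for LTP1 and LTP2 above.
weeks = {"LTP1": 7, "LTP2": 5}
hours_per_week = (6, 8)  # low and high estimates

total_weeks = sum(weeks.values())
low_hours = total_weeks * hours_per_week[0]
high_hours = total_weeks * hours_per_week[1]
print(f"{total_weeks} weeks, {low_hours}-{high_hours} hours in total")

# Assumed binge schedules: finishing in 14 or 21 days.
for days in (14, 21):
    low_per_day = low_hours / days
    high_per_day = high_hours / days
    print(f"{days} days: {low_per_day:.1f} to {high_per_day:.1f} hours per day")
```

In other words, a two-week binge means roughly five to seven hours of study a day, and a three-week binge means roughly three and a half to four and a half.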
Another great Python option
If you already have some familiarity with programming, and don’t mind a syllabus that has a notable skew towards games and interactive applications, I would also recommend Rice University’s An Introduction to Interactive Programming in Python (Part 1 and Part 2) on Coursera.
With 6,000+ reviews and the highest weighted average rating of 4.93/5 stars, this popular course is noted for its engaging videos, challenging quizzes, and enjoyable mini projects. It’s slightly more difficult, and focuses less on the fundamentals and more on topics that aren’t applicable in data science than our #1 pick.
These courses are also part of the seven-course Principles in Computing Specialization on Coursera.
The materials are self-paced and free, and a paid certificate is available. The course must be purchased for $79 (USD) for access to graded materials.
The condensed course description and full syllabus are as follows:
“This two-part course is designed to help students with very little or no computing background learn the basics of building simple interactive applications … To make learning Python easy, we have developed a new browser-based programming environment that makes developing interactive applications in Python simple. These applications will involve windows whose contents are graphical and respond to buttons, the keyboard, and the mouse.
Recommended background: A knowledge of high school mathematics is required. While the class is designed for students with no prior programming experience, some beginning programmers have viewed the class as being fast-paced. For students interested in some light preparation prior to the start of class, we recommend a self-paced Python learning site such as codecademy.com.”
Timeline: 5 weeks
Estimated time commitment: 7–10 hours per week
Week 0 — statements, expressions, variables
Understand the structure of this class, and explore Python as a calculator.
Week 1 — functions, logic, conditionals
Learn the basic constructs of Python programming, and create a program that plays a variant of Rock-Paper-Scissors.
Week 2 — event-driven programming, local/global variables
Learn the basics of event-driven programming, understand the difference between local and global variables, and create an interactive program that plays a simple guessing game.
Week 3 — canvas, drawing, timers
Create a canvas in Python, learn how to draw on the canvas, and create a digital stopwatch.
Week 4 — lists, keyboard input, the basics of modeling motion
Learn the basics of lists in Python, model moving objects in Python, and recreate the classic arcade game “Pong.”
Week 5 — mouse input, list methods, dictionaries
Read mouse input, learn about list methods and dictionaries, and draw images.
Week 6 — classes and object-oriented programming
Learn the basics of object-oriented programming in Python using classes, and work with tiled images.
Week 7 — basic game physics, sprites
Understand the math of acceleration and friction, work with sprites, and add sound to your game.
Week 8 — sets and animation
Learn about sets in Python, compute collisions between sprites, and animate sprites.
If you are set on R
If you are set on an introduction to programming course in R, we recommend DataCamp’s series of R courses: Introduction to R, Intermediate R, Intermediate R — Practice, and Writing Functions in R. Though the latter three come at a price point of $25/month, DataCamp is best in category for covering the programming fundamentals and R-specific topics, which is reflected in its average rating of 4.29/5 stars.
We believe the best approach to learning programming for data science using online courses is to do it first through Python. Why? There is a lack of MOOC options that teach core programming principles and use R as the language of instruction. We found six such R courses that fit our testing criteria, compared to twenty-two Python-based courses. Most of the R courses didn’t receive great ratings and failed to meet most of our subjective testing criteria.
The series breakdown is as follows:
Introduction to R
Estimated time commitment: 4 hours
- Intro to basics
- Data frames
Intermediate R
Estimated time commitment: 6 hours
- Conditionals and control flow
- The apply family
Intermediate R — Practice
Estimated time commitment: 4 hours
This follow-up course on intermediate R does not cover new programming concepts. Instead, you will strengthen your knowledge of the topics in intermediate R with a bunch of new and fun exercises.
Writing Functions in R
Estimated time commitment: 4 hours
- A quick refresher
- When and how you should write a function
- Functional programming
- Advanced inputs and output
- Robust functions
Another option for R would be to take a Python-based introduction to programming course to cover the fundamentals of programming, and then pick up R syntax with an R basics course. This is what I did, but I did it with Udacity’s Data Analysis with R. It worked well for me.
You can also pick up R with our top recommendation for a statistics class, which teaches the basics of R through coding up stats problems.
Our #1 and #2 picks had a 4.71 and 4.93 star weighted average rating over 284 and 6,069 reviews, respectively. Let’s look at the other alternatives.
Python courses (descending weighted average ratings)
- Programming for Everybody (Getting Started with Python) and Python Data Structures (University of Michigan/Coursera): another great option. It has a great teacher (Dr. Charles “Chuck” Severance), as well. This series came close to usurping our #1 pick because it matched it in rating and in most of the subjective criteria. This course is more gentle, however, with reviewers noting that it might not prepare you as well as other options. Dr. Chuck himself noted that this course is a bridge to more advanced programming courses: “I would suggest that after students complete my Python course, if they are interested in more programming, that they would take the Rice course.” We also felt that the reviews for our #1 pick were more enthusiastic. It has a 4.8-star weighted average rating over 4,800+ reviews.
- Python A-Z: Python For Data Science With Real Exercises (Udemy): it costs money, and has a 4.7-star weighted average rating over 52 reviews.
- Automate the Boring Stuff with Python Programming (Udemy): it costs money, and has a 4.6-star weighted average rating over 2,000+ reviews.
- Python for Beginners: From Noob to Expert in 22+ Hours (Udemy): it costs money, and has a 4.6-star weighted average rating over 240 reviews.
- Introduction to Computer Science and Programming Using Python (MIT/edX): another good option. It has a 4.5-star weighted average rating over 240 reviews.
- Complete Python Bootcamp (Udemy): it costs money, and has a 4.5-star weighted average rating over 4,700+ reviews.
- Treehouse’s Python series (9 courses): it costs money. It’s a popular option, but there are not enough reviews to make a value judgment. It has a 4.5-star weighted average rating over 5 reviews.
- Python (Codecademy): video-less, text editor-based, interactive course. It has a 4.5-star weighted average rating over 20 reviews.
- Introduction to Python for Data Science (Microsoft/edX): it has a 4.47-star weighted average rating over 360 reviews.
- Intro to Programming Nanodegree (Udacity): it has a notable focus on web development. It’s a great option for someone who doesn’t know what type of programming they want to do. It has a 4.4-star weighted average rating over 730 reviews. Note that it contains the first half of Udacity’s popular “Intro to Computer Science” course, which doesn’t fit our inclusion criteria.
- CS For All: Introduction to Computer Science and Python Programming (Harvey Mudd College/edX): it has very few reviews, and a 4.33-star weighted average rating over 6 reviews.
- Programming Foundations with Python (Udacity): doesn’t cover the fundamentals. It has a 4-star weighted average rating over 7 reviews.
- Learn to Program Using Python (edX/University of Texas Arlington): it has a 4-star weighted average rating over 14 reviews.
- Learn to Code for Data Analysis (The Open University/FutureLearn): it has a 3.5-star weighted average rating over 2 reviews.
- DataCamp’s Python series (3 courses): it has no reviews on the two major course review sites, but DataCamp is a popular option.
- SoloLearn’s Python 3 Tutorial: it has no reviews, but has a comprehensive curriculum and a dedicated fanbase.
- Dataquest’s Python series (3 courses): it has no reviews, but has a comprehensive curriculum and an outspoken fanbase.
R courses (descending weighted average ratings)
- R Programming A-Z™: R For Data Science With Real Exercises! (Udemy): costs money. It doesn’t offer as much bang for your buck as our #1 R offering. Ratings are similar, considering sample size. It has a 4.7-star weighted average rating over 785 reviews.
- Introduction to R for Data Science (Microsoft/edX): not as much depth as DataCamp’s offering. It has a 4.48-star weighted average rating over 500 reviews.
- R Programming (Johns Hopkins University/Coursera): doesn’t sufficiently cover the basics of programming. Reviewers note that it is difficult, and not in a good way. It has a 4.04-star weighted average rating over 900+ reviews, despite a 2.5-star rating over 212 reviews on Class Central.
- TryR (CodeSchool): it’s not long enough to fit testing criteria, and doesn’t sufficiently cover programming fundamentals. It has a 4-star weighted average rating over 260 reviews.
- Programming with R for Data Science (Microsoft/edX): more of an introduction to the R language rather than programming. The course site states, “If you have some programming experience, and would like to learn more about R, then you’re at the right place.” It has a 3-star weighted average rating over 12 reviews.
Wrapping it Up
This is the first of a six-piece series that covers the best MOOCs for launching yourself into the data science field. It will cover several other data science core competencies: statistics, the data science process, data visualization, and machine learning.
- If you want to learn Data Science, take a few of these statistics classes: A comprehensive guide to online statistics and probability courses.
- I ranked every Intro to Data Science course on the internet, based on thousands of data points: A comprehensive guide to online intro to data science courses.
The final piece will be a summary of those courses, and the best MOOCs for other key topics such as data wrangling, databases, and even software engineering.
If you’re looking for a complete list of Data Science MOOCs, you can find them on Class Central’s Data Science and Big Data subject page.
If you enjoyed reading this, check out some of Class Central’s other pieces:
- Here are 250 Ivy League courses you can take online right now for free: 250 MOOCs from Brown, Columbia, Cornell, Dartmouth, Harvard, Penn, Princeton, and Yale.
- The 50 best free online university courses according to data: When I launched Class Central back in November 2011, there were around 18 or so free online courses, and almost all of…
If you have suggestions for courses I missed, let me know in the responses!
If you found this helpful, click the heart so more people will see it here on Medium.
This is a condensed version of the original article published on Class Central, where course descriptions, syllabi, and multiple reviews are included.