The field of machine learning is becoming more and more mainstream every year. With this growth come many libraries and tools to abstract away some of the most difficult concepts to implement for people starting out.
Most people will say you need a higher level degree in ML to work in the industry. If you love working with data and practical math, then I would say this is not true. I did not graduate college with a Machine Learning or data degree yet I am working with ML right now at a startup. I want to share what I used to learn and how I got here in hopes that it will help someone else.
I knew Python already when I started, but, if you don’t, I recommend learning basic and intermediate Python first. The language is pretty easy to learn compared to others. Python is also home to the largest data science/ML community so there are tons of tools to help as you learn.
With that out of the way, the first thing you should do is download “The Machine Learning Podcast” by OCDevel (overcast.fm, iTunes) into your favorite podcast app. Listen to the first 10–15 episodes. They are very good at giving an overview of the machine learning ecosystem and there are also recommended resources which are linked on the OCDevel site.
Anaconda & Jupyter Notebook — These are a must for ML & data science. Follow the instructions here to install and set them up.
Visual Studio Code with Python Plugin — I never thought I would be recommending a Microsoft product, but I am honestly impressed with their open source commitment lately. This is now my favorite code editor, even for doing some things in Python — like debugging code.
Kaggle.com is the best place to find datasets when you are starting out. Go ahead and sign up for an account and poke around the site. You will notice that there are many competitions for people of all experience levels and even tutorials to go with them (like this beginner-friendly one about the Titanic). These datasets will be very helpful to practice with while you are learning Python libraries.
Next, it’s important to learn the common Python libraries for working with data: Numpy, Matplotlib, Pandas, Scikit-Learn, etc. I recommend starting with this course from datacamp. It goes over some basics which you can skip or use for review and the Numpy section is a good intro.
Pandas is a must-learn, but it takes a little while to grasp since it does so many things. It’s built on top of Numpy and is used for cleaning, preparing, and analyzing data. It also has built-in tools for things like visualization. I used a lot of resources to learn Pandas and practice with it. Here are a few:
- Learn Pandas on Kaggle
- Learn Pandas Video Course | Notebook for Course
- Jupyter Notebook Extra Examples: Basics | Plotting with Matplotlib & Pandas | And Many More
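To give a feel for the cleaning and analyzing that Pandas is used for, here is a minimal sketch. The tiny table, its column names, and its values are all made up for illustration; they are not taken from any Kaggle dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical passenger-style records; every value here is invented.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dev"],
    "ticket_class": [1, 3, 3, 1],
    "age": [29.0, np.nan, 41.0, 35.0],
})

# Cleaning: fill the missing age with the median of the known ages.
median_age = df["age"].median()
df["age"] = df["age"].fillna(median_age)

# Analyzing: average age per ticket class.
mean_age_by_class = df.groupby("ticket_class")["age"].mean()
print(mean_age_by_class)
```

Filling with the median is only one of several reasonable choices for missing values; the resources above cover the alternatives in more depth.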
After Pandas comes Scikit-Learn. This is where things start to be applied more to actual machine learning algorithms. Scikit-Learn is a scientific Python library for machine learning.
The best resource I have found for this so far is the book “Hands-On Machine Learning with Scikit-Learn and TensorFlow”. I think it does a very good job of teaching you step-by-step with practical examples. The first half is about Scikit-Learn, so I did that part first and then came back to the TensorFlow portion.
There are many other Python libraries like Keras and PyTorch, but I will get into those later. This is already a lot to learn :)
This is the first step into machine learning. Scikit-Learn has shallow learning algorithms like linear regression built into the library. The Scikit-Learn book that I mention above teaches many types of common machine learning algorithms and lets you practice with hands-on examples.
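As a rough illustration of what "built into the library" means, the sketch below fits Scikit-Learn's `LinearRegression` to a handful of made-up points that follow y = 2x + 1. The data is an assumption chosen so the result is easy to check by hand; it does not come from the book.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 2x + 1 exactly.
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # one feature per row
y = np.array([3.0, 5.0, 7.0, 9.0])

# Fit an ordinary least-squares line to the points.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_

# Predict the target for an unseen input, x = 5.
prediction = model.predict(np.array([[5.0]]))[0]
print(slope, intercept, prediction)
```

Most Scikit-Learn estimators follow this same `fit` then `predict` pattern, which makes it easy to swap one algorithm for another while you are learning.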
While that’s good, I still found it useful to also go through Andrew Ng’s Machine Learning course from Stanford. It’s available to be audited for free on Coursera (there is a podcast for this course on iTunes, but it’s a little hard to follow and well over a decade old). The quality of instruction is amazing and it’s one of the most recommended resources online (it’s not the easiest to get through which is why I recommend it down here).
Start going through the Andrew Ng course slowly and don’t get frustrated if you don’t understand something. I had to put it down and pick it up several times. I also took Matlab in college, which is the language he uses in the course, so I didn’t have trouble with that part. But if you want to use Python instead, you can find the examples translated online.
Yes, math is necessary. However, I don’t feel like an intense, math-first approach is the best way to learn; it’s intimidating for many people. As OCDevel suggests in his podcast (linked above), spend most of your time learning practical machine learning and maybe 15–20% studying the math.
I think the first step here is to learn or brush up on statistics. It can be easier to digest, and it is both a lot of fun and practical. After statistics, you will definitely need to learn a bit of linear algebra and some calculus to really know what’s going on in deep learning. This will take some time, but here are some of the resources that I recommend for this.
- I think the statistics courses on Udacity are quite good. You can start with this one and then explore the other ones they offer.
- I loved the book, “Naked Statistics”. It’s full of practical examples and enjoyable to read.
- It’s also useful to understand Bayesian statistics and how it differs from frequentist and classical models. This Coursera course does a great job explaining these concepts — there is also a part 2 of the course here.
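To make the Bayesian versus frequentist difference concrete, here is a small sketch that estimates the probability of heads for a coin. The function names and the uniform Beta(1, 1) prior are assumptions picked for illustration; the courses above go through the reasoning properly.

```python
def frequentist_estimate(heads, flips):
    """Maximum-likelihood estimate: the observed fraction of heads."""
    return heads / flips


def bayesian_estimate(heads, flips, prior_heads=1.0, prior_tails=1.0):
    """Posterior mean under a Beta(prior_heads, prior_tails) prior.

    After observing the data, the posterior is
    Beta(prior_heads + heads, prior_tails + tails), whose mean is below.
    """
    return (prior_heads + heads) / (prior_heads + prior_tails + flips)


# Three flips, all heads.
heads, flips = 3, 3
mle = frequentist_estimate(heads, flips)
posterior_mean = bayesian_estimate(heads, flips)
print(mle, posterior_mean)
```

With only three flips, the frequentist estimate jumps straight to 1.0, while the Bayesian estimate is pulled toward the prior and lands at 0.8. As more data comes in, the two estimates move closer together.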
Linear Algebra Resources:
- The book, “Linear Algebra, Step by Step” is excellent. It’s like a high school/college textbook but well written and easy to follow. There are also plenty of exercises for each chapter with answers in the back.
- Essence of Linear Algebra video series — The math explanations by 3blue1brown are amazing. I highly recommend his math content.
- There is an overview of linear algebra in the Andrew Ng course as well but I think the two resources I list above are a bit easier to use for learning the subject.
I had taken a few years of Calculus before, but I still needed to brush up quite a bit. I picked up a used textbook for Calc. 1 at a local bookstore to start. Here are some online resources that helped me as well.
Other Helpful Math:
- Mathematical Decision Making from The Great Courses Plus
After learning some math and the basics of data science and machine learning, it’s time to jump into more algorithms and neural networks.
You probably got a taste of deep learning already with some of the resources I mentioned in part 1, but here are some really good resources to introduce you to neural networks anyhow. At least they will be a good review and fill in some gaps for you.
While you are working through the Andrew Ng Stanford course, I recommend checking out fast.ai. They have several high quality, practical video courses that can really help to learn and cement these concepts. The first is Practical Deep Learning for Coders and second — just released — is Cutting Edge Deep Learning For Coders, Part 2. I picked up so many things from watching and re-watching some of these videos. Another amazing feature of fast.ai is the community forum; probably one of the most active AI forums online.
Deep Learning Libraries in Python
I think it’s a good idea to learn a little bit from all three of these libraries: Keras, Tensorflow, and PyTorch. Keras is a good place to start as its API is made to be simpler and more intuitive. Right now, I use PyTorch almost entirely, which is my personal favorite, but they all have pros and cons. Thus it’s good to be able to tell which one to choose in different situations.
- Deeplizard Keras Playlist — This channel has some seriously good explanations and examples. You can follow along with the videos for free, or have access to the code notebooks as well by subscribing on Patreon at the $3 (USD) tier.
- I also found the documentation for Keras to be quite good
- Datacamp has many well-written tutorials for ML and Keras like this one
- The Tensorflow section of the book “Hands-On Machine Learning with Scikit-Learn and TensorFlow” (also mentioned above)
- Deeplizard Tensorflow Series
- Deeplizard Pytorch Series
- Udacity Pytorch Bootcamp — I’m currently taking Udacity’s Deep Reinforcement Learning nanodegree and I thought their PyTorch section earlier in the course was very good. They are about to launch it for free to the public! Here are some of their PyTorch notebooks on Github.
- Fast.ai is also built with PyTorch — You will be learning this library some if you go through their courses.
Blogs & Research Papers
I have found it very helpful to read current research as I learn. There are plenty of resources that help make complicated concepts, and the math behind them, easier to digest. These papers are also a lot more fun to read than you may realize.
- fast.ai blog
- Distill.pub — Machine Learning Research explained clearly
- Two Minute Papers — Short video breakdowns of AI and other research papers
- Arxiv Sanity — More intuitive tool to search through, sort, and save research papers
- Deep Learning Papers Roadmap
- Machine Learning Subreddit — They have ‘what are you reading’ threads discussing research papers
- Arxiv Insights — This channel has some great breakdowns of AI research papers
- The Data Skeptic — They have a lot of good shorter episodes, called [mini]s where they cover machine learning concepts
- Software Engineering Daily Machine Learning
- OCDevel Machine Learning Podcast — I already mentioned this one, but I’m listing it again just in case you missed it
Additional Learning Resources
Social Media: @gwen_faraday
If you know of any other resources that are good, or see that I am missing something, please leave links in the comments. Thank you.