
The field of machine learning is becoming more and more mainstream every year. With this growth come many libraries and tools to abstract away some of the most difficult concepts to implement for people starting out.

Most people will say you need an advanced degree in ML to work in the industry. If you love working with data and practical math, then I would say this is not true. I did not graduate college with a machine learning or data degree, yet I am working with ML right now at a startup. I want to share what I used to learn and how I got here in hopes that it will help someone else.

Getting Started

I knew Python already when I started, but, if you don’t, I recommend learning basic and intermediate Python first. The language is pretty easy to learn compared to others. Python is also home to the largest data science/ML community so there are tons of tools to help as you learn.

Learn Python: freeCodeCamp Python Crash Course

With that out of the way, the first thing you should do is download “The Machine Learning Podcast” by OCDevel (overcast.fm, iTunes) into your favorite podcast app. Listen to the first 10–15 episodes. They are very good at giving an overview of the machine learning ecosystem and there are also recommended resources which are linked on the OCDevel site.

Tooling

Anaconda & Jupyter Notebook — These are a must for ML & data science. Follow the instructions here to install and set them up.

Visual Studio Code with Python Plugin — I never thought I would be recommending a Microsoft product, but I am honestly impressed with their open source commitment lately. This is now my favorite code editor, even for doing some things in Python — like debugging code.

Kaggle.com is the best place to find datasets when you are starting out. Go ahead and sign up for an account and poke around the site. You will notice that there are many competitions for people of all experience levels and even tutorials to go with them (like this beginner-friendly one about the Titanic). These datasets will be very helpful to practice with while you are learning Python libraries.

Python Libraries

Next, it’s important to learn the common Python libraries for working with data: NumPy, Matplotlib, Pandas, Scikit-Learn, etc. I recommend starting with this course from DataCamp. It goes over some basics, which you can skip or use for review, and the NumPy section is a good intro.

Pandas is a must-learn, but it also takes a little while to grasp since it does so many things. It’s built on top of NumPy and is used for cleaning, preparing, and analyzing data. It also has built-in tools for things like visualization. I used a lot of resources to learn Pandas and practice with it. Here are a few:

  1. Learn Pandas on Kaggle
  2. Learn Pandas Video Course | Notebook for Course
  3. Jupyter Notebook Extra Examples: Basics | Plotting with Matplotlib & Pandas | And Many More
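To give a feel for what “cleaning, preparing, and analyzing” looks like in practice, here is a minimal sketch. The tiny table and its column names (name, age, fare) are made up for illustration and are not taken from any of the resources above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: names, ages, and fares are invented for this sketch.
raw = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "age": [22.0, np.nan, 35.0, 58.0],
    "fare": [7.25, 71.28, 8.05, 26.55],
})

# Cleaning: fill the missing age with the median of the known ages.
clean = raw.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# Preparing: derive a new categorical column from an existing one.
clean["age_group"] = np.where(clean["age"] >= 30, "30+", "under 30")

# Analyzing: average fare per age group.
mean_fare = clean.groupby("age_group")["fare"].mean()

# Pandas sits on top of NumPy, so a column converts to a plain array.
ages = clean["age"].to_numpy()

print(clean)
print(mean_fare)
```

Filling with the median is only one of several reasonable choices here; dropping the row or using a different statistic may suit other datasets better.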

After Pandas comes Scikit-Learn. This is where things start to be applied more to actual machine learning algorithms. Scikit-Learn is a scientific Python library for machine learning.

The best resource I have found for this so far is the book “Hands-On Machine Learning with Scikit-Learn and TensorFlow”. I think it does a very good job of teaching you step-by-step with practical examples. The first half is about Scikit-Learn, so I did that part first and then came back to the TensorFlow portion.

There are many other Python libraries like Keras and PyTorch, but I will get into those later. This is already a lot to learn :)

Shallow Learning

This is the first step into machine learning. Scikit-Learn has shallow learning algorithms like linear regression built into the library. The Scikit-Learn book that I mentioned above teaches many types of common machine learning algorithms and lets you practice with hands-on examples.
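As a small illustration of what “built into the library” means, here is a minimal sketch of fitting a linear regression with Scikit-Learn. The data is made up (it follows y = 2x + 1 exactly) so the fitted numbers are easy to check by eye; real data will be noisier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration: five points on the line y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
y = 2.0 * X.ravel() + 1.0

# Fit an ordinary least squares model.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_

# Predict for an input the model has not seen.
prediction = model.predict(np.array([[5.0]]))[0]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={prediction:.2f}")
```

Most Scikit-Learn estimators follow this same fit and predict pattern, so once this feels familiar it carries over to other algorithms.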

While that’s good, I still found it useful to also go through Andrew Ng’s Machine Learning course from Stanford. It can be audited for free on Coursera (there is a podcast for this course on iTunes, but it’s a little hard to follow and well over a decade old). The quality of instruction is amazing and it’s one of the most recommended resources online. It’s not the easiest to get through, which is why I recommend it down here.

Start going through the Andrew Ng course slowly and don’t get frustrated if you don’t understand something. I had to put it down and pick it up several times. I also took Matlab in college, which is the language he uses in the course, so I didn’t have trouble with that part. But if you want to use Python instead, you can find the examples translated online.

Math :)

Yes, math is necessary. However, I don’t feel like an intense, math-first approach is the best way to learn; it’s intimidating for many people. As OCDevel suggests in his podcast (linked above), spend most of your time learning practical machine learning and maybe 15–20% studying the math.

I think the first step here is to learn or brush up on statistics. It can be easier to digest, and it is both a lot of fun and practical. After statistics, you will definitely need to learn a bit of linear algebra and some calculus to really know what’s going on in deep learning. This will take some time, but here are some of the resources that I recommend.

Statistics Resources:

  1. I think the statistics courses on Udacity are quite good. You can start with this one and then explore the other ones they offer.
  2. I loved the book, “Naked Statistics”. It’s full of practical examples and enjoyable to read.
  3. It’s also useful to understand Bayesian statistics and how it differs from frequentist and classical models. This Coursera course does a great job explaining these concepts — there is also a part 2 of the course here.

Linear Algebra Resources:

  1. The book, “Linear Algebra, Step by Step” is excellent. It’s like a high school/college textbook but well written and easy to follow. There are also plenty of exercises for each chapter with answers in the back.
  2. Essence of Linear Algebra video series — The math explanations by 3blue1brown are amazing. I highly recommend his math content.
  3. There is an overview of linear algebra in the Andrew Ng course as well but I think the two resources I list above are a bit easier to use for learning the subject.

Calculus Resources:

I had taken a few years of Calculus before, but I still needed to brush up quite a bit. I picked up a used textbook for Calc. 1 at a local bookstore to start. Here are some online resources that helped me as well.

  1. Essence of Calculus video series
  2. Understanding Calculus from The Great Courses Plus

Other Helpful Math:

  1. Mathematical Decision Making from The Great Courses Plus

Deep Learning

After learning some math and the basics of data science and machine learning, it’s time to jump into more algorithms and neural networks.

You probably got a taste of deep learning already with some of the resources I mentioned in part 1, but here are some really good resources to introduce you to neural networks anyhow. At least they will be a good review and fill in some gaps for you.

  1. 3blue1brown’s Series Explaining Neural Networks
  2. Deeplizard’s Intro to Deep Learning Playlist

While you are working through the Andrew Ng Stanford course, I recommend checking out fast.ai. They have several high-quality, practical video courses that can really help you learn and cement these concepts. The first is Practical Deep Learning for Coders and the second, just released, is Cutting Edge Deep Learning For Coders, Part 2. I picked up so many things from watching and re-watching some of these videos. Another amazing feature of fast.ai is the community forum, probably one of the most active AI forums online.

Deep Learning Libraries in Python

I think it’s a good idea to learn a little bit from all three of these libraries. Keras is a good place to start as its API is made to be simpler and more intuitive. Right now, I use PyTorch almost exclusively, and it is my personal favorite, but they all have pros and cons. Thus it’s good to be able to judge which one to choose in different situations.

Keras

Tensorflow

PyTorch

Blogs & Research Papers

I have found it very helpful to read current research as I learn. There are plenty of resources that help make complicated concepts, and the math behind them, easier to digest. These papers are also a lot more fun to read than you may realize.

  1. fast.ai blog
  2. Distill.pub — Machine Learning Research explained clearly
  3. Two Minute Papers — Short video breakdowns of AI and other research papers
  4. Arxiv Sanity — More intuitive tool to search through, sort, and save research papers
  5. Deep Learning Papers Roadmap
  6. Machine Learning Subreddit — They have ‘what are you reading’ threads discussing research papers
  7. Arxiv Insights — This channel has some great breakdowns of AI research papers

Audio-supplementary Education

  1. The Data Skeptic — They have a lot of good shorter episodes, called [mini]s where they cover machine learning concepts
  2. Software Engineering Daily Machine Learning
  3. OCDevel Machine Learning Podcast — I already mentioned this one, but I’m listing it again just in case you missed it

Additional Learning Resources

The End

Please clap if this was helpful :)

Social Media: @gwen_faraday

If you know of any other resources that are good, or see that I am missing something, please leave links in the comments. Thank you.