This article will serve as a guide to improving your Data Science skills while working from home. You can use it to build real-life projects, beef up your portfolio, and prepare yourself for what's next.

The coronavirus outbreak is taking over headlines. Due to the spread of COVID-19, remote work has become a requirement for many almost overnight. You might be working from home as you read this article.

With millions working from home for many weeks now, we should seize this opportunity to improve our skills in the domain we are focusing on.

Here is my strategy for learning Data Science while working from home, along with a few personal real-life projects.

"So what should we do?"

"Where should we start learning?"

Grab your coffee as I explain the process of how you can learn data science sitting at home. This blog is for everyone, from beginners to professionals.

Prerequisites

Before you start this journey, you will need to cover a few prerequisites. Whichever specific field you are in, these are the foundations you need for data science.

Logic/Algorithms:

It’s important to know why we need a particular prerequisite before learning it. Algorithms are basically a set of instructions given to a computer to make it do a specific task.

Machine learning is built from various complex algorithms. So you need to understand how algorithms and logic work on a basic level before jumping into complex algorithms needed for machine learning.

If you are able to write the logic for any given puzzle with the proper steps, it will be easy for you to understand how these algorithms work and you can write one for yourself.
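
As a small illustration of writing out the logic for a puzzle step by step, here is a minimal sketch of binary search, a classic beginner algorithm. The function name and the sample list are made up for this example; the point is only to show how a clear sequence of steps turns into code.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


numbers = [2, 5, 8, 12, 16, 23, 38]   # must already be sorted
print(binary_search(numbers, 16))     # prints 4
print(binary_search(numbers, 7))      # prints -1 because 7 is not in the list
```

Each loop iteration halves the part of the list that can still contain the target, which is why the search stays fast even on large sorted lists.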

Resources: Some awesome free resources to learn data structures and algorithms in depth.

Statistics:

Statistics is a collection of tools that you can use to get answers to important questions about data.

Machine learning and statistics are two tightly related fields of study, so much so that statisticians sometimes refer to machine learning as “applied statistics” or “statistical learning”.

The following topics should be covered by aspiring data scientists before they start machine learning.

  • Measures of Central Tendency — mean, median, mode, etc
  • Measures of Variability — variance, standard deviation, z-score, etc
  • Probability — probability density function, conditional probability, etc
  • Accuracy — true positive, false positive, sensitivity, etc
  • Hypothesis Testing and Statistical Significance — p-value, null hypothesis, etc
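
Several of the topics above can be tried out with nothing more than Python's standard library. The sketch below uses made-up exam scores and made-up confusion-matrix counts, so treat the numbers as placeholders rather than real data.

```python
import statistics

scores = [62, 70, 70, 75, 81, 88, 94]  # hypothetical exam scores

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability (population versions)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)
z_scores = [(x - mean) / std_dev for x in scores]

# Accuracy-style metrics from hypothetical confusion-matrix counts
tp, fp, fn, tn = 40, 10, 5, 45
sensitivity = tp / (tp + fn)                # true positive rate
specificity = tn / (tn + fp)                # true negative rate
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f}")
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} accuracy={accuracy:.2f}")
```

A z-score tells you how many standard deviations a value lies from the mean, so the z-scores of a dataset always sum to roughly zero.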

Resources: Learn college level statistics in this free 8 hour course.

Business:

This depends on which domain you want to focus on. It basically involves understanding the particular domain and getting domain expertise before you get into a data science project. This is important as it helps in defining our problem accurately.

Resources: Data science for business

Brush up your basics

This sounds pretty easy but we tend to forget some important basic concepts. It gets difficult to learn more complex concepts and the latest technologies in a specific domain without having a solid foundation in the basics.

Here are a few concepts you can start revising:

Python programming language

Python is widely used in data science. Check out this collection of great Python tutorials and these helpful code samples to get started.

You can also check out this Python 3 cheatsheet, which will help you learn the syntax introduced in Python 3 and brush up on the basics.

And if you want a great free course, check out this Python for Everybody course from Dr. Chuck.

General data science skills

Want to take a great course on data science concepts? Here's a bunch of data science courses that you can take online, ranked according to thousands of data points.

Resources: Data science for beginners - free 6 hour course, What languages should you learn for data science?

Data Collection

Now it is time to explore the ways you can collect your data. You never know where your data might be hiding. The following are a few ways to collect it.

Web scraping

Web scraping helps you gather structured data from the web, select some of that data, and keep what you selected for whatever use you require.

You can start by learning BeautifulSoup4, which helps you scrape websites and build your own datasets.
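
To show the core idea of scraping without installing anything, here is a minimal sketch that uses Python's built-in html.parser module as a stand-in for BeautifulSoup4. The HTML snippet and the LinkCollector class are made up for this example. With BeautifulSoup4 the same extraction is shorter (for instance by calling find_all("a") on the parsed page), but the principle is the same: walk the page structure and keep only the parts you need.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <h1>Top stories</h1>
  <a href="/news/1">First headline</a>
  <a href="/news/2">Second headline</a>
  <p>No link here</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
for href, text in collector.links:
    print(href, text)
```

The resulting list of (link, headline) pairs is already a tiny dataset that you could write to a CSV file.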

Advanced tip: You can automate browsers and get data from interactive web pages, such as Firebase-backed apps, using Selenium. It is useful for automating web applications and boring web-based administration tasks.

Resources: Web Scraping 101 in Python

Cloud servers

If your data is stored on cloud servers such as S3, you will need to get familiar with how to retrieve it from there. The following links will help you understand how to do this with Amazon S3.

Resources : Getting started with Amazon S3, How to deploy your site or app to AWS S3 with CloudFront

APIs

Many websites, such as Facebook and Twitter, provide data through APIs. So it is important to learn how APIs are used and to have a good idea of how they are implemented.
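
Most web APIs follow the same pattern: you build a URL with query parameters, send a request, and get JSON back. The sketch below shows the two ends of that process without making a network call. The endpoint https://api.example.com/v1/posts, the field names, and the sample response are all hypothetical, so check the documentation of whichever API you actually use.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/posts"  # hypothetical endpoint


def build_request_url(base_url, **params):
    """Attach query parameters to the base URL."""
    return f"{base_url}?{urlencode(params)}"


# Stand-in for the body an API might return (made-up fields).
SAMPLE_RESPONSE = """
{"data": [{"id": 1, "text": "hello", "likes": 3},
          {"id": 2, "text": "world", "likes": 7}],
 "next_page": 2}
"""


def extract_rows(response_body):
    """Turn the JSON payload into simple (id, likes) rows."""
    payload = json.loads(response_body)
    return [(item["id"], item["likes"]) for item in payload["data"]]


url = build_request_url(BASE_URL, page=1, per_page=2)
rows = extract_rows(SAMPLE_RESPONSE)
print(url)
print(rows)
```

In a real project you would send the request with a library such as urllib.request or requests and pass the response body to a function like extract_rows.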

Resources : What is an API? In English, please; How to build a JSON API with Python; Getting started with Python API.

Data Preprocessing

This topic includes everything from data cleaning to feature engineering. It takes a lot of time and effort, so set aside enough time to learn it properly.

Data cleaning involves different techniques depending on the problem and the data type. The data needs to be cleaned of irrelevant data, syntax errors, data inconsistencies, and missing data. The following guide will get you started with data cleaning.
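
Here is a minimal sketch of those cleaning steps in plain Python, using a handful of made-up records. It trims whitespace, normalises inconsistent capitalisation, drops exact duplicates, and treats malformed or missing ages as missing values that are filled with the median. Filling with the median is just one possible choice, assumed here for illustration.

```python
import statistics

# Hypothetical raw records with typical real-world problems.
raw_rows = [
    {"name": " Alice ", "city": "mumbai", "age": "29"},
    {"name": "Bob", "city": "Mumbai ", "age": ""},       # missing age
    {"name": "Bob", "city": "Mumbai ", "age": ""},       # exact duplicate
    {"name": "Carol", "city": "PUNE", "age": "thirty"},  # syntax error in age
    {"name": "Dave", "city": "Pune", "age": "41"},
]


def parse_age(value):
    """Return the age as an int, or None if it is missing or malformed."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "name": row["name"].strip(),
            "city": row["city"].strip().title(),  # fix inconsistent casing
            "age": parse_age(row["age"]),
        }
        key = (record["name"], record["city"], record["age"])
        if key in seen:  # skip duplicates
            continue
        seen.add(key)
        cleaned.append(record)

    # Fill missing ages with the median of the known ages, if there are any.
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    fill_value = statistics.median(known_ages) if known_ages else None
    for record in cleaned:
        if record["age"] is None:
            record["age"] = fill_value
    return cleaned


for record in clean(raw_rows):
    print(record)
```

On larger datasets you would usually do the same steps with a library such as pandas, but the logic stays the same.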

Resources : Ultimate guide to data cleaning

Data Preprocessing is an important step in which the data gets transformed, or encoded, so that the machine can easily parse it. It requires time as well as effort to preprocess different types of data which include numerical, textual and image data.
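
To make "transformed, or encoded" concrete, here is a minimal sketch of three common preprocessing steps: min-max scaling for numerical data, one-hot encoding for categorical data, and basic tokenisation for text. The function names and sample values are made up for this example, and libraries such as scikit-learn offer more complete versions of the same ideas.

```python
import string


def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:  # avoid dividing by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot_encode(labels):
    """Return the sorted categories and one 0/1 vector per label."""
    categories = sorted(set(labels))
    vectors = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, vectors


def tokenize(text):
    """Lowercase the text, strip punctuation, and split on whitespace."""
    no_punctuation = text.lower().translate(str.maketrans("", "", string.punctuation))
    return no_punctuation.split()


ages = [20, 30, 40]
cities = ["Pune", "Mumbai", "Pune"]

print(min_max_scale(ages))               # prints [0.0, 0.5, 1.0]
print(one_hot_encode(cities))            # prints (['Mumbai', 'Pune'], [[0, 1], [1, 0], [0, 1]])
print(tokenize("Data Science, at home!"))  # prints ['data', 'science', 'at', 'home']
```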

Resources : Data Preprocessing: Concepts, All you need to know about text preprocessing for NLP and Machine Learning, Preprocessing for deep learning.

Machine Learning

Finally we reach our favourite part of data science: Machine Learning.

My suggestion here is to first brush up on the basic algorithms.

Classification — Logistic Regression, RandomForest, SVM, Naive Bayes, Decision Trees

Resources : Types of classification algorithms in Machine Learning, Classification Algorithms in Machine Learning

Regression — Linear Regression, RandomForest, Polynomial Regression

Resources : Introduction to Linear Regression; Use Linear Regression models to predict quadratic, root, and polynomial functions; 7 Regression Techniques you should know; Selecting the best Machine Learning algorithm for your regression problem
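
As a quick refresher on the simplest of these, here is a minimal sketch of simple linear regression fitted with the least-squares formulas: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. The data is made up so that the true relationship is exactly score = 50 + 2 * hours.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4, 5]         # hypothetical study hours
scores = [52, 54, 56, 58, 60]   # hypothetical scores, exactly 50 + 2 * hours

slope, intercept = fit_line(hours, scores)
print(slope, intercept)         # prints 2.0 50.0
print(slope * 6 + intercept)    # predicted score for 6 hours: 62.0
```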

Clustering — K-Means Clustering, DBSCAN, Agglomerative Hierarchical Clustering

Resources : Clustering algorithms

Gradient Boosting — XGBoost, Catboost, AdaBoost

Resources : Gradient boosting from scratch, Understanding Gradient Boosting Machines

I urge you all to understand the math behind these algorithms so you have a clear idea of how they actually work. You can refer to this blog, where I have implemented XGBoost from scratch: Implementing XGBoost from scratch
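
To give a feel for that math, here is a minimal sketch of plain gradient boosting for squared-error loss, using one-split decision stumps on a single feature. This is not XGBoost: it has no regularisation and no second-order terms, and all function names and data are made up for this example. The key idea it shows is that, for squared error, the negative gradient is simply the residual, so each new stump is fitted to what the current model still gets wrong.

```python
def fit_stump(xs, residuals):
    """Find the threshold on x that best splits the residuals (least squared error)."""
    best = None
    # Skip the largest x so the right-hand side of the split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean prediction
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of squared error = residual
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    total = model["base"]
    for stump in model["stumps"]:
        total += model["learning_rate"] * predict_stump(stump, x)
    return total


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]                   # hypothetical feature values
ys = [1.0, 1.2, 0.9, 1.1, 4.8, 5.1, 5.0, 4.9]   # hypothetical targets

model = fit_boosted_stumps(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)  # the boosted error should be much smaller
```

The learning rate shrinks each stump's contribution, which is why many small steps are needed instead of one big correction.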

Now you can move on to Neural Networks and start your Deep Learning journey.

Resources: Deep Learning for Developers, Introduction to Deep Learning with Tensorflow, How to develop neural networks with Tensorflow, Learn how deep neural networks work

You can then dive deeper into how LSTMs, Siamese Networks, CapsNet, and BERT work.

Hackathons:

Now we need to apply these algorithms at a competitive level. You can start by looking for online Data Science hackathons. Here is a list of websites where I compete with other data scientists.

Analytics Vidhya — https://datahack.analyticsvidhya.com/contest/all/

Kaggle — https://www.kaggle.com/competitions

Hackerearth — https://www.hackerearth.com/challenges/

MachineHack — https://www.machinehack.com/

TechGig — https://www.techgig.com/challenge

Dare2compete — https://dare2compete.com/e/competitions/latest

Crowdanalytix — https://www.crowdanalytix.com/community

If you want to look at a winning solution, here is mine from one online hackathon on Analytics Vidhya: https://github.com/Sid11/AnalyticsVidhya_DataSupremacy

Projects:

We often see people working on dummy data who never get a taste of what actual data looks like. In my opinion, working on real-life data gives you a much clearer idea of what to expect. Cleaning real-life data alone can take about 70% of your project’s time.

Business Intelligence

After you get the results from your project, it is now time to make business decisions from those results. Business Intelligence is a suite of software and services that helps transform data into actionable intelligence and knowledge.

This can be done by creating a dashboard from the output of our model. Tableau is a powerful, fast-growing data visualization tool used in the Business Intelligence industry. It helps simplify raw data into an easily understandable format. Data analysis is fast with Tableau, and the visualizations are created in the form of dashboards and worksheets.

Resources : Getting started with Tableau, Tableau for Data Science course

It is now time for you to start using your time at home to improve your skill set. If you have started this journey and need my advice or details about any part mentioned above, feel free to comment or mail me at jsiddhesh96[at]gmail[dot]com.