This article will serve as a guide to improving your Data Science skills while working from home. You can use it to build real-life projects, beef up your portfolio, and prepare yourself for what's next.

The coronavirus outbreak is taking over headlines. Due to the spread of COVID-19, remote work has suddenly become a requirement for many. You might be working from home as you read this article.

With millions working from home for many weeks now, we should seize this opportunity to improve our skills in the domain we are focusing on.

Here is my strategy for learning Data Science while working from home, along with a few personal real-life projects.

"So what should we do?"

"Where should we start learning?"

Grab your coffee as I explain the process of how you can learn data science sitting at home. This blog is for everyone, from beginners to professionals.

## Prerequisites

To start this journey, you will need to cover some prerequisites. No matter which specific field you are in, the following fundamentals apply to data science.

### Logic/Algorithms:

It’s important to know why we need a particular prerequisite before learning it. Algorithms are basically a set of instructions given to a computer to make it do a specific task.

Machine learning is built from various complex algorithms. So you need to understand how algorithms and logic work on a basic level before jumping into complex algorithms needed for machine learning.

If you are able to write the logic for any given puzzle with the proper steps, it will be easy for you to understand how these algorithms work and you can write one for yourself.
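To make this concrete, here is a classic example of turning step-by-step logic into code: binary search. The steps (look at the middle element, discard the half that cannot contain the answer, repeat) map directly onto the loop below.

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2      # look at the middle element
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1            # discard the left half
        else:
            high = mid - 1           # discard the right half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```

If you can write out the steps of a puzzle like this in plain English first, translating them into code becomes the easy part.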

### Statistics:

Statistics is a collection of tools that you can use to get answers to important questions about data.

Machine learning and statistics are two tightly related fields of study. So much so that statisticians refer to machine learning as “applied statistics” or “statistical learning”.

The following topics should be covered by aspiring data scientists before they start machine learning.

- Measures of Central Tendency — mean, median, mode, etc.
- Measures of Variability — variance, standard deviation, z-score, etc.
- Probability — probability density function, conditional probability, etc.
- Accuracy — true positive, false positive, sensitivity, etc.
- Hypothesis Testing and Statistical Significance — p-value, null hypothesis, etc.
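Several of these measures are one line of Python away, using only the standard library's `statistics` module (the data below is made up for illustration):

```python
import statistics

data = [12, 15, 12, 18, 20, 22, 12, 19]

mean = statistics.mean(data)       # measure of central tendency
median = statistics.median(data)
mode = statistics.mode(data)       # most frequent value
stdev = statistics.stdev(data)     # sample standard deviation

# z-score: how many standard deviations a value lies from the mean
z = (20 - mean) / stdev

print(mean, median, mode)          # 16.25 16.5 12
print(round(z, 2))
```

Computing these by hand on a small list first, then checking against the library, is a good way to make the definitions stick.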

### Domain Knowledge:

This depends on which domain you want to focus on. It involves understanding the particular domain and building domain expertise before you get into a data science project. This is important because it helps you define your problem accurately.

This sounds pretty easy but we tend to forget some important basic concepts. It gets difficult to learn more complex concepts and the latest technologies in a specific domain without having a solid foundation in the basics.

Here are a few concepts you can start revising:

### Python programming language

Python is widely used in data science. Check out this collection of great Python tutorials and these helpful code samples to get started.

You can also check out this Python 3 Cheatsheet that will help you learn the new syntax introduced in Python 3. It'll also help you brush up on basic syntax.
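A few of the Python 3 idioms worth brushing up on, in one small (made-up) example:

```python
name = "Ada"
scores = {"math": 90, "cs": 95}

# f-strings (Python 3.6+) make string formatting readable
greeting = f"Hello, {name}! Average score: {sum(scores.values()) / len(scores):.1f}"

# extended iterable unpacking
first, *rest = [1, 2, 3, 4]

# merging dictionaries with ** unpacking
merged = {**{"a": 1}, **{"b": 2}}

print(greeting)     # Hello, Ada! Average score: 92.5
print(first, rest)  # 1 [2, 3, 4]
print(merged)       # {'a': 1, 'b': 2}
```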

And if you want a great free course, check out this Python for Everybody course from Dr. Chuck.

### General data science skills

Want to take a great course on data science concepts? Here's a bunch of data science courses that you can take online, ranked according to thousands of data points.

## Data Collection

Now it is time to explore the ways you can collect your data. You never know where your data might be hiding. Here are a few ways to collect it.

### Web scraping

Web scraping helps you gather structured data from the web, select the parts you need, and keep them for whatever use you require.

You can start learning BeautifulSoup4 which helps you scrape websites and make your own datasets.
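Before installing BeautifulSoup4, the core idea (parse HTML, pull out the elements you care about) can be sketched with nothing but the standard library's `html.parser`. The sample page and URLs below are made up; BeautifulSoup4 gives you a much friendlier API (e.g. `soup.find_all("a")`) for the same job.

```python
from html.parser import HTMLParser

SAMPLE = """
<html><body>
  <h2 class="title">Iris dataset</h2>
  <a href="https://example.com/iris.csv">download</a>
  <a href="https://example.com/docs">docs</a>
</body></html>
"""

class LinkExtractor(HTMLParser):
    """Collect every href attribute found in <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

parser = LinkExtractor()
parser.feed(SAMPLE)
print(parser.links)
```

In a real scraper you would fetch the page over HTTP first, then feed the response body to the parser and save the extracted rows as your dataset.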

Advanced tip: You can automate browsers and get data from interactive web pages, such as those built on Firebase, using Selenium. It is also useful for automating web applications and tedious web-based administration tasks.

Resources: Web Scraping 101 in Python

### Cloud servers

If your data is stored on cloud services such as Amazon S3, you will need to get familiar with how to retrieve it from there. The following link will help you understand how to work with Amazon S3.

### APIs

There are millions of websites that provide data through APIs, such as Facebook and Twitter. So it is important to learn how APIs are used and to have a good idea of how they are implemented.
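Most APIs return JSON, so the real skill is turning a JSON response into rows you can analyze. The payload below is made up to look like a typical REST response (real endpoints and field names will differ); in practice you would obtain it over HTTP, for example with the `requests` library, instead of a hard-coded string.

```python
import json

# A sample payload shaped like a typical REST API response
RESPONSE = '''
{
  "status": "ok",
  "data": [
    {"user": "alice", "followers": 120},
    {"user": "bob",   "followers": 80}
  ]
}
'''

payload = json.loads(RESPONSE)           # parse the JSON text
rows = [(item["user"], item["followers"]) for item in payload["data"]]
print(rows)                              # [('alice', 120), ('bob', 80)]
```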

## Data Preprocessing

This topic includes everything from data cleaning to feature engineering. It takes a lot of time and effort, so it is worth dedicating serious time to learning it properly.

Data cleaning involves different techniques depending on the problem and the data type. The data needs to be cleared of irrelevant entries, syntax errors, inconsistencies, and missing values. The following guide will get you started with data cleaning.

Resources : Ultimate guide to data cleaning

Data Preprocessing is an important step in which the data gets transformed, or encoded, so that the machine can easily parse it. It requires time as well as effort to preprocess different types of data which include numerical, textual and image data.
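A minimal end-to-end sketch of these steps on a tiny made-up dataset: fill a missing value with the column mean, label-encode a categorical column, and min-max scale a numeric one. (Real projects would typically use pandas and scikit-learn for this; plain Python is used here to keep the logic visible.)

```python
records = [
    {"city": "Pune",   "temp": 31.0},
    {"city": "Delhi",  "temp": None},   # missing value
    {"city": "Pune",   "temp": 29.0},
    {"city": "Mumbai", "temp": 33.0},
]

# 1. Fill missing temperatures with the mean of the observed ones
observed = [r["temp"] for r in records if r["temp"] is not None]
mean_temp = sum(observed) / len(observed)
for r in records:
    if r["temp"] is None:
        r["temp"] = mean_temp

# 2. Label-encode the categorical 'city' column
cities = sorted({r["city"] for r in records})
city_code = {c: i for i, c in enumerate(cities)}

# 3. Min-max scale temperatures into [0, 1]
lo = min(r["temp"] for r in records)
hi = max(r["temp"] for r in records)
features = [(city_code[r["city"]], (r["temp"] - lo) / (hi - lo)) for r in records]

print(features)  # [(2, 0.5), (0, 0.5), (2, 0.0), (1, 1.0)]
```

Each data type (numerical, textual, image) has its own version of these transformations, which is why preprocessing deserves dedicated study.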

## Machine Learning

Finally we reach our favourite part of data science: Machine Learning.

My suggestion here would be to first brush up on your basic algorithms.

Classification — Logistic Regression, RandomForest, SVM, Naive Bayes, Decision Trees

Regression — Linear Regression, RandomForest, Polynomial Regression

Clustering — K-Means Clustering, DBSCAN, Agglomerative Hierarchical Clustering

Resources : Clustering algorithms
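To see what "understanding the math" means in practice, here is a tiny 1-D K-Means sketch in plain Python on made-up data: assign each point to its nearest center, then move each center to the mean of its assigned points, and repeat. (In real work you would use scikit-learn's `KMeans`; this is only to expose the two-step loop at the heart of the algorithm.)

```python
def kmeans_1d(points, centers, iters=10):
    """Tiny 1-D K-Means: alternate between assigning points to their
    nearest center and moving each center to the mean of its cluster."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        centers = [sum(pts) / len(pts) if pts else centers[c]
                   for c, pts in clusters.items()]
    return centers

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
print(sorted(kmeans_1d(points, centers=[0.0, 5.0])))  # [1.0, 9.5]
```

Once this loop makes sense, the 2-D and n-dimensional versions are the same idea with Euclidean distance in place of `abs`.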

I urge you all to understand the math behind these algorithms so you have a clear idea of how they actually work. You can refer to this blog post where I implemented XGBoost from scratch — Implementing XGBoost from scratch

Now you can move on to Neural Networks and start your Deep Learning journey.

You can then dive deeper into how LSTMs, Siamese Networks, CapsNet, and BERT work.

## Hackathons:

Now we need to implement these algorithms at a competitive level. You can start looking for online Data Science hackathons. Here is a list of websites where I compete with other data scientists.

Analytics Vidhya — https://datahack.analyticsvidhya.com/contest/all/

Hackerearth — https://www.hackerearth.com/challenges/

MachineHack — https://www.machinehack.com/

Dare2compete — https://dare2compete.com/e/competitions/latest

Crowdanalytix — https://www.crowdanalytix.com/community

To see what a winning solution looks like, here is a link to my winning entry in an online hackathon on Analytics Vidhya — https://github.com/Sid11/AnalyticsVidhya_DataSupremacy

## Projects:

Many people work only on dummy datasets and never get a feel for what real data looks like. In my opinion, working on real-life data gives you a much clearer picture. The cleaning alone can take around 70% of your project's time and effort.