This article will serve as a guide to improving your Data Science skills while working from home. You can use it to build real-life projects, beef up your portfolio, and prepare yourself for what's next.
The coronavirus outbreak is taking over headlines. Due to the spread of COVID-19, remote work is suddenly an overnight requirement for many. You might be working from home as you are reading this article.
With millions working from home for many weeks now, we should seize this opportunity to improve our skills in the domain we are focusing on.
Here is my strategy for learning Data Science while working from home, with a few personal real-life projects.
"So what should we do?"
"Where should we start learning?"
Grab your coffee as I explain the process of how you can learn data science sitting at home. This blog is for everyone, from beginners to professionals.
To start this journey, you will need to cover the prerequisites. No matter which specific field you are in, the following ones apply to data science.
It’s important to know why we need a particular prerequisite before learning it. Algorithms are basically a set of instructions given to a computer to make it do a specific task.
Machine learning is built from various complex algorithms. So you need to understand how algorithms and logic work on a basic level before jumping into complex algorithms needed for machine learning.
If you are able to write the logic for any given puzzle with the proper steps, it will be easy for you to understand how these algorithms work and you can write one for yourself.
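As a small illustration of writing logic as explicit steps, here is a sketch of binary search, a classic puzzle-style algorithm. The function name and the sample list are my own choices for the example, not something from a particular course.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


numbers = [2, 5, 8, 12, 16, 23, 38]  # must already be sorted
print(binary_search(numbers, 23))  # 5
print(binary_search(numbers, 7))   # -1
```

Each pass halves the part of the list that can still contain the target, which is the kind of step-by-step reasoning that carries over to machine learning algorithms.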
Statistics is a collection of tools that you can use to get answers to important questions about data.
Machine learning and statistics are two tightly related fields of study, so much so that some statisticians refer to machine learning as “applied statistics” or “statistical learning”.
The following topics should be covered by aspiring data scientists before they start machine learning.
- Measures of Central Tendency — mean, median, mode, etc
- Measures of Variability — variance, standard deviation, z-score, etc
- Probability — probability density function, conditional probability, etc
- Accuracy — true positive, false positive, sensitivity, etc
- Hypothesis Testing and Statistical Significance — p-value, null hypothesis, etc
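To make a few of these topics concrete, here is a minimal sketch that uses only Python's built-in statistics module. The sample scores and labels are made up for illustration, and the helper names (safe_ratio, classification_metrics) are hypothetical.

```python
import statistics

scores = [52, 60, 60, 71, 75, 88, 91]  # made-up sample data

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability (population variance and standard deviation)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)
z_scores = [(x - mean) / std_dev for x in scores]

print(mean, median, mode)
print(round(variance, 2), round(std_dev, 2))


def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Count the confusion-matrix cells and derive common accuracy measures."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": safe_ratio(tp + tn, len(pairs)),
        "sensitivity": safe_ratio(tp, tp + fn),  # true positive rate
        "specificity": safe_ratio(tn, tn + fp),  # true negative rate
        "precision": safe_ratio(tp, tp + fp),
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # made-up ground truth
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # made-up model output
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```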
This depends on which domain you want to focus on. It basically involves understanding the particular domain and getting domain expertise before you get into a data science project. This is important as it helps in defining our problem accurately.
Resources: Data science for business
Brush up your basics
This sounds pretty easy but we tend to forget some important basic concepts. It gets difficult to learn more complex concepts and the latest technologies in a specific domain without having a solid foundation in the basics.
Here are a few concepts you can start revising:
Python programming language
You can also check out this Python3 Cheatsheet, which will help you learn the syntax that was introduced in Python 3. It'll also help you brush up on basic syntax.
And if you want a great free course, check out this Python for Everybody course from Dr. Chuck.
General data science skills
Want to take a great course on data science concepts? Here's a bunch of data science courses that you can take online, ranked according to thousands of data points.
Now it is time for us to explore all the ways you can collect your data. You never know where your data might be hiding. The following are a few ways you can collect it.
Web scraping helps you gather structured data from the web, select some of that data, and keep what you selected for whatever use you require.
You can start learning BeautifulSoup4 which helps you scrape websites and make your own datasets.
Advanced tip: You can use Selenium to automate browsers and get data from interactive web pages such as Firebase. It is useful for automating web applications and boring web-based administration tasks.
Resources: Web Scraping 101 in Python
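Here is a rough sketch of the scraping idea: parse some HTML, select the pieces you care about, and keep them. To stay dependency-free it uses html.parser from the Python standard library as a stand-in; BeautifulSoup4 gives you a friendlier API for the same job. The HTML snippet and the LinkCollector class are made up for this example.

```python
from html.parser import HTMLParser

# Made-up page content; in practice this would come from an HTTP response.
SAMPLE_HTML = """
<html><body>
  <h1>Course list</h1>
  <ul>
    <li><a href="/courses/python">Python for Everybody</a></li>
    <li><a href="/courses/stats">Intro to Statistics</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


parser = LinkCollector()
parser.feed(SAMPLE_HTML)
for text, href in parser.links:
    print(text, "->", href)
```

The collected pairs can then be written to a CSV file to form your own dataset. Before scraping a real site, check its terms of use and robots.txt.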
If your data is stored on cloud servers such as S3, you might need to get familiar with how to get data from there. The following link will help you understand how to do this with Amazon S3.
Many websites, such as Facebook and Twitter, provide data through APIs, so it is important to learn how APIs are used and to have a good idea of how they are implemented.
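Most web APIs follow the same pattern: build a URL with query parameters, send a request, and decode a JSON response. The sketch below shows that pattern with the standard library only. The endpoint https://api.example.com/v1/posts and the payload shape are hypothetical, so the example decodes a hard-coded sample instead of calling a real service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, params):
    """Append URL-encoded query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Download a URL and decode its JSON body (performs a real HTTP request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_rows(payload):
    """Flatten the hypothetical payload into (id, text, likes) rows."""
    return [
        (item["id"], item["text"], item.get("likes", 0))
        for item in payload["data"]
    ]


# Hypothetical response body, used here instead of a live call to fetch_json.
SAMPLE_RESPONSE = (
    '{"data": [{"id": 1, "text": "hello", "likes": 3},'
    ' {"id": 2, "text": "world"}]}'
)

url = build_url("https://api.example.com/v1/posts", {"q": "data science", "limit": 2})
rows = extract_rows(json.loads(SAMPLE_RESPONSE))
print(url)
print(rows)
```

Real APIs usually also require an access token and enforce rate limits, so check the provider's documentation for the exact parameters.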
This topic includes everything from data cleaning to feature engineering. It takes a lot of time and effort. So we need to dedicate a lot of time to actually learn it.
Data cleaning involves different techniques based on the problem and data type. The data needs to be cleaned of irrelevant data, syntax errors, data inconsistencies and missing data. The following guide will get you started with data cleaning.
Resources: Ultimate guide to data cleaning
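The cleaning steps mentioned above can be sketched in plain Python as follows. The rows are made up, and filling missing ages with the median is just one possible choice. In a real project you would more likely use a library such as pandas for this.

```python
import statistics

# Made-up raw records with stray whitespace, inconsistent casing,
# a duplicate, a missing value and a syntax error ("abc" as an age).
RAW_ROWS = [
    {"name": " Alice ", "city": "mumbai", "age": "29"},
    {"name": "Bob", "city": "Mumbai", "age": ""},
    {"name": "alice", "city": "MUMBAI ", "age": "29"},
    {"name": "Carol", "city": "Pune", "age": "41"},
    {"name": "Dave", "city": "pune", "age": "abc"},
]


def clean_rows(rows):
    """Normalise text fields, parse ages and drop exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        age = int(age_text) if age_text.isdigit() else None  # invalid -> missing
        key = (name, city, age)
        if key in seen:
            continue  # duplicate record after normalisation
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": age})
    return cleaned


def fill_missing_age(rows):
    """Replace missing ages with the median of the known ages."""
    known = [row["age"] for row in rows if row["age"] is not None]
    fallback = statistics.median(known) if known else None
    return [
        {**row, "age": fallback if row["age"] is None else row["age"]}
        for row in rows
    ]


clean = fill_missing_age(clean_rows(RAW_ROWS))
for row in clean:
    print(row)
```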
Data Preprocessing is an important step in which the data gets transformed, or encoded, so that the machine can easily parse it. It requires time as well as effort to preprocess different types of data which include numerical, textual and image data.
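As a rough illustration of transforming and encoding data, the sketch below scales a numerical column, one-hot encodes a categorical column and turns a sentence into word counts. The function names and sample values are my own, and libraries such as scikit-learn offer ready-made versions of these steps.

```python
from collections import Counter


def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(value - low) / (high - low) for value in values]


def one_hot_encode(labels):
    """Turn categorical labels into 0/1 vectors, one position per category."""
    categories = sorted(set(labels))
    encoded = [
        [1 if label == category else 0 for category in categories]
        for label in labels
    ]
    return categories, encoded


def bag_of_words(text):
    """Count how often each lower-cased word appears in a text."""
    return Counter(text.lower().split())


scaled_ages = min_max_scale([20, 30, 40])
categories, encoded_colors = one_hot_encode(["red", "green", "red", "blue"])
word_counts = bag_of_words("Data science needs clean data")

print(scaled_ages)     # [0.0, 0.5, 1.0]
print(categories)      # ['blue', 'green', 'red']
print(encoded_colors)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(word_counts["data"])  # 2
```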
Finally we reach our favourite part of data science: Machine Learning.
My suggestion here would be to first brush up on your basic algorithms.
Classification — Logistic Regression, RandomForest, SVM, Naive Bayes, Decision Trees
Regression — Linear Regression, RandomForest, Polynomial Regression
Resources: Introduction to Linear Regression; Use Linear Regression models to predict quadratic, root, and polynomial functions; 7 Regression Techniques you should know; Selecting the best Machine Learning algorithm for your regression problem
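To see what linear regression does under the hood, here is a minimal sketch of simple (one-feature) least squares written from scratch. The data points are made up and lie exactly on the line y = 2x + 1, so the fitted values are easy to check.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # made-up data generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 6 + intercept)  # prediction for x = 6: 13.0
```

Polynomial regression reuses the same idea with extra features such as x squared, at which point a library implementation is usually more convenient.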
Clustering — K-Means Clustering, DBSCAN, Agglomerative Hierarchical Clustering
Resources: Clustering algorithms
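K-Means is simple enough to sketch from scratch: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The points and starting centroids below are made up, and the starting centroids are fixed by hand so the run is reproducible. Real implementations usually pick them randomly.

```python
import math


def k_means(points, centroids, iterations=10):
    """Cluster points around the given starting centroids (tuples of floats)."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will not change any more
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]  # made-up data
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```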
Gradient Boosting — XGBoost, Catboost, AdaBoost
I urge you all to understand the math behind these algorithms so you have a clear idea of how they actually work. You can refer to this blog where I have implemented XGBoost from scratch: Implementing XGBoost from scratch
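To get a feel for the math, here is a minimal sketch of plain gradient boosting for regression with squared loss, using one-split decision stumps as weak learners. This is not XGBoost, which adds second-order gradients and regularisation on top of this idea, and it is not the implementation from the blog linked above. The data and function names are made up, and the stump fitting assumes at least two distinct x values.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    total = model["base"]
    for threshold, left_value, right_value in model["stumps"]:
        step = left_value if x <= threshold else right_value
        total += model["learning_rate"] * step
    return total


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]  # made-up, roughly linear data
model = fit_boosted_stumps(xs, ys)
baseline_mse = mean_squared_error(ys, [model["base"]] * len(ys))
boosted_mse = mean_squared_error(ys, [predict(model, x) for x in xs])
print(f"baseline MSE: {baseline_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

The training error of the boosted model should come out well below the error of always predicting the mean, which is the core intuition behind boosting: many weak learners, each correcting what the previous ones got wrong.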
Now you can move on to Neural Networks and start your Deep Learning journey.
You can then dive deeper into how LSTM, Siamese Networks, CapsNet and BERT work.
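Before tackling those architectures, it helps to see the smallest building block of a neural network. The sketch below trains a single perceptron to reproduce the logical AND gate. It is a toy example with made-up names, not a deep learning model, but the loop of predicting, measuring the error and adjusting the weights is the same idea that larger networks build on.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Learn weights with the classic perceptron update rule."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Nudge the weights towards the target only when the guess is wrong.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_GATE)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_GATE]
print(predictions)  # [0, 0, 0, 1]
```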
Now we need to implement these algorithms on a competitive level. You can start looking for online Data Science Hackathons. Here is the list of websites where I try to compete with other data scientists.
Analytics Vidhya — https://datahack.analyticsvidhya.com/contest/all/
Kaggle — https://www.kaggle.com/competitions
Hackerearth — https://www.hackerearth.com/challenges/
MachineHack — https://www.machinehack.com/
TechGig — https://www.techgig.com/challenge
Dare2compete — https://dare2compete.com/e/competitions/latest
Crowdanalytix — https://www.crowdanalytix.com/community
To have a look at a winning solution, here is a link to my winning solution to one online Hackathon on Analytics Vidhya — https://github.com/Sid11/AnalyticsVidhya_DataSupremacy
We often see people work only on dummy data and never get a taste of what actual data looks like. In my opinion, working on real-life data gives you a much clearer idea of what you will face in practice. Cleaning real-life data can take around 70% of your project’s time.
- Here are the best free open data sources anyone can use
- Open Government Data — https://data.gov.in/
- Data about real contributed by thousands of users and organizations across the world — https://data.world/datasets/real
- 19 public datasets for Data Science Project — https://www.springboard.com/blog/free-public-data-sets-data-science-project/
After you get the results from your project, it is now time to make business decisions from those results. Business Intelligence is a suite of software and services that helps transform data into actionable intelligence and knowledge.
This can be done by creating a dashboard from the output of our model. Tableau is a powerful and fast-growing data visualization tool used in the Business Intelligence industry. It helps simplify raw data into an easily understandable format. Data analysis is fast with Tableau, and the visualizations are created in the form of dashboards and worksheets.
It is now time for you to start working from home on improving your skill set. Also, if you have started this journey and need my advice or details about any part mentioned above, feel free to comment or mail me at jsiddhesh96[at]gmail[dot]com.