August 28, 2018 / #Data Science

How to set up PySpark for your Jupyter notebook

By Tirthajyoti Sarkar

Apache Spark is one of the hottest frameworks in data science. It brings Big Data processing and machine learning together in a single framework. This is because:

Spark is fast (up to 100x faster than traditional Hadoop MapReduce) due to in-memory operation.
It offers robust, distributed, fault-tolerant data objects called RDDs (Resilient Distributed Datasets).
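
To make "distributed" and "fault-tolerant" a little more concrete, here is a toy sketch in plain Python. It is not the PySpark API: `ToyRDD`, `parallelize`, `lose_partition`, and the other names are hypothetical and exist only for illustration. The sketch assumes the usual description of RDDs, where data is split into partitions and each derived dataset remembers how it was computed (its lineage), so a lost partition can be rebuilt instead of restored from a copy.

```python
# Toy model of the RDD idea in plain Python. This is NOT the PySpark API;
# ToyRDD and its methods are hypothetical names used only for illustration.
class ToyRDD:
    def __init__(self, partitions, parent=None, fn=None):
        self._partitions = partitions  # list of lists; None marks a missing partition
        self._parent = parent          # lineage: the dataset this one was derived from
        self._fn = fn                  # lineage: the per-element function applied

    @classmethod
    def parallelize(cls, data, num_partitions):
        # Split the data round-robin into num_partitions chunks.
        parts = [list(data[i::num_partitions]) for i in range(num_partitions)]
        return cls(parts)

    def map(self, fn):
        # Lazy: record how to compute the result, but do not compute it yet.
        return ToyRDD([None] * len(self._partitions), parent=self, fn=fn)

    def _compute(self, i):
        # Build (or rebuild) partition i from lineage if it is missing.
        if self._partitions[i] is None:
            self._partitions[i] = [self._fn(x) for x in self._parent._compute(i)]
        return self._partitions[i]

    def lose_partition(self, i):
        # Simulate a worker failure; only derived datasets can be rebuilt here.
        if self._parent is None:
            raise ValueError("source data has no lineage to recompute from")
        self._partitions[i] = None

    def collect(self):
        # Gather every partition, in partition order, into one list.
        return [x for i in range(len(self._partitions)) for x in self._compute(i)]


numbers = ToyRDD.parallelize([1, 2, 3, 4, 5, 6], num_partitions=3)
squares = numbers.map(lambda x: x * x)
before = squares.collect()
squares.lose_partition(1)   # pretend the node holding partition 1 died
after = squares.collect()   # partition 1 is recomputed from its lineage
print(before == after)
```

Real Spark does far more than this (scheduling across machines, caching, shuffles), but the core idea is the same: nothing is lost as long as the recipe for each partition survives.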
