by Tirthajyoti Sarkar

How to set up PySpark for your Jupyter notebook

Apache Spark is one of the hottest frameworks in data science. It brings Big Data processing and machine learning together in a single framework. This is because:

  • Spark is fast (up to 100x faster than traditional Hadoop MapReduce for in-memory workloads) because it keeps working data in memory rather than writing intermediate results to disk.
  • It offers robust, distributed, fault-tolerant data objects called resilient distributed datasets (RDDs).