By Tirthajyoti Sarkar

Apache Spark is one of the hottest frameworks in data science. It realizes the potential of bringing Big Data and machine learning together. This is because:

  • Spark is fast (up to 100x faster than traditional Hadoop MapReduce on some workloads) because it operates in memory.
  • It offers robust, distributed, fault-tolerant data objects called Resilient Distributed Datasets (RDDs).