By Tirthajyoti Sarkar
Apache Spark is one of the hottest frameworks in data science. It realizes the potential of bringing Big Data and machine learning together. This is because:
- Spark is fast (up to 100x faster than traditional Hadoop MapReduce for some workloads) because it operates in memory.
- It offers robust, distributed, fault-tolerant data objects called Resilient Distributed Datasets (RDDs).