Spark - Optimal executors, cores and executor memory - Tuning spark jobs
Executor, memory and core settings for optimal performance on Spark

Spark has been adopted by tech giants to bring intelligence to their applications. Predictive analytics and machine learning, along with traditional data warehousing, use Spark as the execution engine behind the scenes. I have been exploring Spark since its incubation, and I have used Spark Core as an effective replacement for MapReduce applications.

Optimizing jobs in Spark is a tricky area, as there are not many ways to do it. I have done some trial and error in the way I sequence my code, but because Spark uses lazy evaluation and the DAG is built before execution, there are not many ways to alter it. This blog shares some information about optimizing Spark jobs, both programmatically (i.e. writing better code) and by tuning the hardware configuration.

Rule of thumb for better performance

Use per-key aggregation: use reduceByKey instead of groupByKey. A separate blog post discusses the different aggregate function options a...
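To see why per-key aggregation matters, here is a minimal pure-Python sketch (no Spark dependency; the partition data is a made-up word-count example). reduceByKey combines values within each partition before the shuffle, so far fewer records cross the network than with groupByKey, which ships every raw value to the reducers.

```python
from collections import defaultdict

# Hypothetical word-count data split across two partitions,
# standing in for an RDD's partitions.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]

def group_by_key_shuffle(parts):
    # groupByKey-style: every (key, value) record crosses the shuffle.
    return sum(len(p) for p in parts)

def reduce_by_key_shuffle(parts):
    # reduceByKey-style: combine per partition first (map-side combine),
    # so at most one record per key per partition is shuffled.
    shuffled = 0
    for p in parts:
        combined = defaultdict(int)
        for key, value in p:
            combined[key] += value
        shuffled += len(combined)
    return shuffled

print(group_by_key_shuffle(partitions))   # 7 records shuffled
print(reduce_by_key_shuffle(partitions))  # 4 records shuffled
```

The saving grows with the number of duplicate keys per partition; on real data sets with skewed keys, map-side combining is often the difference between a job that finishes and one that dies in the shuffle.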
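The executor, core and memory knobs from the title are set at submit time. A hypothetical spark-submit invocation showing where they go (the flag names are standard Spark options, but the values below are illustrative assumptions, not tuned recommendations):

```shell
# Hypothetical submit command; values are placeholders to show the knobs,
# not recommendations for any particular cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 5 \
  --executor-memory 19g \
  --driver-memory 4g \
  my_job.py
```

The same values can also be set via spark.executor.instances, spark.executor.cores and spark.executor.memory in the SparkConf or spark-defaults.conf.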