Posts

Showing posts from June, 2016

Spark - Optimal executors, cores and executor memory - Tuning spark jobs

Executor, memory and core settings for optimal performance on Spark. Spark has been adopted by tech giants to bring intelligence to their applications. Predictive analysis and machine learning, along with traditional data warehousing, use Spark as the execution engine behind the scenes. I have been exploring Spark since its incubation, and I have used Spark Core as an effective replacement for MapReduce applications. Optimizing Spark jobs is a tricky area, as there are not many ways to do it. I have done some trial and error in the way I sequence my code, but because Spark uses lazy evaluation and DAGs are pre-created during execution, there are not many ways to alter it. This blog shares some information about optimizing Spark jobs, both programmatically (i.e. writing better code) and by playing around with hardware. Rule of thumb for better performance: use per-key aggregation, i.e. use reduceByKey instead of groupByKey. The blog talks about the different aggregate function options a...

Compose and Send HTML emails from Informatica (ETL)

Dynamically generating and sending HTML emails using ETL (Informatica). Sending status emails from ETL is a very common practice in data warehouse projects. Email tasks are available in all ETL tools, which makes this much easier. Normally, the content of these emails is dynamic and is created using Unix scripts or, in some cases, the ETL itself. Common ETL-generated summary emails include: error reports; ETA and job/load status; data warehouse/mart load completion; database/server capacity alerts; summary reports in email. Most of these emails are sent to the business group or to IT project support itself. These emails are formatted, and the real data is sent as attachments (.csv, .xls, .txt, etc.). Here, I am demonstrating a method to compose and send HTML emails using ETL and Unix commands. I am deviating from the usual method of sending emails with an attachment and instead writing the attachment content into the email body itself. The advantage of HTML email is that the data can be visually re...