Posts

Showing posts from 2017

Real time Analytics-Implementing a lambda architecture on Hadoop - Part 2

HBase Lily Indexer: indexing data from HBase to Solr by configuration. This is the second part of my three-part blog series on achieving real-time analytics capability. The focus of this post is indexing data from HBase to Solr through configuration alone, with very little development. If you have a web or mobile app, it is useful to offer search over its data; we use Solr to provide fuzzy search. Since we already loaded the data into HBase as part of the ETL using Spark, a separate ETL process to load Solr is not necessary. The Lily Indexer indexes data added, updated, or deleted in HBase into a Solr collection, keeping the two in sync in near real time. Indexing allows you to query data stored in HBase with the Solr service. The indexer supports flexible, custom, application-specific rules to extract, transform, and load HBase data into Solr. Solr search results can contain columnFamily:qualifier links back to the data stored in HBase. ...
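To make the idea of configuration-driven syncing concrete, here is a minimal in-memory sketch in Python. It is not the Lily Indexer's API or its configuration format; the names `FIELD_MAPPING`, `RowEvent`, and `InMemoryIndex`, and the column and field names, are all hypothetical. It only illustrates the principle described above: a mapping from `columnFamily:qualifier` to search fields decides what gets indexed, and row puts and deletes are mirrored into the index.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical mapping from HBase "family:qualifier" columns to index fields.
# In a real deployment this role is played by the indexer's configuration.
FIELD_MAPPING: Dict[str, str] = {
    "info:name": "name_s",
    "info:city": "city_s",
}


@dataclass
class RowEvent:
    """A simplified row change: cells=None stands for a deleted row."""
    row_key: str
    cells: Optional[Dict[str, str]]


class InMemoryIndex:
    """Stand-in for a search collection, keyed by document id."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, str]] = {}

    def apply(self, event: RowEvent, mapping: Dict[str, str]) -> None:
        # A delete in the store removes the matching document.
        if event.cells is None:
            self.docs.pop(event.row_key, None)
            return
        # A put (insert or update) rebuilds the document from mapped columns;
        # columns without a mapping entry are simply not indexed.
        doc = {"id": event.row_key}
        for column, value in event.cells.items():
            field = mapping.get(column)
            if field is not None:
                doc[field] = value
        self.docs[event.row_key] = doc


if __name__ == "__main__":
    index = InMemoryIndex()
    index.apply(RowEvent("cust-1", {"info:name": "Asha", "meta:ts": "1"}), FIELD_MAPPING)
    print(index.docs)
    index.apply(RowEvent("cust-1", None), FIELD_MAPPING)
    print(index.docs)
```

Changing what is searchable then means editing the mapping rather than writing a new load job, which is the appeal of the configuration-only approach.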

Real time Analytics-Implementing a lambda architecture on Hadoop

Implement a lambda architecture with fewer steps, using Spark, HBase, Solr, the HBase Lily Indexer, and Hive. Welcome to this three-part tutorial on making your data available for consumption in (near) real time. The data domain has advanced to the point where decisions have to be made rapidly, so we are gradually moving away from batch-based data loads (ETL) and toward real-time analytics. With data at the center of strategy and decision making, getting data available sooner is pivotal for every organization. This three-part tech blog explains how to implement a lambda architecture (an architecture that supports batch and real-time analytics alike). The overall architecture for such projects has to cater to three needs: 1. Quick data access for websites: random-access data patterns, e.g. a particular profile id, customer id, or comment key. 2. Fast searches on free text, fuzzy search, and search suggestions, e.g. customer name, product name, etc. 3. Analytical query support for BI tools like C...