Should the EDW world really care about Hadoop? “Yes,” according to Kimball

In late 2012, when I was asked to run an engineering team building EDW solutions for a large bank, I bumped into the data world, which goes by many names: data management, information management, enterprise data warehouse, business intelligence, customer intelligence and so on. This world is undergoing a revolution because of a massive shift in the data storage paradigm, brought about by the arrival of Hadoop and other NoSQL databases (NoSQL being a provocative choice of name that does not help in making it a mass movement). These developments are disruptive because the cost of storing data drops drastically while the speed and efficiency with which it can be analyzed increase manifold. These technologies originated in Apache Software Foundation projects, where Java was the dominant language not so long ago. The EDW world has reacted to these developments with disbelief, which reminds us of the “not-invented-here” syndrome. Many seasoned EDW professionals view the paradigm shift as a passing fad.

To many people in the EDW world, Ralph Kimball is the guru or god. He invented dimensional modeling, which has been the dominant database design approach and prescribes arranging data in fact and dimension tables after cleansing. Recently, in a webcast about Hadoop, Ralph Kimball argued that Hadoop is not a passing fad: the technology is here to stay in enterprises and will take a mainstream role in data warehousing. He goes on to compare the relative cost advantage of Hadoop versus MPP-based data warehouse appliances. The ease with which data can be loaded onto Hadoop for ad-hoc exploration and analytics is a great advantage for the business: the wait before data is available, which takes weeks when it has to be loaded into the EDW using ETL tools, is cut down to hours after switching to Hadoop. This is great news for EDW departments, who can score a few brownie points over their application development counterparts, who have themselves shrunk release cycles from several months to deployments on the DTAP pipeline at the push of a button.

Cloudera is a leading Hadoop vendor. They have signed up Ralph Kimball to feature on their website and endorse Hadoop. The big Hadoop vendors are trying to make a breakthrough in the EDW departments of large enterprises. However, compared to the RDBMS behemoths, Hadoop remains a dwarf, with only 1,000 paying customers worldwide compared to 400,000 for Oracle, as reported in a NYT blog which you can read here:

http://bits.blogs.nytimes.com/2014/04/02/pivotal-offers-a-big-data-bundle/?_php=true&_type=blogs&_r=0

The endorsement of Hadoop by a guru of EDW is a good move, and we can sincerely hope that it ignites more interest in Hadoop in EDW departments and that the cynicism dies away.

You can watch Dr. Ralph Kimball’s webcast here: http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/building-a-hadoop-data-warehouse-video.html

More on transforming your EDW Department:
http://www.slideshare.net/AnuragShrivastava7/the-evolving-role-of-enterprise-data-warehouse-department

I must admit that I was an outsider to the world of EDW until recently, as I spent most of my IT career developing applications based upon n-tier architecture. Professionally, I grew up on a diet of object-oriented programming, initially inspired by the work of Booch and Rumbaugh. I later became an ardent follower of Martin Fowler and Eric Evans, who advocated concepts such as Test Driven Development (TDD) and Domain Driven Design (DDD).

 
