June 21, 2014

Three Conversation Topics at Big Data Analytics Event in London

Last week, I attended the Big Data Analytics 2014 event in London. Before and after my talk, titled Big Data: An Opportunity to Reinvent Your BI Department, I had a number of good conversations with event participants.

Here is the essence of three great conversations I took part in during the event:

Visualisation Complements Data Science

Software engineers have learnt over time that, for the end user, the GUI is the system. You can create a perfect software architecture and design, but if your GUI is lousy, you will fail to impress the end user. I am not arguing that no programmer can design a great GUI. I am suggesting that you should invest in good GUI designers, because a great GUI designer and a great programmer are rarely found in the same person.

You can present the solution to a data science problem as plain numbers, but numbers take more effort to understand than a picture. You can present numbers in common graphic forms, such as bar charts or pie charts. However, you make a much stronger impression when you have data visualisation experts in your team. Data visualisation experts are creative people who can work magic with numbers. Common BI tools such as Tableau and PowerPivot have excellent data visualisation capabilities, but if you really want to go beyond the standard solutions, you need tools such as D3.js and experts who can present data through interactive visualisations.

Experiment with D3.js to get a feel for the power of a great visualisation.

To build great data-driven applications, you will need data visualisation experts, not only data scientists.

Engineering Complements Data Science

The work of data scientists intersects with that of software engineers when a data-driven application or product is built for operational use. Good old-fashioned software engineering, such as writing well-tested and maintainable code, requires engineers with a good knowledge of software design, code quality and test automation. Data scientists are generally good at discovering patterns in data, applying machine-learning algorithms and developing predictive analytics models. It is hard to find a great software engineer in a great data scientist.

If you do not develop data-driven applications and products using good software development practices before they go into production, you can expect the problems that badly engineered code usually brings.

Your first big data use cases may delight your product managers, but going into production without proper engineering invites trouble in the form of unmet SLAs and bugs discovered in production, which can be expensive to fix.

To build great data-driven applications, you will need good software engineers, not only data scientists.

Enterprise Data Hub

I discussed the concept of the Enterprise Data Lake in a previous blog post. During the event, I had the chance to talk with Tom Reilly, CEO of Cloudera, who was also the keynote speaker. Tom explained Cloudera's vision of Hadoop to me through two key points.

First, when you use Hadoop as the central building block of an Enterprise Data Hub, it should be possible to manage SLAs for different Hadoop workloads across groups, users and use cases. For example, deciding whether a particular credit card transaction is fraudulent requires a much faster response than running analytics to discover your customers' credit card spending behaviour. It is an open question when Hadoop will be able to serve as an organisation's operational data store, but it is clear that large organisations will run different types of workloads on Hadoop, each requiring a different SLA. The ability to define fine-grained SLAs for various workloads will stop organisations from building multiple Hadoop clusters, with the unwanted consequences of duplicated data and higher operating costs.
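Cloudera's actual SLA mechanism was not described in our conversation, so the following is only a hypothetical illustration of why workload-aware scheduling matters on a shared cluster. It assumes two invented jobs from different groups sharing a single execution slot, and uses earliest-deadline-first ordering, a standard scheduling technique swapped in here for illustration. The job names, deadlines and runtimes are made up.

```python
import heapq
from dataclasses import dataclass, field


# Hypothetical job model: only the SLA deadline takes part in comparisons,
# so a min-heap of jobs pops the most urgent job first.
@dataclass(order=True)
class Job:
    deadline_s: float                        # SLA: must finish within this many seconds of submission
    name: str = field(compare=False)
    group: str = field(compare=False)        # owning team or use case
    runtime_s: float = field(compare=False)  # assumed processing time


def run_sequentially(jobs):
    """Run jobs one after another on a single slot; report (name, finish time, SLA met)."""
    clock = 0.0
    report = []
    for job in jobs:
        clock += job.runtime_s
        report.append((job.name, clock, clock <= job.deadline_s))
    return report


def run_edf(jobs):
    """Earliest-deadline-first: pop jobs from a min-heap keyed on deadline, then run them."""
    heap = list(jobs)
    heapq.heapify(heap)
    ordered = [heapq.heappop(heap) for _ in range(len(heap))]
    return run_sequentially(ordered)


# Invented example: a long analytics job is submitted just before a fraud check.
analytics = Job(deadline_s=6 * 3600, name="spending-behaviour", group="marketing", runtime_s=2 * 3600)
fraud = Job(deadline_s=0.5, name="fraud-check", group="payments", runtime_s=0.1)

fifo_report = run_sequentially([analytics, fraud])  # fraud check waits about 2 hours: SLA missed
edf_report = run_edf([analytics, fraud])            # fraud check runs first: both SLAs met
```

In plain submission order, the fraud check waits behind the two-hour analytics job and misses its half-second SLA; ordering by deadline meets both SLAs on the same shared resource, without building a second cluster.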

Another major Hadoop vendor, Hortonworks, has addressed the security concerns around enterprise Hadoop by acquiring XA Secure, which offers fine-grained access control on objects stored in Hadoop. Security and SLA management are two features that will make enterprise adoption of Hadoop easier.

Tom's second point was that Cloudera is building a library of machine learning and predictive analytics algorithms for common big data use cases. This may be a good starting point for companies that have data but are unable to analyse it, either because they lack data science skills or because they lack use cases. The number of machine learning solutions for big data is growing every month, but most are offerings independent of Hadoop that work with it by processing data stored in HDFS.

If machine learning is bundled into Cloudera's Hadoop distribution, new Hadoop users can quickly start getting business benefits from Hadoop.

A participant from the insurance industry lamented that they did not have any big data use cases for insurance. They had come to the event looking for big data success stories from insurance companies.

Just after my presentation, I saw a case study presentation on MetLife Insurance about how they integrated 70 different databases to build a 360-degree customer view in just 3 months. Customer reps at MetLife use a Facebook-like interface to cross-sell and up-sell with real-time analytics, built on the 360-degree view created using MongoDB.

Forming the right teams for big data analytics, choosing the right technology stack and positioning the big data competency within the enterprise are common problems that everyone is grappling with, at a time when the hype around big data has peaked.
