Tech, Life and Society

Thoughts on tech, life and society



Choosing a Commercial R Distribution Over Open Source R

RevR organised a meet-up in San Jose on 2nd June, ahead of Hadoop Summit 2014. I attended to catch up on what was happening in the commercial R world. This post covers a subset of the topics discussed at the meet-up, punctuated with my own opinions.

When we think of data science and analytics, R and SAS are the two software solutions that most often come to mind. SAS has been around for a couple of decades as the old guard of data science, since well before the term was coined and before Big Data became a buzzword in enterprises and Internet-powered businesses.

R is an open source, community-driven project. Hundreds of contributors from universities and research institutions all over the world contribute to it, which makes R a rich and popular technology for data science, machine learning and predictive analytics. R was created in 1991 by building upon the previous work done...



Should the EDW world really care about Hadoop? “Yes” according to Kimball

In late 2012, when I was asked to run an engineering team building EDW solutions for a large bank, I bumped into the data world that goes by many names, such as data management, information management, enterprise data warehouse (EDW), business intelligence and customer intelligence. This world is undergoing a revolution because of a massive shift in the data storage paradigm brought about by the arrival of Hadoop and other NoSQL databases (a provocative choice of name that does not help it become a mass movement). These developments are disruptive because the cost of storing data suddenly drops drastically, while the speed and efficiency with which it can be analyzed increase manifold. These technologies originated in Apache Software Foundation projects, where Java was the dominant language until not so long ago. The reaction of the EDW world to these developments is one of disbelief, which reminds us of...



Hiring Programmers

I would like to argue that managers should not play the primary role in hiring programmers; programmers should.

Hands-on programming should be the essential component of the programmer hiring process. A programmer candidate should solve a problem together with your best programmer. I call this process a technical problem-solving session. My best programmer's feedback should be my main tool for judging whether a candidate gets hired. By solving a problem together, I mean coding and deploying a small application. In reality, programming interviews are usually limited to a question-and-answer session combined with a discussion in front of a whiteboard. As a result, a candidate's real programming skills are exposed only once he or she starts working in your team.

There are two problems that need to be addressed before this technical problem-solving session approach...



Low Latency Query Frameworks for Hadoop

The Strata Conference covers much more than Hadoop. However, a common thread among all the themes this year was that HDFS (Hadoop Distributed File System) and YARN (Yet Another Resource Negotiator) have been accepted as the standard base frameworks for building a big data platform. The Hadoop community is now focusing on building a strong ecosystem of tools to accelerate enterprise adoption of Hadoop.

The Hadoop ecosystem needs mature tools for fast, interactive querying that match the performance of MySQL or Oracle. Tools for streaming, user-friendly tools for data integration and workflow scheduling, and machine learning tools are the new areas of focus for the community working to mature the Hadoop ecosystem. The very innovation that drives the open source community, a significant contributor to the Hadoop ecosystem, also makes large enterprises nervous about adopting new open source tools because they do not know...



Strata 2014 at Santa Clara - First Day Impressions

Strata 2014 was the biggest Strata ever, with 3,100 participants, a sign that interest in Big Data is growing in industry. What makes Strata unique is that it is a vendor-neutral conference with a lot of emphasis on open source innovation in the industry. The Silicon Valley edition of Strata drew speakers from top technology companies in the area, such as Facebook, Netflix and LinkedIn. Among the established software vendors, IBM, Microsoft, SAP and Teradata were present, but Oracle, another big data giant, was a notable absentee.

At this year's conference, I focused on Hadoop and innovations in its ecosystem. On the first day, a tutorial titled “Building a Data Platform” by John Akred, CTO of Silicon Valley Data Science, and his colleagues advocated building a modern data platform using an Agile approach. They introduced the concept of the experimental enterprise...



Why is Agile so hard?

The simplicity of agile methods such as Scrum makes it easy to ignore the fact that it takes a lot of effort to practise them the right way. My Scrum trainer, Pete Deemer, told us in Scrum class that if Scrum is not followed or understood correctly, then it is worse than waterfall. In Scrum, you rely on self-organising teams to do the right things to build the product right. As with any other methodology, successful implementation requires a couple of die-hard believers who really want to adopt a new way of working to improve the team's performance. If this is complemented by solid Scrum project experience among a few team members, then agile adoption goes very well.
