Strata 2014 at Santa Clara - First Day Impressions
Strata 2014 was the largest Strata so far, with 3,100 participants, a sign that industry interest in Big Data keeps growing. What sets Strata apart is that it is a vendor-neutral conference with a strong emphasis on open source innovation. The Silicon Valley edition drew speakers from top technology companies in the area, such as Facebook, Netflix and LinkedIn. Among the established software vendors, IBM, Microsoft, SAP and Teradata were present, but Oracle, another big data giant, was a notable absentee.
At this year's conference, I focused on Hadoop and innovations in its ecosystem. On the first day, a tutorial titled Building a Data Platform, by John Akred, CTO of Silicon Valley Data Science, and his colleagues, advocated building a modern data platform using an Agile approach. They introduced the concept of the experimental enterprise, which takes advantage of Cloud, DevOps and Open Source to rapidly build and deploy data-driven applications. An experimental enterprise can create a data value chain through which its analytical capabilities are exposed.
The afternoon tutorial, titled Building Real-Time Applications with Apache HBase, by Ronan Stokes of Cloudera, covered the architecture of HBase and several time-series-based use cases. HBase is a distributed, fault-tolerant database inspired by Google's Bigtable.
HBase uses the Hadoop Distributed File System (HDFS) for data storage and offers low-latency random access to data. It is suitable for storing time series data such as user events, web logs and tweets, where near-real-time performance is required. Because it sits on HDFS, HBase can leverage an existing investment in Hadoop infrastructure and inherits the fault tolerance provided by HDFS data replication. HBase is a non-relational, column-oriented database which does not natively support SQL. It has its own command-line shell as well as a Java-based API to store and retrieve data. A key difference between HBase and its close cousin Cassandra is that when one or more cluster nodes fail, HBase chooses consistency while Cassandra chooses availability.
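To make the time series use case more concrete, here is a minimal sketch of one common way to lay out such data. It relies on the fact that HBase keeps rows sorted by row key, so a key made of an entity id followed by a reversed timestamp puts the newest events for that entity first. This is my own illustration, not code from the tutorial: ToyTable is a hypothetical in-memory stand-in and not the HBase API, and the names make_row_key and latest_events are assumptions made up for the example.

```python
import bisect
import struct

# Largest signed 64-bit value; subtracting a timestamp from it reverses the sort order.
MAX_TS = 2**63 - 1


def make_row_key(entity_id: str, ts_millis: int) -> bytes:
    """Composite key: entity id, a zero-byte separator, then the reversed timestamp."""
    reversed_ts = MAX_TS - ts_millis
    # Big-endian packing keeps byte order equal to numeric order for non-negative values.
    return entity_id.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)


class ToyTable:
    """Hypothetical stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key: bytes, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing row
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def scan_prefix(self, prefix: bytes, limit: int):
        """Return up to `limit` values whose keys start with `prefix`, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while (i < len(self._keys)
               and self._keys[i].startswith(prefix)
               and len(out) < limit):
            out.append(self._values[i])
            i += 1
        return out


def latest_events(table: ToyTable, entity_id: str, n: int):
    """Newest-first events for one entity, read as a single short prefix scan."""
    return table.scan_prefix(entity_id.encode("utf-8") + b"\x00", n)


if __name__ == "__main__":
    table = ToyTable()
    table.put(make_row_key("alice", 1000), "login")
    table.put(make_row_key("alice", 3000), "logout")
    table.put(make_row_key("alice", 2000), "click")
    print(latest_events(table, "alice", 2))
```

The point of the design is that "the last n events for this user" becomes one contiguous read rather than a search across the whole table. In a real cluster the same idea would be expressed through the HBase shell or the Java client, and the key layout would need to be tuned to the actual access pattern.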
Hadoop has taken a central role in building a data platform, and therefore innovations in the Hadoop ecosystem, such as improving query performance, full SQL support, security and simplifying ETL workflows, were key topics in the parallel sessions and at vendor booths in the expo.