HMG’s 2015 CIO Executive Leadership Summit in Boston on February 12, 2015, was stellar. Congratulating Hunter Muller, President and CEO of HMG Strategy, I remarked that what stood out to me was the quality of the panel discussions: cogent, thought-provoking and with just the right mix of well-picked experts. Nowhere was this truer than in the discussion on “Big Data: The Next Frontier – Insights and Strategies from Massachusetts Big Data Pioneers”. The panelists were Nidhi Aggarwal, VP of Product Strategy at Tamr; Prat Moghe, Founder and CEO at Cazena; Bill Simmons, CTO at DataXu; Paul Sonderegger, VP Big Data Strategy at Oracle; and Bob Zurek, SVP Products at Epsilon. The panel was ably moderated by Jo Hoppe, until recently CIO at PAREXEL.
Here is what I heard from the panel.
- We have entered the era of the digitization and datafication of everything. All activity results in data – so the imperative is to capture it as it happens or lose the data and its usefulness forever. Uber is an example of this. Businesses have to ask, “What value-creating activities are an asset?” This is new.
- For a lot of data the emphasis is on storing and analyzing. But extracting value from the data is where the emphasis needs to be. It is not Big Data that is important, but smart data. Thus it is no use having data in a format that cannot be analyzed.
- Traditionally, as an accounting practice, data has been treated as an expense, not an asset. McKinsey has studies on book value vs. market value for data-intensive companies. [See McKinsey’s “Big Data, big new businesses” for their positive outlook]. Viewing data as an asset is an imperative. Selling data, monetizing it, is new to most companies. What is needed is something like a “Data Monetization Executive” for a business.
- For perspective, folks have been doing Big Data since before the term was coined and became popular. Some examples: intelligence agencies, travel companies and financial companies have always had a lot of data to process. There is now lots of innovation to lower the cost of dealing with Big Data. But what is really new is that IT is in direct touch with data that is of critical value to the business. Thus CIOs get a direct link to the CEO for partnership through monetization of this asset.
- Big Data is often talked about in terms of Volume, Variety and Velocity as its markers. Today, volume and velocity are not the problem – variety is! It’s a big problem and one that cannot be solved ad hoc; it must be solved across the enterprise. There is a need for a data unification platform. As an example, one can have 300 ERP systems and too many suppliers – yet be seeking a single comprehensive answer for the business. The quantity of data is there, e.g., from multiple drug efficacy studies – but it has to be usable.
- An example of Velocity is Epsilon’s robust loyalty program business. There is a very high rate of transactions, e.g., in Walgreens’ customer rewards program. Epsilon helps marketers create campaigns. A huge amount of data, e.g., clicks, has to be analyzed. More than 20 million emails may need to be sent in an hour! Epsilon uses Cloudera and Cassandra and has an event-streaming engine under the hood.
- Predictive analytics using Big Data is often sought. For some parts of the business you know what the good questions are. For other parts, you may not be sure what data will be useful. This is the world of discovery – the need to explore. But what is important is not just to discover the correlations – you need to operationalize them. For something like automatic fraud detection the results are needed in real time – is the transaction okay to proceed with? Fraud detection is, in fact, like an arms race – you need to always stay a step ahead.
- There is a lot of hype around machine learning, but what is needed for value is to apply context. For example, IT often does not have the context for drug discovery in Pharma. Research scientists with conceptual knowledge are needed. Today, context resides in humans. They must be brought into the process earlier.
- On the tools front, traditional data warehousing does not work for Big Data. But Hadoop, Hive, etc. also have their problems. It started with Hadoop, but there is now some disillusionment. Spark is what is hot now. [xplenty has a nice write-up on Spark vs. Hadoop MapReduce. Quora also has a good answer on the difference between the two]. But it is really not about technology but outcomes. Real-time and batch are very different workloads, and you have to look at how the workload needs to be executed. Hadoop is great for batch. Spark is for in-memory processing, where speed is needed. If SQL is the input, there may be a place for SQL in the workflow. Warehouses are good for “medium” rather than big data – 1 petabyte is about the maximum they can handle. But the main lesson is: don’t make a long-term commitment to any one tool – the tools are likely to change. Secure storage is the only constant.
- What is the best place for doing Big Data? Among the top 2000 enterprises, 90% are doing Big Data on-premises. But 60% want to go to the cloud. Regarding security, one should note that Amazon Web Services is more secure than any company! So cloud security is not a trade-off – it is a matter of education. At a higher level, the cloud today is a silo – make it an extension of the company. For development, testing and operations we already know the cloud to be outstanding – you get scale and lower cost. But the economics of the cloud can also flip as you scale. Netflix had to abandon AWS; they are now doing it on their own. But the future belongs to hybrid clouds. Big Data will increasingly be part of it.
- It is encouraging that universities are beginning to develop programs around data for undergrads and grads. But the current skills gap will be around for a while. Talent augmentation is an approach. The other piece is to bring the business closer to IT – it does not have to be the other way round with IT always doing the moving.
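To put the Velocity example above in perspective, here is a quick back-of-the-envelope calculation of what “20 million emails in an hour” means as a sustained rate. This is only a sketch: the function name `required_send_rate` is my own, and it assumes the load is spread evenly over the hour, which real campaigns rarely are.

```python
def required_send_rate(emails: int, window_seconds: int) -> float:
    """Average emails per second needed to send `emails` within the window.

    Hypothetical helper for illustration; assumes a perfectly even load.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return emails / window_seconds


# 20 million emails in one hour (3,600 seconds), the figure quoted by the panel.
rate = required_send_rate(20_000_000, 3600)
print(f"{rate:,.0f} emails per second on average")
```

That works out to roughly 5,556 emails per second, sustained for the full hour, before accounting for any peaks.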
To summarize: a great session. For those new to Big Data, let me strongly recommend “Big Data: A Revolution That Will Transform How We Live, Work and Think” by Viktor Mayer-Schönberger and Kenneth Cukier. It is sure to become a classic. You will find it here on Amazon.