This was the first talk I attended at EMC World and it was a good’un. I’ve been reading all that I can on the new Pivotal initiative and this talk was a great 101 on big data and why you want to virtualise your big data applications.
Some figures put on the board showed that the majority of Fortune 500 companies are either evaluating, actively using or expanding their data analytics platforms. Unfortunately, this trend seems to be focused on Europe and America. We’re lagging a little behind the times in the Middle East and Africa.
I believe this is changing though and it’s starting in the telco space. I have regular meetings with the majority of major telcos in the Middle East and every one of them has some kind of large-scale data analytics initiative.
The presentation started with a very simple segmentation of analytics use cases:
- Horizontal Use Cases – Data analytics that works across an organisation, regardless of business unit. For example, log processing of applications
- Vertical Use Cases – Data analytics that is tied into the business processes of an organisation. For example, an eCommerce recommendations engine
This got me thinking: most organisations are focusing on the vertical use cases, as they are seen as more directly relevant to their core business and have a greater impact on revenue generation. However, the horizontal use cases should be no-brainers, as they have a huge impact on cost cutting and are more mature, having been battle-tested across a variety of organisations. I’m thinking specifically of security analytics and the immense impact it is having on keeping organisations threat-free.
The talk then went on to describe the three stages of big data maturity within an organisation (specifically around the adoption of Hadoop). There was a great slide showing the three stages which I wish I’d grabbed a picture of. I’ll update the post when the slides are released.
- Stage 1 – Piloting, where one or two use cases are deployed to see what all the fuss is about
- Stage 2 – Hadoop in production, serving one or two business units who like what they are seeing and use it in an ad-hoc manner
- Stage 3 – Full blown cloud analytics platform that is completely integrated into a number of mission critical business processes and BI tools. The business now runs on analytics data
The majority of organisations in the Middle East will be situated at the stage 1 maturity level. The talk was mostly focused on Project Serengeti, an open-sourced tool-set that enables easy management of virtualised Hadoop clusters. It’s an open source Cloud Foundry project, which means that the world’s open source developers are available to work on it.
Project Serengeti is an addition to the vCenter console with the following benefits:
- Deployment of a Hadoop cluster in minutes
- Scaling out the cluster on the fly
- Customisation of the cluster with a simple specifications file. For example, enabling high availability for the full Hadoop stack is achieved by setting a flag in the specifications file
- Proactive monitoring and automation using the standard VMware vCops tools
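To give a feel for that specifications file: a Serengeti cluster definition is a small JSON document describing the node groups you want. The exact field names below are a sketch from memory of the Serengeti documentation and may differ between versions, but the shape is along these lines:

```json
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": ["hadoop_namenode", "hadoop_jobtracker"],
      "instanceNum": 1,
      "haFlag": "on"
    },
    {
      "name": "worker",
      "roles": ["hadoop_datanode", "hadoop_tasktracker"],
      "instanceNum": 10
    }
  ]
}
```

The point is that the HA mentioned above is just that one `haFlag` setting on the master group, and scaling out on the fly is essentially raising `instanceNum` on the worker group and re-applying the spec.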
Once an organisation has decided that it wants to move more rapidly towards integrating big data into its business processes, it will start to deploy Hadoop clusters into production environments for greater numbers of users. At this stage, the organisation will face the following challenges:
- Multiple clusters (e.g. prod, test, experimentation) where the data being worked on in each cluster is very similar. These clusters multiply as more and more use cases are deployed
- Multiple clusters based on different database technologies. Organisations may target specific use cases at specific database technologies to get the most benefit from each technology
These challenges lead to cluster sprawl, with large amounts of redundant common data and inefficient use of cluster processing resources.
The innovative solution that Pivotal have developed is to separate out the data, application and cloud fabrics. The following slide from Joe Tucci’s keynote explains this very well.
Separating out the compute, storage and operations allows a far greater flexibility in how workloads can be implemented and allows you to scale out only what you need rather than the entire stack.
For example, you can create two different compute tenants working with the same data and also isolate these tenants based on resource consumption, software version or security.
This stage is more visionary in terms of what can be achieved, and the talk paid special attention to enabling real-time analytics, which would be required when integrating the big data platform into mission-critical business processes.
This part of the presentation focused mainly on the database technologies that are available to achieve this real-time big data analytics:
- Real-time databases – Big SQL, HAWQ, Impala
- NoSQL – HBase, Cassandra
- In-memory databases – Redis, GemFire, Membase / Couchbase
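To make the in-memory category concrete: the appeal for mission-critical processes is that counters are updated as events arrive, so reads reflect the current state instead of waiting on a batch job. Here is a minimal, illustrative Python sketch of that pattern. It is not tied to any of the products above (a real deployment would use something like Redis `INCR`); the class and names are purely hypothetical:

```python
from collections import Counter

class InMemoryCounter:
    """Toy stand-in for an in-memory store's counters (cf. Redis INCR).

    Every event updates the count immediately, so queries see
    up-to-the-second totals rather than yesterday's batch output.
    """

    def __init__(self):
        self.counts = Counter()

    def record(self, event):
        # Updated on every event as it arrives -- no batch delay.
        self.counts[event] += 1

    def top(self, n=3):
        # Real-time query: most frequent events so far.
        return self.counts.most_common(n)

store = InMemoryCounter()
for page in ["home", "cart", "home", "product", "home", "cart"]:
    store.record(page)

print(store.top(2))  # → [('home', 3), ('cart', 2)]
```

The design point is the same one the talk makes: when the working set lives in memory, a recommendation or fraud check can query it inline with the business process instead of being a nightly report.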
I’m really looking forward to having some customer conversations around big data, Hadoop and Pivotal over the next few weeks, as I think what’s been presented so far is game changing.