Defining Big Data

Tuesday night I attended an event – storagefest II 2012, which was hosted by Valhalla Partners.  The event was a dinner with a group of storage experts from all vectors of the storage industry.  There were customers of storage technologies, VCs with investments in storage, entrepreneurs (folks from storage startups), industry insiders (analysts) and folks from storage companies that have been acquired by large companies.  The goal of the event also had multiple vectors, specific to each "group" that attended.

VCs attend to hear what customers have to say about the state of the storage industry and what they should be investing in or if the storage startups they have invested in are doing the right things.  They also listen to people who have had successful exits and the advice they may have for running a successful storage business.

Customers attend to hear what is new in the storage business and to share their experiences and challenges within their infrastructure, and what they are looking for from their storage technologies and new companies.

Entrepreneurs attend to lend their advice, to see what is new and share ideas.

Industry insiders attend to learn more about customer challenges, who has the best chance at solving these challenges, how the industry is shaping up and to report on the event.

Large-company attendees, people whose startups had successful exits into a larger company, typically hold influential roles in their new company.  They go to learn how the industry is evolving and what new technologies are out there that they may want to add to the larger company's portfolio.  It is also a good chance to listen to customers discuss what they are looking for from the next generation of storage technologies.

I set all of that up so you can understand the players and the mix of people at the event.

After dinner and drinks the floor was opened for a discussion around “Big Data” (the newest “hot topic” in the data storage industry).  The discussion started with one question – “What is Big Data?”  After 2 hours of debate among all of the “industry experts,” I never once heard the answer.  A majority of the conversation was around the size or volume of capacity that data is consuming these days.  (One analogy even went as far as saying, “It’s similar to a ‘big person’ – when a ‘big person’ can’t fit into conventional clothing, they shop at a Big & Tall shop – so ‘Big Data’ is data that doesn’t fit into conventional storage systems.”)  I have to say that none of these answers is right.

So here it is, the definition of Big Data – Big Data is not defined by size or volume.  Big Data is any data.  Big Data is ALL data.  Big Data is structured data.  Big Data is unstructured data. Big Data is semi-structured data.

The tools we have today for analyzing even the smallest amount of data are very sophisticated, and they are getting even smarter.  Think about an application that can analyze every bit of data in, say, a large store such as Target.  It can analyze and cross-reference who buys what, at what time of day and in what geographic areas.  The application can then save the results to a location where they can be further analyzed, creating even more new data.  All of this analysis is done so you can create a much more competitive business.  The data, in all of its forms throughout the process, has value, and the more value you can extract from it, the greater the opportunity to create a more successful business.
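To make that a little more concrete, here is a minimal sketch in Python of the kind of cross-referencing I am describing.  Everything in it is an assumption for illustration: the purchase records are made up, and the names `part_of_day` and `cross_reference` are hypothetical, not taken from any real retail system.

```python
from collections import Counter

# Hypothetical purchase records: (item, hour of day, region).
# This is made-up sample data, not real retail data.
purchases = [
    ("diapers", 18, "Northeast"),
    ("diapers", 19, "Northeast"),
    ("coffee", 7, "Northeast"),
    ("coffee", 8, "Midwest"),
    ("diapers", 18, "Midwest"),
    ("coffee", 7, "Northeast"),
]


def part_of_day(hour):
    """Bucket an hour (0-23) into a coarse part of the day."""
    if hour < 12:
        return "morning"
    if hour < 17:
        return "afternoon"
    return "evening"


def cross_reference(records):
    """Count purchases by (region, part of day, item)."""
    counts = Counter()
    for item, hour, region in records:
        counts[(region, part_of_day(hour), item)] += 1
    return counts


# The summary is itself new data that can be stored and analyzed again.
summary = cross_reference(purchases)
for (region, period, item), n in summary.most_common():
    print(f"{region:10} {period:10} {item:10} {n}")
```

Notice that the summary is a brand new data set that did not exist before the analysis ran.  That is the point: the output of one round of analysis becomes the input to the next.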

About a year ago I listened to an excellent presentation by Jeff Jonas from IBM on Big Data.  One of the points he made was that big data gives you better predictions, and that bad data is actually good data because it tells you which directions not to go, which in turn gets you to your destination faster.  Our business objective is to squeeze context out of our data.  The funny thing is, context, by definition, means better understanding something by taking into account the things around it.  So in theory the more data you have, the better off you are.

One of the things we also know is that time is of the essence.  People’s expectations for the information they need to make business decisions have gone from waiting on batch jobs run against data sets to wanting real-time answers.  We also know that the better the predictions (due to the ability to analyze more data), the faster folks will want the information.  This is “Big Data”.

Next we will talk about Cloud - this is getting interesting


About the Author

Steve Kenniston - The Storage Alchemist.