Defining Big Data

This article is more than 10 years old.

As the field of big data grows, increasing numbers of people are introduced to its concepts, and I often hear the basic question: “Is my data big enough to be big data?” Seven terabytes? Seventy terabytes? Seven hundred?

It’s too late now to change the name, of course, but the “big” part of “big data” is troublesome. It’s a poor signpost to what’s important about big data, and carries borderline puerile overtones of boastfulness.

The mainstream media has adopted a definition of big data that’s broadly synonymous with “analytics”, albeit mixed in now and then with a smattering of privacy-invading personal data collection. For me, that’s often a good enough definition, as I’m interested in people understanding that there’s power and potential in data.

As one of the people responsible for early definitions, I wrote in January 2012 that big data is “data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures”. This has proved an accurate definition over the years, and has been adopted as the basis of the definition you will find in Wikipedia’s big data page.

Nevertheless, one thing has troubled me about this definition: though a better phenomenological definition than merely “big”, it doesn’t give any indication as to the business relevance of big data, or why so many are excited about it. What I want to do is draw a connection between this definition and the mainstream media understanding of big data, and by doing so, point out where big data fits in an organization’s IT and analytics endeavors.

Why Big Data Is Important For Business

The clue in my definition above is the word “conventional”. What is a conventional system, and why is it so? The vast majority of everyday IT systems in organizations these days perform functions I think of as “faster paper”. They fulfil well-known and understood back office processes. This is conventional IT: adapted and efficient for well-trodden paths. Though these systems must still be intelligently applied, they provide the box to think within.

What happens when digitization of business exceeds this “faster paper” stage? We’re now in the age of social networking, pervasive mobile phones, and ubiquitous network-connected sensors. We end up in a place where the available data is too big, unstructured or fast-moving for our conventional approaches to work. Hence the emergence of big data technologies, and their support for uncertain and evolving business processes, where analytics and probabilistic understandings are often the chief ways of deriving benefit from the data.

Here is the biggest change: information systems are moving from the back office to being the backbone of business value creation. Paramount to success is the integration between the business itself and the development work. Businesses need to become deeply familiar with the new canvas on which they’re painting, and with the potential and caveats of using data. And IT itself is becoming two-headed: one half must continue the well-understood path of provisioning the systems that keep the operation running, and the other must steward and lead the evolving development work, hand-in-hand with business.

So, do you have big data? My answer is “yes”. Even if you don’t think so, my bet is that you’ve been discarding data that doesn’t fit the “faster paper” model, and there are now tools to collect that data and mine value from it. Your challenge lies in understanding how exploiting that data can drive your business.