Big Data: what is it?

23rd May 2012

This is a guest post from Arend Zwaneveld (@Arend78), analytics expert at online optimisation consultancy Online Dialogue. A large part of Arend’s job is spent analysing data and contributing valuable data-driven insights to online design and optimisation projects.

Data analysis is my hobby. My day is incomplete without an Excel sheet and/or some solid data-driven insight. The good news for me: every day more data is stored, and it is just waiting to be analysed! Unless you have been living under a rock recently, you know: the business trend of the moment is called “Big Data”.

In this article I try to describe the key concepts associated with Big Data, for anyone who – like me – wishes to have a say on this important development.

What is Big Data?

It depends on whom you ask. According to data storage and database providers, Big Data is a great reason for their customers to buy new systems. Some simply state that “Big Data is all data that can no longer be saved physically or logically in a single location or system.”

Others take a step back and make a distinction between “Big Data Storage” and “Big Data analytics” [1]. As far as I am concerned, however, these two are inextricably linked: it is not the quantity of the data but the way the data is used that makes it “Big” or “small”. The requirements placed on Big Data Storage follow from the analyses one wishes to perform on the data.

Big Data Analytics

Technically, data analytics becomes “Big Data analytics” when the data:

  1. Is only valuable to the business when readily available (near real-time);
  2. Comes from multiple sources;
  3. Can be “enriched” with other (unstructured) data flexibly and easily.

Gartner has a catchy summary of these properties: “Volume, Variety and Velocity” [2]. Dave Raffo (Storage Media Group) clarifies what is and is not Big Data with an example:

  • The (huge) database of transactions with Amazon is not Big Data: it’s uniform, archived and without potential for added value;
  • The (huge) database of Amazon with click and purchase behaviour that lets a customer directly receive a personalized offer when he revisits the website is Big Data: it’s diverse, pluralist, readily available data, used for the purpose of creating value [3].
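
To make the second bullet a little more concrete, here is a minimal sketch of how purchase behaviour could be turned into a personalised offer for a returning customer. It simply counts which products are bought together. This is an illustration only and not a description of how Amazon's engine works; the customers, products and function names are all made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase history per customer (illustrative data only).
purchases = {
    "alice": {"camera", "tripod", "sd_card"},
    "bob": {"camera", "sd_card"},
    "carol": {"camera", "tripod"},
    "dave": {"novel"},
}


def build_co_purchase_counts(baskets):
    """Count, for every product, how often each other product was bought with it."""
    counts = defaultdict(Counter)
    for items in baskets.values():
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts


def offer_for(customer, baskets, counts, top_n=2):
    """Suggest products the customer does not own yet, ranked by co-purchase count."""
    owned = baskets.get(customer, set())
    scores = Counter()
    for item in owned:
        for other, n in counts[item].items():
            if other not in owned:
                scores[other] += n
    return [item for item, _ in scores.most_common(top_n)]


counts = build_co_purchase_counts(purchases)
bob_offer = offer_for("bob", purchases, counts)    # bob owns a camera and an sd_card
dave_offer = offer_for("dave", purchases, counts)  # nobody bought anything with the novel
```

In this toy data set Bob would be offered the tripod, while Dave gets no offer at all because there is nothing to relate his purchase to. A real system would also have to use click behaviour and respond while the visitor is still on the site, which is exactly where the "readily available" requirement comes in.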

Famous examples of Big Data [4]: the Human Genome Project, Google Analytics, Google’s self-driving cars [5] and Amazon’s product recommendation engine [6].

Big Data analytics therefore requires very different data systems than the existing data warehousing solutions. Big Data storage systems are fast, scalable, flexible and able to handle and integrate both structured and unstructured data: a huge technical challenge for database providers.

Big Data Storage

Big Data systems are often distributed networks of simple PCs and servers. This makes these Big Data systems scalable: storage and computing capacity can be added easily (the alternative: buy a new supercomputer every year).

Special “Big Data software” cuts the data into pieces, which are then copied and distributed to multiple locations within the network. This method of data distribution gives such a “distributed network” some special properties. A Big Data Network is:

  1. Quick: when requesting data, the data fragments are loaded in parallel from the nearest or fastest available locations [7];
  2. Redundant: all data is saved to at least two physically separated locations in the network;
  3. Flexible and always online: unlike traditional IT systems, it does not have to be taken “offline” for maintenance. A Big Data distributed network is said to have no “single point of failure”.
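
The idea of cutting data into pieces and copying each piece to several locations can be sketched in a few lines. The example below is a simplified illustration, not how Hadoop or any specific product does it: the node names, the chunk size and the round-robin placement are assumptions chosen for the example, and the "nodes" are plain in-memory dictionaries. Reads happen one chunk at a time here, whereas a real system would fetch chunks in parallel.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut the data into fixed-size pieces (the last piece may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, node_names: list, replicas: int = 2) -> dict:
    """Assign every chunk to `replicas` distinct nodes, round-robin."""
    if replicas > len(node_names):
        raise ValueError("not enough nodes for the requested number of replicas")
    return {
        chunk_id: [node_names[(chunk_id + r) % len(node_names)] for r in range(replicas)]
        for chunk_id in range(num_chunks)
    }


def store(data: bytes, nodes: dict, chunk_size: int = 4, replicas: int = 2) -> dict:
    """Write each chunk to all nodes chosen for it and return the placement map."""
    chunks = split_into_chunks(data, chunk_size)
    placement = place_replicas(len(chunks), list(nodes), replicas)
    for chunk_id, chunk in enumerate(chunks):
        for name in placement[chunk_id]:
            nodes[name][chunk_id] = chunk
    return placement


def read(placement: dict, nodes: dict, offline=frozenset()) -> bytes:
    """Reassemble the data, skipping nodes that are currently offline."""
    parts = []
    for chunk_id in sorted(placement):
        holders = [name for name in placement[chunk_id] if name not in offline]
        if not holders:
            raise RuntimeError(f"chunk {chunk_id} is unavailable")
        parts.append(nodes[holders[0]][chunk_id])
    return b"".join(parts)


# Three hypothetical nodes, each modelled as a dict of chunk_id -> bytes.
cluster = {"node-a": {}, "node-b": {}, "node-c": {}}
original = b"click and purchase behaviour"
placement = store(original, cluster)
restored = read(placement, cluster, offline={"node-b"})  # one node down for maintenance
```

Because every chunk lives on two different nodes, taking any single node offline still leaves a complete copy of the data readable. That is the "no single point of failure" property in miniature; with two of the three nodes offline, the read would fail for the chunks stored only on those two.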

Why Big Data?

Is it hype? Can organisations afford to postpone the strategic use of data, or is Big Data already a reality? [8] According to a scientific publication cited by McKinsey, “the effective use of data and analytics increases productivity, profitability and market value of companies by 5 to 6 percent” [9]. In some industries, the strategic use of data analysis will make the difference between profit and loss [10].

Big Data and Web Analytics

In my daily work as a web analyst, I notice that most clients are “closing the loop”: they link their web statistics package with their CRM system. Such “end-to-end” integration lets them periodically determine which marketing campaigns are generating the most sales (instead of generating the most leads).
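
As a rough sketch of what "closing the loop" means in practice, the example below joins a web analytics export with a CRM export on a shared lead ID and compares campaigns by leads and by revenue. The field names, campaign names and figures are invented for illustration; real exports will look different and usually need cleaning first.

```python
from collections import defaultdict

# Hypothetical web analytics export: one row per lead, with the campaign that produced it.
web_leads = [
    {"lead_id": "L1", "campaign": "newsletter"},
    {"lead_id": "L2", "campaign": "newsletter"},
    {"lead_id": "L3", "campaign": "newsletter"},
    {"lead_id": "L4", "campaign": "search_ads"},
    {"lead_id": "L5", "campaign": "search_ads"},
]

# Hypothetical CRM export: closed deals, keyed by the same lead ID.
crm_sales = [
    {"lead_id": "L1", "revenue": 100.0},
    {"lead_id": "L4", "revenue": 400.0},
    {"lead_id": "L5", "revenue": 250.0},
]


def campaign_report(leads, sales):
    """Join leads to sales on lead_id and aggregate the result per campaign."""
    revenue_by_lead = defaultdict(float)
    for sale in sales:
        revenue_by_lead[sale["lead_id"]] += sale["revenue"]

    report = defaultdict(lambda: {"leads": 0, "converted": 0, "revenue": 0.0})
    for lead in leads:
        row = report[lead["campaign"]]
        row["leads"] += 1
        if lead["lead_id"] in revenue_by_lead:
            row["converted"] += 1  # this lead resulted in at least one sale
            row["revenue"] += revenue_by_lead[lead["lead_id"]]
    return dict(report)


report = campaign_report(web_leads, crm_sales)
best_by_leads = max(report, key=lambda c: report[c]["leads"])
best_by_revenue = max(report, key=lambda c: report[c]["revenue"])
```

In this made-up data the newsletter wins on leads, but the search ads win on revenue. Without the link to the CRM system, only the first ranking would be visible.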

One technical step further and these systems will work in real-time, enabling organizations to have a personalized online dialogue with their visitors and give them tailored product offerings [6]. Then – all of a sudden – Big Data will no longer be a dream, but reality!

Big Data analytics developments are often driven from the web (web analytics). More often than not, however, IT departments develop Big Data systems without the cooperation of web analysts. A missed opportunity, to say the least: the intended Big Data analytics directly impact the technical requirements and the system implementation. Web analysts will need to get involved in the implementation process of Big Data. Hence my interest in the subject, and the reason I believe it is important for web analysts to understand Big Data and help develop it further within organizations.

In my work, I’ll actively try to get more involved with Big Data and will definitely continue to read and write about this fascinating topic!

Additions, corrections and comments from experts are more than welcome!

——————–

Sources

[1] John Webster – searchstorage.techtarget.com – Understanding Big Data analytics

[2] Gartner Says Solving ‘Big Data’ Challenge Involves More Than Just Managing Volumes of Data, June 2011

[3] Big Data: Senior News Director Dave Raffo’s take (podcast)

[4] Frank Ohlhorst – Weighing the balance of Big Data, Web analytics and compliance, September 2010

[5] Google Automatic Self-Driving Cars

[6] Quora – Was Amazon’s recommendation engine crucial to the company’s success?

[7] Tweakers.net – Wat is Hadoop?

[8] The Age of Big Data: Is It Coming or has It Arrived?

[9] Erik Brynjolfsson et al. – “Strength in numbers: How does data-driven decisionmaking affect firm performance?” – Social Science Research Network (SSRN), April 2011

[10] Brad Brown et al. – McKinsey & Company – Are you ready for the era of ‘big data’? – October 2011

