Big Data
I. INTRODUCTION
To understand big data platforms, one must first grasp the requirements of the digital era. This document briefly examines the technological need for big data, its concept, its applications, and its tools.
II. DEFINITION
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data. [1]
Big Data has also been defined by the four “V”s: Volume, Velocity, Variety, and Value. These become a reasonable test to determine whether you should add Big Data to your information architecture. [2]
A. Volume
The amount of data. While volume indicates more data, it is the granular nature of the data that is unique. Big Data requires processing high volumes of low-density data, that is, data of unknown value, such as Twitter data feeds, clicks on a web page, network traffic, sensor-enabled equipment capturing data at the speed of light, and many more. [2]
B. Velocity
The fast rate at which data is received and perhaps acted upon. The highest-velocity data normally streams directly into memory rather than being written to disk. [2]
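As a minimal sketch of acting on high-velocity data in memory, the Python snippet below keeps a running average over only the most recent readings and never writes them to disk. The class name, window size, and sensor values are hypothetical and chosen purely for illustration; a production system would more likely rely on a dedicated stream-processing engine.

```python
from collections import deque


class SlidingWindowAverage:
    """Keep the mean of the most recent `size` readings entirely in memory."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # oldest reading is dropped automatically
        self.total = 0.0

    def add(self, value):
        # Subtract the reading that is about to fall out of a full window.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


# Hypothetical sensor readings arriving one at a time.
stream = [10.0, 12.0, 11.0, 50.0, 13.0]
averager = SlidingWindowAverage(size=3)
averages = [averager.add(reading) for reading in stream]
```

Each call to add does a constant amount of work, so the cost per reading stays the same however long the stream runs.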
C. Variety
New unstructured data types. Unstructured and semi-structured data types, such as text, audio, and video, require additional processing to derive both meaning and the supporting metadata. [2]
D. Value
The technological breakthrough is that the cost of data storage and compute has exponentially decreased, thus providing an abundance of data from which statistical sampling and other techniques become relevant, and meaning can be derived. However, finding value also requires new discovery processes involving clever and insightful analysts, business users, and executives. The real Big Data challenge is a human one, which is learning to ask the right questions, recognizing patterns, making informed assumptions, and predicting behaviour. [2]
III. THE NEED FOR A BIG DATA PLATFORM
Data volumes are exploding: more data has been created in the past two years than in the entire previous history of the human race. [3] The production of data is expanding at an astonishing pace. Experts now point to a 4300% increase in annual data generation by 2020. Drivers include the switch from analog to digital technologies and the rapid increase in data generation by individuals and corporations alike. [4]
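To put the quoted percentage in perspective, the short Python sketch below converts a percentage increase into a multiplier on the baseline. The function name and the baseline of 1.0 are illustrative assumptions; only the 4300% figure comes from the text.

```python
def growth_factor(percent_increase):
    """Convert a percentage increase into a multiplier on the baseline."""
    return 1 + percent_increase / 100


baseline = 1.0  # hypothetical annual data volume, in arbitrary units
projected = baseline * growth_factor(4300)
print(projected)  # 44.0
```

In other words, a 4300% increase means annual data generation of 44 times the baseline, not 43 times.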
IV. STRUCTURE OF BIG DATA
Big data approaches data structure and analytics differently than traditional information architectures. A traditional data warehouse approach expects the data to undergo standardized ETL processes and eventually map into predefined schemas, also known as “schema on write”. A criticism of the traditional approach is the lengthy process to make changes to the predefined schema. One aspect of the appeal of Big Data is that the data can be captured without requiring a ‘defined’ data structure. Rather, the structure will be derived either from the data itself or through another algorithmic process, also known as “schema on read.” This approach is supported by new low-cost, in-memory parallel processing hardware/software architectures, such as HDFS/Hadoop and Spark. [2]
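The idea of "schema on read" can be sketched in a few lines of Python. In this illustrative example the raw events are stored as-is, and a schema is applied only when the data is read, so the same stored records can serve several differently shaped views. The event fields, the schemas, and the helper read_with_schema are all hypothetical and not taken from any particular platform.

```python
import json

# Raw events are stored exactly as they arrive; no schema is enforced on write.
raw_events = [
    '{"user": "a1", "action": "click", "ts": 1}',
    '{"user": "b2", "action": "purchase", "amount": "19.90", "ts": 2}',
    '{"user": "a1", "device": "mobile", "ts": 3}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: pick fields, cast them, default to None."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


# Two different views over the same stored data, each defined when reading.
purchases_schema = {"user": str, "amount": float}
activity_schema = {"user": str, "action": str, "ts": int}

purchases = list(read_with_schema(raw_events, purchases_schema))
activity = list(read_with_schema(raw_events, activity_schema))
```

Adding a new view here only requires a new schema dictionary; the stored events do not have to be reloaded or migrated, which is the contrast with "schema on write".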
In addition, due to the large data volumes, Big Data also employs the tenet of “bringing the analytical capabilities to the data” versus the traditional processes of “bringing the data to the analytical capabilities through staging, extracting, transforming and loading,” thus eliminating the high cost of moving data. [2]
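A rough Python sketch of "bringing the analytical capabilities to the data" follows. Each partition is summarised where it is assumed to live, and only the small summaries are combined centrally. The node names, the records, and the functions local_count and merge are hypothetical; real platforms distribute this work across machines, which this single-process example only imitates.

```python
from collections import Counter

# Hypothetical partitions, each imagined to live on a different storage node.
partitions = {
    "node-1": ["click", "view", "click"],
    "node-2": ["view", "purchase"],
    "node-3": ["click", "purchase", "purchase"],
}


def local_count(records):
    """Runs where the data lives and returns only a small summary."""
    return Counter(records)


def merge(summaries):
    """Runs centrally; it receives the summaries, never the raw records."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


summaries = [local_count(records) for records in partitions.values()]
totals = merge(summaries)
```

The amount of data that has to move grows with the number of distinct keys per node rather than with the number of raw records, which is where the saving in data movement comes from.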
V. USE CASES
The use cases of big data are enormous; it may be used to collect and analyse data in every aspect of life. Some examples are listed below:
- Understanding and Targeting Customers.
- Understanding and Optimizing Business Processes.
- Personal Quantification and Performance Optimization.
- Improving Healthcare and Public Health.
- Improving Sports Performance.
- Improving Science and Research.
- Optimizing Machine and Device Performance.
- Improving Security and Law Enforcement.
- Improving and Optimizing Cities and Countries.
- Financial Trading. [5]
It is clear that Big Data is mainly needed where analysis, understanding and improvement are required and where data can be collected in huge volumes. Given the rapid increase in digital data, big data will become part of our daily lives in the very near future.
VI. BIG DATA TOOLS
Several tools can be used in Big Data applications, and they can be grouped into four categories:
A. Storage
The key requirements of big data storage are that it can handle very large amounts of data and keep scaling to keep up with growth, and that it can provide the input/output operations per second (IOPS) necessary to deliver data to analytics tools. [7]
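As a back-of-the-envelope illustration of why IOPS matters, the Python sketch below approximates throughput as IOPS multiplied by the I/O block size and estimates the time for one full pass over a dataset. The figures (10,000 IOPS, 64 KB blocks, 100 TB) are hypothetical, and the model ignores caching, parallelism, and other real-world effects.

```python
def throughput_mb_per_s(iops, block_size_kb):
    """Approximate throughput as IOPS times the I/O block size."""
    return iops * block_size_kb / 1024


def scan_time_hours(dataset_tb, iops, block_size_kb):
    """Estimate how long one full pass over the dataset would take."""
    dataset_mb = dataset_tb * 1024 * 1024
    return dataset_mb / throughput_mb_per_s(iops, block_size_kb) / 3600


# Hypothetical figures chosen only for illustration.
rate = throughput_mb_per_s(iops=10_000, block_size_kb=64)
hours = scan_time_hours(dataset_tb=100, iops=10_000, block_size_kb=64)
```

With these assumed figures the rate works out to 625 MB/s and a full scan to roughly 46.6 hours, which suggests why storage for analytics has to scale in I/O capacity as well as in size.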
B. Data Mining
Data mining is the process of discovering insights within a database as opposed to extracting data from web pages into databases. The aim of data mining is to make predictions and decisions on the data you have at hand. [6]
C. Data Cleaning
Before you can really mine your data for insights you need to clean it up. Even though it’s always good practice to create a clean, well-structured data set, it’s not always possible. Data sets can come in all shapes and sizes (some good, some not so good!), especially when you’re getting them from the web. [6]
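A minimal Python sketch of a cleaning step is shown below. It trims whitespace, normalises case, drops rows that cannot be repaired, and removes duplicates. The field names, the sample records, and the rule that a usable email must contain "@" are assumptions made for illustration; real cleaning rules depend on the data set at hand.

```python
def clean_rows(rows):
    """Trim whitespace, normalise case, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = (row.get("name") or "").strip().title()
        email = (row.get("email") or "").strip().lower()
        if not name or "@" not in email:
            continue  # skip rows that cannot be repaired
        if email in seen:
            continue  # keep only the first row for each email address
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical scraped records with typical defects.
scraped = [
    {"name": "  ada lovelace ", "email": "ADA@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com "},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Alan Turing", "email": None},
    {"name": "grace hopper", "email": "grace@example.com"},
]
cleaned = clean_rows(scraped)
```

Of the five scraped rows, two survive: one for Ada Lovelace (the duplicate is dropped) and one for Grace Hopper.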
D. Analyzing
While data mining is all about sifting through your data in search of previously unrecognized patterns, data analysis is about breaking that data down and assessing the impact of those patterns over time. Analytics is about asking specific questions and finding the answers in data. You can even ask questions about what will happen in the future! [6]
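One simple way to ask a question about the future is to fit a trend to past values and extend it by one step. The Python sketch below does this with an ordinary least-squares line. The monthly order counts and the function fit_line are hypothetical, and a straight-line trend is only one of many possible models, used here as an illustration rather than a recommendation.

```python
def fit_line(ys):
    """Ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly order counts for four consecutive months.
monthly_orders = [100, 110, 120, 130]
slope, intercept = fit_line(monthly_orders)

# Extend the fitted line one month beyond the observed data.
forecast = intercept + slope * len(monthly_orders)
```

For this perfectly linear sample the fitted slope is 10 orders per month and the forecast for the fifth month is 140; real data would scatter around the line.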
RESOURCES
[2] An Enterprise Architect’s Guide to Big Data, March 2016
[3] http://www.forbes.com/sites/bernardmarr/2015/09/30/big-data-20-mind-boggling-facts-everyone-must-read/#7252dcf76c1d
[4] http://assets1.csc.com/insights/downloads/CSC_Infographic_Big_Data.pdf
[5] http://www.ap-institute.com/big-data-articles/how-is-big-data-used-in-practice-10-use-cases-everyone-should-read.aspx
[6] https://www.import.io/post/all-the-best-big-data-tools-and-how-to-use-them/
[7] http://www.computerweekly.com/podcast/Big-data-storage-Defining-big-data-and-the-type-of-storage-it-needs
Emre Sami Süzer – Operations Director – Aktif Mühendislik