
The Evolution of Big Data

To truly understand the implications of Big Data analytics, one has to reach back into the annals of computing history, specifically business intelligence (BI) and scientific computing. The idea behind Big Data can most likely be traced back to the days before the age of computers, when unstructured data (paper records) were the norm and analytics was in its infancy. Perhaps the first Big Data challenge came in the form of the 1880 U.S. census, when information concerning approximately 50 million people had to be gathered, classified, and reported on.

For the 1880 census, a simple head count did not give the U.S. government enough to work with: particular elements, such as age, sex, occupation, education level, and even the “number of insane people in household,” had to be accounted for. That information had intrinsic value, but only if it could be tallied, tabulated, analyzed, and presented. New methods of relating the data to other collected data came into being, such as associating occupations with geographic areas, birth rates with education levels, and countries of origin with skill sets.

The 1880 census yielded a mountain of data, yet only severely limited technology was available to analyze it. The problem of Big Data could not be solved for the 1880 census, so it took over seven years to manually tabulate and report on the data.

With the 1890 census, things began to change, thanks to the introduction of the first Big Data platform: a mechanical device called the Hollerith Tabulating System, which worked with punch cards that could hold about 80 variables. The Hollerith Tabulating System made census data actionable and greatly increased its value. Analysis now took six weeks instead of seven years, which allowed the government to act on the information in a reasonable amount of time.

The census example points out a common theme with data analytics: Value can be derived only by analyzing data in a time frame in which action can still be taken to utilize the information uncovered. For the U.S. government, the ability to analyze the 1890 census led to an improved understanding of the populace, which the government could use to shape economic and social policies ranging from taxation to education to military conscription.

In today’s world, the information contained in the 1890 census would no longer be considered Big Data, according to the definition: data sets so large that common technology cannot accommodate and process them. Today’s desktop computers certainly have enough horsepower to process the information contained in the 1890 census by using a simple relational database and some basic code.
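To give a rough sense of what "a simple relational database and some basic code" might look like, here is a minimal sketch that tallies occupations by state, one of the cross-tabulations mentioned earlier. The table layout, the function name `tally_occupations_by_state`, and the sample records are all hypothetical and invented for illustration; they do not reflect the actual census schedule or real census data.

```python
import sqlite3

# Hypothetical, made-up records (state, occupation, age); not real census data.
SAMPLE_RECORDS = [
    ("New York", "clerk", 34),
    ("New York", "clerk", 41),
    ("New York", "farmer", 52),
    ("Ohio", "farmer", 29),
    ("Ohio", "farmer", 47),
    ("Ohio", "teacher", 38),
]


def tally_occupations_by_state(records):
    """Count people per (state, occupation) using an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: one row per person, three illustrative columns.
        conn.execute(
            "CREATE TABLE person (state TEXT, occupation TEXT, age INTEGER)"
        )
        conn.executemany("INSERT INTO person VALUES (?, ?, ?)", records)
        # The GROUP BY does the tallying that once required manual counting.
        rows = conn.execute(
            "SELECT state, occupation, COUNT(*) FROM person "
            "GROUP BY state, occupation ORDER BY state, occupation"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    for state, occupation, count in tally_occupations_by_state(SAMPLE_RECORDS):
        print(f"{state:10} {occupation:10} {count}")
```

The same query shape would apply to tens of millions of rows; only the loading step and the choice of storage (a file on disk rather than memory) would likely need to change.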

That realization changes what Big Data is all about. Big Data means having more data than you can handle with the computing power you already have, and being unable to easily scale your current computing environment to address the data. The definition of Big Data therefore continues to evolve with time and advances in technology. Big Data will always remain a paradigm shift in the making.

That said, the momentum behind Big Data continues to be driven by the realization that large unstructured data sources, such as those from the 1890 census, can deliver almost immeasurable value. The next giant leap for Big Data analytics came with the Manhattan Project, the U.S. development of the atomic bomb during World War II. The Manhattan Project not only introduced the concept of Big Data analysis with computers but was also the catalyst for “Big Science,” which in turn depends on Big Data analytics for success. The next major Big Science project began in the late 1950s with the launch of the U.S. space program.

As the term Big Science gained currency in the 1960s, the Manhattan Project and the space program became paradigmatic examples. However, the International Geophysical Year, an international scientific project that lasted from July 1, 1957, to December 31, 1958, provided scientists with an alternative model: a synoptic collection of observational data on a global scale.

This new, potentially complementary model of Big Science encompassed multiple fields of practice and relied heavily on the sharing of large data sets that spanned multiple disciplines. The change in data gathering techniques, analysis, and collaboration also helped to redefine how Big Science projects are planned and accomplished. Most important, the International Geophysical Year project laid the foundation for more ambitious projects that gathered more specialized data for specific analysis, such as the International Biological Program and later the Long-Term Ecological Research Network. Both increased the mountains of data gathered, incorporated newer analysis technologies, and pushed information technology further into the spotlight.

The International Biological Program encountered difficulties when the institutional structures, research methodologies, and data management implied by the Big Science mode of research collided with the epistemic goals, practices, and assumptions of many of the scientists involved. By 1974, when the program ended, many participants viewed it as a failure.

Nevertheless, what many viewed as a failure really was a success. The program transformed the way data were collected, shared, and analyzed and redefined how IT can be used for data analysis. Historical analysis suggests that many of the original incentives of the program (such as the emphasis on Big Data and the implementation of the organizational structure of Big Science) were in fact realized by the program’s visionaries and its immediate investigators. Even though the program failed to follow the exact model of the International Geophysical Year, it ultimately succeeded in providing a renewed legitimacy for synoptic data collection.

The lessons learned from the birth of Big Science spawned new Big Data projects: weather prediction, physics research (supercollider data analytics), astronomy images (planet detection), medical research (drug interaction), and many others. Of course, Big Data doesn’t apply only to science; businesses have latched onto its techniques, methodologies, and objectives, too. Doing so has allowed businesses to uncover value in data that might previously have been overlooked.

Taken from: Big Data Analytics: Turning Big Data into Big Money

