I will admit that prior to reading this MIT Sloan article, I had heard the phrase “Big Data” many times but never actually knew what it specifically entailed. It led me to look up the term on Google, which complicated things even further. For instance, a Forbes article on big data provided 12 separate definitions, which I have listed below.
#1) “data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.”
#2) “an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.”
#3) “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze”
#4) “The ability of society to harness information in novel ways to produce useful insights or goods and services of significant value” and “…things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value.”
#5) “The broad range of new and massive data types that have appeared over the last decade or so.”
#6) The new tools helping us find relevant data and analyze its implications.
#7) The convergence of enterprise and consumer IT.
#8) The shift (for enterprises) from processing internal data to mining external data.
#9) The shift (for individuals) from consuming data to creating data.
#10) The merger of Madame Olympe Maxime and Lieutenant Commander Data.
#11) The belief that the more data you have the more insights and answers will rise automatically from the pool of ones and zeros.
#12) A new attitude by businesses, non-profits, government agencies, and individuals that combining data from multiple sources could lead to better decisions.
Now that the definition of big data has been cleared up, I’ll address the specifics of the MIT Sloan article. The body of the article is organized around three key ways in which organizations that capitalize on big data differ from those that use traditional data analysis methods. The three factors are:
1. They pay attention to data flows as opposed to stocks.
This section discussed several types of big data applications. One type is customer-facing processes that identify fraud risks in real time or score medical patients for health risk. A second type uses continuous process monitoring to detect changes in consumer sentiment or needs for service. A third type analyzes network relationships, such as those on LinkedIn. In each of these applications, there is no fixed amount of data; rather, there is a continuous, changing flow that needs to be analyzed. Organizations are moving away from looking at historical data and toward analyzing continuous flows of data as they arrive. The article used the example of a hospital in Toronto that uses algorithms to anticipate infections in premature babies before they occur. If you think about it, this method of continuously analyzing new data makes sense. Under traditional methods, by the time organizations had gathered the information they needed and analyzed it to make a decision, new data would be available that made what they had gathered obsolete.
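To make the "flows, not stocks" idea concrete, here is a minimal sketch of scoring each reading the moment it arrives instead of batch-analyzing a stored data set. Everything here (the class name, the temperature readings, the 3-sigma threshold) is hypothetical and purely for illustration; it is not how the Toronto hospital's system actually works.

```python
# Minimal sketch of "data in flow": score each new reading as it
# arrives instead of batch-analyzing a stored data set.
# All names, readings, and thresholds are hypothetical.

class StreamMonitor:
    """Tracks a running mean/variance (Welford's online algorithm)
    and flags readings that deviate sharply from the stream so far."""

    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations
        self.threshold = threshold

    def observe(self, x):
        # Score the new reading against the stream seen so far.
        alert = False
        if self.n >= 2:
            std = (self.m2 / (self.n - 1)) ** 0.5
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                alert = True
        # Update running statistics (Welford's online update).
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return alert

monitor = StreamMonitor(threshold=3.0)
readings = [98.6, 98.4, 98.7, 98.5, 98.6, 98.5, 104.2]  # last one spikes
alerts = [monitor.observe(r) for r in readings]
print(alerts)  # the final reading triggers an alert
```

The point of the sketch is that the decision is made per reading, in constant memory, with no "stock" of historical data to re-process; this is the property that lets flow-oriented systems act before batch analysis would even begin.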
2. They rely on data scientists and product and process developers rather than data analysts.
Data scientists are professionals who understand analytics but are also very knowledgeable in IT. They often have degrees in computer science, computational physics, biology, or network-oriented social sciences. Additionally, they have strong data management skills along with programming, math, statistics, business, and communication skills. This allows them not only to understand the underlying analytics but also to communicate effectively with decision-makers. As you can guess from this broad skill set, these professionals are in very limited supply. This has led many big data corporations to develop and train their own data scientists. Furthermore, because their skill sets go beyond traditional data analysis, many corporations are involving these professionals in product development and other areas of the business.
3. They are moving analytics away from the IT function and into core business, operational, and production functions.
Because of the technological demands of big data, the traditional programs, skills, and processes of the IT function are no longer adequate. New products, such as Hadoop, support the massive amounts of data being generated and managed. Cloud-based computing has also helped greatly in this area because many big data applications use information that is not proprietary, such as social network modeling and sentiment analysis. Another approach is virtual data marts, which allow data scientists to share existing data without replicating it.
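Hadoop's core programming model is MapReduce, which splits a job into a map phase, a shuffle that groups intermediate results by key, and a reduce phase. Real Hadoop jobs are typically written in Java and run these phases in parallel across a cluster; the toy single-process Python sketch below (a word count, the canonical MapReduce example) only illustrates the shape of the model, not Hadoop's actual API.

```python
# Toy sketch of the MapReduce model that Hadoop implements at scale.
# A real Hadoop job distributes map and reduce tasks across a cluster;
# this single-process version only illustrates the three phases.
from collections import defaultdict

def map_phase(record):
    # Map: emit (key, value) pairs -- here, (word, 1) per word in a line.
    for word in record.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: combine each key's values into one result -- a total count.
    return (key, sum(values))

records = ["big data big insights", "data flows not stocks"]
pairs = [p for r in records for p in map_phase(r)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Because each map call touches only one record and each reduce call only one key, the framework can scatter these calls across many machines, which is what lets Hadoop handle data volumes that overwhelm a traditional single-server IT setup.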
The subject of data analysis is continuously changing. As big data continues to evolve, the processes, methods, and technology used to analyze this data must advance as well. The MIT Sloan article refers to it as an information ecosystem: “a network of internal and external services continuously sharing information, optimizing decisions, communicating results, and generating new insights for business.”
MIT Sloan Article, “How ‘Big Data’ is Different” by Thomas H. Davenport, Paul Barth, and Randy Bean. Blog Group 1C