Big data, data discovery, and data science.
On their own, they form the foundation for modern data analysis and data-driven business practices. Combined, they form the future of big data discovery.
Gartner has already dubbed big data discovery the next big trend in analytics. It should also be viewed as the next evolutionary phase in data analytics.
From an evolutionary standpoint, you could look at big data discovery as the most advanced species of big data, data discovery, and data science – merging the strengths of all three areas while masking the weaknesses.
(It’s kind of like how the grolar bear has evolved. What’s a grolar bear? It’s a polar bear and a grizzly bear combined.)
As ZDNet explained, data discovery may provide ease of use and greater agility compared to big data, but it offers less depth of information and less depth of exploration. Data science can handle complex analyses, but it is tougher to implement and more difficult to use. Big data is also tough to implement, but it does offer a greater depth of information.
ZDNet continues, “Since the advantages of the three technologies map nicely to the advantages of the others, they are now starting to blend, and Gartner believes Big Data Discovery will be a distinct new market category by 2017.”
It’s interesting to take a step back and look at how we arrived at the big data discovery age.
Peter Schlampp, Vice President of Product for Platfora, drew a comparison between biological evolution and the evolution of analytics as we know it today.
We started with the enterprise data warehouse (EDW) and business intelligence tools that could organize and make sense of structured data.
However, if data wasn’t structured, and an extensive extract, transform, and load (ETL) process wasn’t exercised, accurate answers to pointed questions were much tougher to ascertain. These batch queries would also eat up valuable time.
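To make the ETL step concrete, here is a minimal sketch in Python using only the standard library. The sample data, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse; a real EDW pipeline would be far more involved.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray whitespace, one bad row.
RAW_CSV = """region,amount
 east ,100
WEST,250
east,not_a_number
West,50
"""


def extract(text):
    """Extract: read raw rows from the CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region names and drop rows with unparseable amounts."""
    clean = []
    for row in rows:
        try:
            amount = int(row["amount"])
        except ValueError:
            continue  # skip rows that cannot be converted to a number
        clean.append((row["region"].strip().lower(), amount))
    return clean


def load(rows):
    """Load: write the cleaned rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def totals_by_region(conn):
    """Answer a pointed question with SQL once the data is structured."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


conn = load(transform(extract(RAW_CSV)))
print(totals_by_region(conn))
```

With the sample data above, this prints `[('east', 100), ('west', 300)]`. Skip the transform step and the same query would treat " east " and "east" as different regions, which is the kind of inaccurate answer described above.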
Hadoop came along to bridge the gap in processing and analyzing unstructured data.
Even so, data still has to be organized before analysis if discovery and iteration on large data sets are going to be fruitful. As Schlampp described, “To perform SQL queries on Hadoop data, organizations need to first have data administrators organize data in SQL-on-Hadoop systems.”
We are currently in the age of data lakes and self-service analytics.
Data lakes (which are organized through Hadoop) act as repositories for data that is collected in its native form. As the data is made available to everyone across the organization, there is more potential for business analysts to bypass IT to get the insight they need. There is also an opportunity to overcome traditional data silos from disparate data sources.
Self-service analytics is giving business users the chance to take ownership of their information. That said, analysts looking to run their own analytics still need the skills to properly manipulate and organize data for analysis.
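As a rough illustration of what "organizing data for analysis" can mean when records are kept in their native form, the Python sketch below reads a handful of raw JSON records whose fields differ by source and reconciles them at read time. The records, field names, and helper functions are hypothetical, and a plain list stands in for files in a data lake.

```python
import json
from collections import Counter

# Hypothetical records stored as collected: fields vary from source to source.
RAW_EVENTS = [
    '{"device": "phone", "event": "login", "ts": "2015-06-01T08:00:00"}',
    '{"device": "watch", "steps": 4200}',
    '{"source": "car", "event": "ignition_on"}',
    '{"device": "phone", "event": "purchase", "amount": 19.99}',
]


def read_events(lines):
    """Parse each raw line and map it onto the fields this analysis needs."""
    for line in lines:
        record = json.loads(line)
        yield {
            # Sources name their origin differently; reconcile that at read time.
            "origin": record.get("device") or record.get("source") or "unknown",
            # Records without an explicit event are treated as measurements.
            "event": record.get("event", "measurement"),
        }


def events_per_origin(lines):
    """Count how many records each origin contributed."""
    return Counter(event["origin"] for event in read_events(lines))


print(dict(events_per_origin(RAW_EVENTS)))
```

With the sample records, this prints `{'phone': 2, 'watch': 1, 'car': 1}`. Nothing was cleaned up when the data was stored, so the work of deciding which fields matter falls to whoever runs the analysis.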
At each turn, data analysis standards evolved as data analysis needs changed and data sources became more widespread.
Today, data insights can be gleaned from everywhere (from our cars, from our phones, from our watches), and that’s how we want it. We need to know everything!
But this thirst for knowledge from big data sources has required data analytics technology to go through several incarnations.
Big data discovery is the latest version. The common thread that will bring big data, data science, and data discovery together is Apache Spark.
Given its processing power to handle big data workloads and its ability to handle advanced analytics, Apache Spark is streamlining the process of big data discovery from data lakes. Regardless of the data structure, Spark can simplify data preparation and handle the most intensive analytics demands.
Do you agree with ZDNet about big data discovery being the next big thing?