
QueBIT Blog: Hadoop vs. Data Warehousing – Do You Have to Make a Choice?

Posted by Jennifer Field

Jan 19, 2016 7:18:54 AM

With the advent of Hadoop, whether it will take over certain data warehousing functions, or replace data warehouses altogether, has become a hot-button topic. The answer really depends on how an individual organization views its data warehouse.

In many cases, Hadoop acts as a complement to a data warehouse.

However, it cannot replace a data warehouse altogether. Hadoop can handle certain data management, integration, and workload needs that suit its architecture. For example, Hadoop is geared toward advanced analytics, while data warehouses are built for reporting and OLAP.

Hadoop and data warehouses are better when working together than working apart.

As a TDWI author explained, “At this point, I personally don’t believe Hadoop can replace a relational database management system, much less a relational data warehouse. [However], I do believe we can reduce our footprint on expensive relational databases by migrating some data to Hadoop.”

By moving those workloads to Hadoop, organizations can get more value from their data warehouses, both functionally and economically. Offloading certain data processing tasks to Hadoop also frees up warehouse capacity for the data management and workload management needs the warehouse handles best.

When it comes to large volumes of unstructured data (the messy data structures that are difficult to handle without the right processing tools), Hadoop has an extra processing gear that data warehouses can’t match.

As Mark Madsen, President of Third Nature, told Data Informed, “Some of the workloads, particularly when large data volumes are involved, require new storage layers in the data architecture and new processing engines. These are the problems Hadoop and alternate processing engines are equipped to solve.”

Hadoop is particularly well suited to serving as a data lake. And with the introduction of YARN, Hadoop gained cluster resource management horsepower while also supporting data warehousing workloads.

Can data warehouses and Hadoop come together?

These two platforms don’t exactly merge into one. They are separate but complementary technologies that can coexist. With both in place, your business stands a better chance of handling demanding big data workloads.

Bob Page, VP of Development at Hortonworks, stated, “I don’t know anybody who would try to build an [integrated data warehouse (IDW)] in Hadoop. Comparing these two for an IDW workload is comparing apples and oranges.”

How does your organization view Hadoop?
