It seems like you can’t pick up a technical magazine without reading about how big data is changing the world—and the untold implications of this technology. But what the heck is big data? And didn’t we already solve this thing with business intelligence and data warehousing?
Big data, or BD, is the collection of transaction-level detail for analysis. The data is kept close to the transactional detail so it can be examined for hidden trends that are visible only when you analyze the individual transactions. The data can come from different sources but is analyzed in a common pool. Most often, that pool is fed by a copy of the transactions, streamed to the BD solution as they occur. Often, the value of the data is very time-dependent; the sooner the information is available, the more valuable it is.
There are four key terms used when talking about BD:
- The volume of records in scope is large. Millions of records per day can and do occur.
- The velocity of records being created is fast, as BD is very granular, and collection of the data is close to real time.
- The veracity of the data, a fancy term for its quality, is a concern because inaccuracies can creep in when high volumes of data from multiple sources are processed. Methods are needed to screen the data quickly, so that an optimal level of accuracy accompanies the volume and velocity.
- There is a variety of data-generating devices, and as the number increases, it will become even more important to be able to interpret and consume the data from these different sources.
These four factors make using a conventional relational database management system impractical for storing and quickly analyzing BD, so new methods are being developed.
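The veracity point in the list above can be made concrete with a minimal sketch. It assumes a hypothetical record layout (`source`, `amount`, `timestamp`) and hypothetical screening rules; none of these come from a particular BD product. It only illustrates checking each record cheaply as it arrives, so the screening can keep pace with the feed.

```python
from typing import Iterable, Iterator

# Hypothetical record layout, assumed for illustration only.
REQUIRED_FIELDS = ("source", "amount", "timestamp")

def is_plausible(record: dict) -> bool:
    """Cheap per-record check: all fields present, amount numeric and non-negative."""
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False
    amount = record["amount"]
    return isinstance(amount, (int, float)) and amount >= 0

def screen(feed: Iterable[dict]) -> Iterator[dict]:
    """Yield only plausible records, one at a time, as they arrive."""
    for record in feed:
        if is_plausible(record):
            yield record

# A tiny stand-in for a streamed feed from several sources.
feed = [
    {"source": "pos", "amount": 12.5, "timestamp": 1},
    {"source": "web", "amount": -3.0, "timestamp": 2},  # rejected: negative amount
    {"source": "app", "amount": 7.0},                   # rejected: missing timestamp
    {"source": "web", "amount": 40.0, "timestamp": 4},
]
clean = list(screen(feed))
```

Because `screen` is a generator, it never needs the whole feed in memory, which is one plausible way to keep a quality check from slowing down a high-velocity stream.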
So, what is a business intelligence data warehouse (BIDW)?
A BIDW is a data analysis system that collects the transactional information and typically provides summaries on selected key fields of the transactions being watched. These summaries can be used to better understand the overall health and trends in the transactions being monitored. The BIDW data is a copy of production and is not in real time, so long-running queries can be initiated without concerns about impacting the live customer actions. Data may be loaded daily or weekly, depending on the data source. The data is kept at several levels to serve the different customers of the BIDW; summary data and dashboards are the most common outputs of a BIDW, but if needed, you can drill into the transactions.
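As a rough illustration of the summary-plus-drill-down pattern described above, the following sketch rolls a small batch of transactions up by one key field and then retrieves the detail behind a single summary line. The field names (`id`, `region`, `amount`) and the helper functions `summarize` and `drill_down` are hypothetical; a real BIDW would do this with its own query and reporting tools.

```python
from collections import defaultdict

# Hypothetical batch of transactions, as loaded from a copy of production.
transactions = [
    {"id": 1, "region": "east", "amount": 20.0},
    {"id": 2, "region": "west", "amount": 35.0},
    {"id": 3, "region": "east", "amount": 15.0},
    {"id": 4, "region": "west", "amount": 5.0},
]

def summarize(rows, key):
    """Roll transactions up to a count and a total per value of `key`."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        bucket = summary[row[key]]
        bucket["count"] += 1
        bucket["total"] += row["amount"]
    return dict(summary)

def drill_down(rows, key, value):
    """Return the individual transactions behind one summary line."""
    return [row for row in rows if row[key] == value]

by_region = summarize(transactions, "region")             # dashboard-level view
east_detail = drill_down(transactions, "region", "east")  # transaction-level view
```

The summary is what a dashboard would typically show; the drill-down is the less common path back to the underlying transactions.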
At this point, you may not see a real difference between BD and a BIDW, since both can contain transaction-level detail. However, the two tools are typically used for very different purposes.
The following examples should make this difference clearer.