Two statements
- Modern query tools allow multiple data sources to be interrogated in a single query => there is no need to consolidate information in a monolithic data warehouse
- The size of databases are increasing. Last year’s Wintercorp survey illustrated it very clearly, as did a straw poll during a VLDB presentation at UK OUG 2006 that showed around 50% of the audience had databases over 5TB, a significant increase in proportion since the presenter last asked the same question a year or so ago.
So on the face of it we have a conflict – “you don’t have to have everything in the same place to query it” and “people are storing more in single locations“. But in reality this is not a conflict at all, it is far more subtle than that
Query tool vendors realise that in an enterprise many disparate information systems exist, some at the global or regional corporate level, some existing just within one operating unit and yet others are are point solutions at a personal or at most departmental level; coupled with the notion that many organisations have a hotch-potch of technologies from various mergers and acquisitions or incarnations of a corporate IT strategy and the challenges of consolidating data in a single place become evident. Joining data on the fly has many attractions. But there are many counter arguments to this approach
- OLTP sources such as ERP systems are optimised to give fast and robust access to single row data. The systems are not designed to trawl through masses of data
- Retrieving large data sets and then aggregating them away in a user query tool can waste precious network bandwidth – is it not better to move and aggregate just once?
- Many operational systems have only a sense of ‘now’ – the amount of history is limited
Furthermore, legislation (SOx, accounting rules, national security etc.) may mandate long-term storage away from the transactional systems. I suspect the outcome is that the data warehouses will not become extinct, they will just evolve into data warehouses!
The data warehouse is dead, long live the data warehouse


Well said! errr… I think I get what you mean!
By: Nicholas Goodman on November 19, 2006
at 3:55 am
I’ll give special support to your third bullet point because the amount of analysis required to determine a historical point-in-time status for OLTP data is definitely non-trivial.
I’d add a fourth point — that different systems almost inevitably cannot be directly joined without some form of code translation or worse still data cleansing, and embedding such logic in a query tool ranges from the extremely difficult to the not possible.
There are many more of course — ware we going to start running parallel queries during regular work hours against OLTP tables, with the resultant checkpointing every time a table is queried? Are we happy to share our sensitive OLTP system’s shared pool with non-sharable DW/DSS queries? Are we going to throw away partition pruning?
I think not.
By: David Aldridge on November 19, 2006
at 6:47 am