Posted by: Peter Scott on: February 11, 2007
The other day I was asked about the approach I took to refreshing materialized views. To be honest I use my own techniques – others do similar things.
In a data warehouse summary table very little actually changes; there is perhaps a small window were new data is added, but even allowing for late arriving data the vast majority of data is unchanging. Perhaps this is untrue with type 1 SCD summaries where items may migrate between parents, but the SC part of SCD is slowly changing so this may not occur that often.
In the data warehouses I run, we tend to rely heavily on range partitioning on date. This gives improved maintainability, and perhaps more importantly allows us to use partition elimination to boost query performance by reducing the amount of data to be manipulated. It is also the key to the way we maintain mviews. We simply do the following for each partition to be refreshed:
The query we use to refresh the partition is often not the same as the one used in mview definition. Doing this allows us to present a whole suite of mviews as being defined from a single master fact table, but can take advantage of the other summary tables to minimise the need to constantly repeat the same aggregations (if we aggregate by both state and by region then it a lot quicker to roll-up state into region than having to go back to the store level fact table again). Making the mview suite appear to have single fact source solves two problems with mview nesting:
Of course, this is a simplified outline of what goes on – compression can be applied on data load as we are using direct path techniques, parallel may be an option but you then need to think about how much data you are loading as each parallel process will direct path load to its own database blocks and you can quite easily waste space
Others have said