Posted by: Peter Scott | February 5, 2007

Summary table design

Most data warehouses used for reporting and built on conventional database technology such as SQL Server, Oracle and DB2 use some form of summarisation to boost query performance. In principle we doing once in the batch load and refresh process what would need to be done for each query at run time if that summary did not exist. The key benefit of the summary is the saving of IO required to read the detailed information, and possibly the IO associated with a the sorting of a large volume of data as part of the aggregation

But how do you choose the summary tables you need? Obviously, if the business only ever runs one report, then a single specific summary to drive it would be ideal. But most DW systems are used for significant ad-hoc query load and choosing the best aggregates is not quite so simple; building a summary for every possible access path is not going to be viable.

We are fortunate that modern technology can help us mitigate the need for an excessive number of aggregates. Many query tools have ‘aggreagte awareness’, that is, the ability to map a query to different source table if that would result in a more efficient extraction of data from the database. And Oracle has the ability to transparently rewrite queries so that they access more efficient summaries; in Oracle 8 & 9 this was limited to Materialized Views, but since Oracle 10g other rewrites could also be defined. With either approach we use the best available summary which is not the same thing as having an exact match summary.

So back to our question, how do we choose the summary tables we need? The first thing to consider is what are users likely to want to know; users often have specific business interests which will make certain combinations of dimension level unlikely, for example in retail not many people interested in product supplier also have a great interest in specific named customers but they might well have an interest in customer demographic. This type of analysis allows us to discount the more implausible combinations. But we still will arrive an over large list of summary candidates.

Here a good approach is to estimate the size reduction between the aggreagte and its source table and, perhaps, weight this by a predicted frequency of use. If the summary is not appreciably smaller than its source it is probably not worth building; that is the IO from hitting the summary is not that different from doing the calculation from source. I think that a size ratio of 10:1 is borderline and good candidates should be 50:1 or better. The only real way to calculate the size ratio is to use queries against the data set, remember that the source data is probably sparse along several dimensions and the aggregation reduces sparseness. And the third thing I consider is that choosing a aggregate that can answer three queries with simple further aggregation is probably better than building an aggregate for of those three queries.

So in summary, good aggregate tables should be considerably smaller than their source fact tables. If the size reduction is not significant then consider the use of suitable indexes instead

About these ads

Responses

  1. Pete – this post must be in good timing. We are currently on 9i and are in the process of building summary tables – Since our OLTP system has numerous tables the summary tables are smaller and go against fewer tables. We have had one issue though – we planned on using mviews with query rewrite enable and refresh automatic – thr problem we encountered is mviews error for complex queries – what would be out options in this case for building summary tables (refresh on demand may be an option, but business would prefer realtime data) – any input will be greatly appreciated.

  2. Vidya:
    I know David Alrdridge (http://oraclesponge.wordpress.com/) has posted a lot on mviews in the past and is worth reading.
    I think your question is worth more than a quick reply here so I will write a new post on that subject


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: