Pete Scott’s random notes

DW Design (part 1)

Posted by: Peter Scott on: June 28, 2006

“Which comes first, the physical or logical design for an Oracle data warehouse?” has always been a question that I have struggled to come up with a convincing answer. To my thinking they both develop simultaneously. True, when designing a data warehouse there will be a set of design criteria that define the bounds of the project. Typically initial sizing parameters will be known, we know how many years of history we need to access, we know how many items of reference data we need hold and its projected growth rate, we also know from whence our source data comes and whether it is in our design remit to cleanse it. We may have knowledge of the division of time between data load, user access and the essential routine maintenance tasks such as backups. There will be many unknowns (such as summary sizing) but we have probably enough to start an iterative initial design.

For a long while I have favoured a three section data warehouse design: a staging area where raw fact and reference data is validated for referential integrity, a third-normal form layer to hold the reference data and historical fact, and finally a presentation layer to hold denormalised reference data and aggregated fact. The staging layer is ‘private’ to the data warehouse but user query access (subject to business security rules) to other layers is permitted. In some cases it will not be possible to use a denormalised layer; but if you can use one, you should.

In the next part I will look at staging in more detail.

4 Responses to "DW Design (part 1)"

Hi
I have a data source which does not have a “updated time”, what can I do for incremental loading?

Thanks

You could load all of the data to a stage table and then use a set operator such as SQL MINUS to find the records that are either changed or new. The subset generated can then be used to update or insert into the target table – if you have a merge or upsert operator available the whole update could be done in a single pass.
I would also add a time of change marker to the data being loaded, it can be quite useful

I am new to DW. I don’t know should I use oracle data warehouse builder or write my own ETL program? based on performance and project time line consideration. (I just start reading oracle DW builder.)
If I write my own code, will it be faster if I export the source table to flat file and compare two files? or use use SQL MINUS?

Many Thanks

Oracle Warehouse Builder can be a very productive tool and well worth learning to use. It can take a bit of getting use to, but once you get the hang of it can be very rapid to build ETL maps. Writing your own ETL is certainly possible, but the PL/SQL generated by OWB is of very good quality – I would think about the amount of code I needed to write versus the effort to install OWB and learn to use it, most Oracle DW projects would pay for the effort to use OWB pretty quickly though.

Exporting to flat file depends on where the tables are and the speed of the links between systems. Where the source and target are both Oracle databases and the links are fast, I might well use the source database directly.

Leave a Reply