Posted by: Peter Scott on: June 22, 2006
Over the years there has been a collective set of received wisdom about Business Intelligence in general and about data warehouses in particular. Some of these ideas are lead by the technology involved:
Others stem from business needs of the project sponsors:
Only recently has a holistic view a of corporate data warehouse taken hold:
But are these truths still valid?
Technology has certainly moved on since I first became involved in data warehouses (and that was sometime in the last century!) With SAN and NAS technology is possible to manage all of the data storage of a company as single unit and technologies such as Oracle RAC means that processing power can be added in by simply bolting another commodity processor to the storage network. Monolithic, multiprocessor, dedicated disk DW servers are now less likely to be specified for new DW developments than the inherently more scaleable RAC systems. A possible exception to this is the Data Warehouse appliance approach where a lot of the predicate processing of the query engine is pushed out to processors attached directly to the individual disk drives; an interesting trend in this type of machine is to use low cost, low power consumption devices (perhaps those developed for the game console industry).
Use small disks: DW systems are IO intensive. Very rarely do we target a single row from a fact table, we tend to read large numbers of rows and aggregate the results, even when we used indexed reads we tend to retrieve many rows, and often (because of the time based batch nature of our dataload process) they are clustered together in adjacent blocks. This means that our IO throughput depends on speed of the disk and the number of IO channels available to us. This is not necessarily an indicator that small is good but, rather, effective data striping across multiple drives (either on the storage array or by using Oracle ASM) is good. Of course, a single disk can only physically read from one location at a time and if too many reads need to access the same disk at the same time there is potential for an IO bottleneck. So, there is some truth in “small is good” but perhaps more truth in enough disks are good.
Next time I’ll write on OLTP versus DW and parallel processing. I may even stray into RAID5 as there is a popular misconception about that too!
[...] DW Wisdom [...]
June 24, 2006 at 12:54 am
So, there is some truth in “small is good” but perhaps more truth in enough disks are good.
Well said.
The days of saying, ‘We need X GB of storage space’ should be behind us. Database “capacities” should always, these days, be measured in spindles. “We need X spindles to achieve a given I/O rate”. If achiveing the right spindle count happens to mean you buy 300 times more disk space than you actually need, well – that’s just something you live with. Therea re always things like Backups, Flashback and Workspace Manager that will happily start consuming all that “spare” capacity anyway.
Put another way, the marginal cost for each GB of disk space these days is negligble, barely measurable in fact. But throughput is always at a premium, and it’s that you should be paying for.