In a recent post David Aldridge discusses an article on Oracle myths that Mike Ault posted on searchoracle.com. One myth that got particular mention was the practice of placing tables and their indexes in separate tablespaces. Now, David and I 'do data warehouses' (he is by his own admission a data warehouse "engineer", and I am, perhaps, an engineer's oily rag), so our (collective but separate) experiences could well be different from those of OLTP people. This is the 'it worked for me, but mileage may vary, so test it yourself' bit.
Last year I was asked to manage the enhancement and migration of a legacy data warehouse (the code was written for the first release of 8i). The old DWH sat on a 24-processor Sun box with 4TB of raw Oracle datafiles. This 4TB of disk was physically a mixture of 9GB and 18GB drives (the largest available at the time). As we were moving to a new box on a SAN, we had the luxury of being able to build the new system without loss of production service; it also meant that we could fundamentally redesign the way we did the things that hurt us on the old system. We had contemplated moving straight to Oracle 10g, but the customer's front-end query tool was only certified for Oracle 9.2, and as vendor support is important to this customer we were compelled to stick with 9.2.
The original system stored its fact data and aggregates in weekly partitioned tables; the largest of these were sub-partitioned (4-way hash on the product code). Each partition or sub-partition sat in its own tablespace, as did the partitioned indexes. Storing 115 weeks of history meant each table needed between 230 tablespaces (one per week for the table plus one per week for its index) and 920 tablespaces (where the 4-way sub-partitioning multiplied that by four); no wonder the DBA team hated to change things!
My high level redesign approach was:
use the SAN to stripe the disks – why should the DBA team have to work out where to put individual data files?
use locally managed tablespaces with uniform extents
get rid of most of those tablespaces
use multiple files (on different IO controllers) per tablespace
change to daily partitions and remove the sub-partitions (I'll post on this later)
and (at last getting to the point) put the indexes and the tables in the same tablespaces.
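To make the last four points concrete, here is a sketch of the kind of 9.2 DDL involved. All names, sizes, dates and paths are hypothetical – this is an illustration of the approach, not our actual build scripts:

```sql
-- One locally managed tablespace with uniform extents, its files
-- placed on different IO controllers (hypothetical sizes and paths).
CREATE TABLESPACE dwh_data
  DATAFILE '/u01/oradata/dwh/dwh_data01.dbf' SIZE 4000M,
           '/u02/oradata/dwh/dwh_data02.dbf' SIZE 4000M,
           '/u03/oradata/dwh/dwh_data03.dbf' SIZE 4000M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 16M;

-- A daily range-partitioned fact table with no hash sub-partitions...
CREATE TABLE fact_sales (
  sale_date    DATE    NOT NULL,
  product_key  NUMBER  NOT NULL,
  store_key    NUMBER  NOT NULL,
  quantity     NUMBER,
  sale_value   NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p20050101 VALUES LESS THAN (TO_DATE('2005-01-02','YYYY-MM-DD'))
    TABLESPACE dwh_data,
  PARTITION p20050102 VALUES LESS THAN (TO_DATE('2005-01-03','YYYY-MM-DD'))
    TABLESPACE dwh_data
);

-- ...and its local bitmap index in the SAME tablespace as the table.
CREATE BITMAP INDEX fact_sales_prod_bix
  ON fact_sales (product_key)
  LOCAL
  TABLESPACE dwh_data;
```

With this shape the DBA team adds one partition per day into a shared tablespace, rather than creating (and later dropping) a fresh tablespace for every partition and every index.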
Most of the above went by on a nod from the systems team, and the partitioning change was easy to prove beneficial for the majority of user queries captured on the current system. But was there resistance to moving away from the "let's keep the indexes away from the tables" stuff! 'We'll get disk contention', 'All the users will hit the same disk at the same time'. So with patience and a flip-chart I sat down with them and worked through the arguments:
Four years' experience on the old DWH shows that very, very few queries hit the same tables at the same time. In addition, on the new DWH these tables will be spread across multiple files by a round-robin process, and the files further striped across multiple disks by the SAN, so even if two users hit the same table at the same time there is now less chance that they physically hit the same disk at the same time.
Our index strategy is to use bitmap indexes on the dimension keys of the fact data. To select a given set of rows, a query typically reads multiple bitmap indexes, combines the result sets, and only then goes to the table to read those rows. Processing the indexes and the table is therefore sequential, so the two steps cannot contend for the same resource. There is the argument about the disk heads having to move more, but as noted in David's piece, in a multi-user system it is likely that the disk heads have moved anyway!
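You can see this sequencing in the execution plan of a typical star-style query. The table and index names below are hypothetical (a fact table with bitmap indexes on two dimension keys); the point is the shape of the plan, not the exact names:

```sql
-- Hypothetical query: filter a fact table on two dimension keys.
SELECT SUM(s.sale_value)
FROM   fact_sales s
WHERE  s.product_key = 42
AND    s.store_key   = 7;

-- Typical 9.2 plan shape: the bitmap indexes are read and ANDed
-- together first, and only then are the table blocks visited.
-- Index access and table access are consecutive steps, not
-- concurrent ones:
--
--   TABLE ACCESS BY LOCAL INDEX ROWID  FACT_SALES
--     BITMAP CONVERSION TO ROWIDS
--       BITMAP AND
--         BITMAP INDEX SINGLE VALUE  FACT_SALES_PROD_BIX
--         BITMAP INDEX SINGLE VALUE  FACT_SALES_STORE_BIX
```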
And my 'ace' card: "Do it my way, and benchmark it in the parallel run stage – if I am wrong we can move the indexes before go-live and try again."