For various reasons I have gained a reputation in data warehousing – whether it is from my older blog posts, speaking at conferences, or my role in a high profile BI consultancy, I am not sure.
One outcome of this is that I get asked to take part in various activities to promote new products; PR companies write (perhaps under the mistaken impression that I am a real journalist) to invite me to interview some CEO or CTO of a software company to discuss a new product; I also get invited to try new software, either at launch or as part of a beta program.
Recently, a software company (NOT Oracle, hence I write here on my personal blog, and not the company one) invited me to try their beta release BI database. There are a lot of innovative features but one really caught more eye as a one of boosting performance without breaking the hardware budget: the UNGROUP BY clause.
Readers of my postings on my employers blog, or attendees of my DW design courses will know that I am strong advocate of the principle that IO throughput is the biggest constraint on DW performance. You need the right capacity of disk, the right number of IO channels working at fast enough speeds (fibre channel, InfiniBand etc.), some vendors move some processing close to the disk so that smaller (skinny?) result sets are passed over the wire to the CPUs. Other database vendor-specific techniques centre on the use of segment compression and partition elimination to reduce the number of bytes brought from disk to answer the query. But for all database vendors the traditional way of reducing IO is to fetch back “summarised” data that answers the user’s query. But until now there was no way back from the aggregated data to detail. BUT, for this database vendor’s product. Is no longer the case.
Suppose we have a table of total sales by year for our international company and we need to get the yearly results for each country – simple we just execute:
SELECT country, sum(sales_value) from GLOBAL_SALES UNGROUP BY country
need those sales broken down by quarter as well then:
SELECT country, quarter, sum(sales_value) from GLOBAL_SALES UNGROUP BY country, quarter ORDER BY country, quarter.
“This looks magic, can we use it for any query?” Well there are some restrictions; the column names used must match the key column names defined in dimension objects (the next release will also support unique attribute matching), there must be a column in the table to be ungrouped for the parent key – that is, we we need an ALL_GEOGRAPHY column in table, we can’t rely on the fact that the column is not there to assume the total level. There is also a practical limitation on the number of levels you can drill to using this query – two levels. So ALL_GEOGRAPHY can drill to region and country, but not to state or city.
Under the covers we use a novel kind of composite “index” to assign a “proportionality” to the dimensional components, this is the really clever part as the index uses bitmap semantics to provide a compressed array of proportions.
Early days in testing this feature but I am impressed.
Late breaking news – Just been told that this is an April Fools posting and should not be used as part of a functional specification