Posted by: Peter Scott | April 1, 2009

UNGROUP BY

For various reasons I have gained a reputation in data warehousing – whether it is from my older blog posts, speaking at conferences, or my role in a high profile BI consultancy, I am not sure.

One outcome of this is that I get asked to take part in various activities to promote new products; PR companies write (perhaps under the mistaken impression that I am a real journalist) to invite me to interview some CEO or CTO of a software company to discuss a new product; I also get invited to try new software, either at launch or as part of a beta program.

Recently, a software company (NOT Oracle, hence I write here on my personal blog, and not the company one) invited me to try their beta release BI database. There are a lot of innovative features but one really caught more eye as a one of boosting performance without breaking the hardware budget: the UNGROUP BY clause.

Readers of my postings on my employers blog, or attendees of my DW design courses will know that I am strong advocate of the principle that IO throughput is the biggest constraint on DW performance. You need the right capacity of disk, the right number of IO channels working at fast enough speeds (fibre channel, InfiniBand etc.), some vendors move some processing close to the disk so that smaller (skinny?) result sets are passed over the wire to the CPUs. Other database vendor-specific techniques centre on the use of segment compression and partition elimination to reduce the number of bytes brought from disk to answer the query. But for all database vendors the traditional way of reducing IO is to fetch back “summarised” data that answers the user’s query. But until now there was no way back from the aggregated data to detail. BUT, for this database vendor’s product. Is no longer the case.

Suppose we have a table of total sales by year for our international company and we need to get the yearly results for each country – simple we just execute:
SELECT country, sum(sales_value) from GLOBAL_SALES UNGROUP BY country
need those sales broken down by quarter as well then:
SELECT country, quarter, sum(sales_value) from GLOBAL_SALES UNGROUP BY country, quarter ORDER BY country, quarter.
“This looks magic, can we use it for any query?” Well there are some restrictions; the column names used must match the key column names defined in dimension objects (the next release will also support unique attribute matching), there must be a column in the table to be ungrouped for the parent key – that is, we we need an ALL_GEOGRAPHY column in table, we can’t rely on the fact that the column is not there to assume the total level. There is also a practical limitation on the number of levels you can drill to using this query – two levels. So ALL_GEOGRAPHY can drill to region and country, but not to state or city.
Under the covers we use a novel kind of composite “index” to assign a “proportionality” to the dimensional components, this is the really clever part as the index uses bitmap semantics to provide a compressed array of proportions.
Early days in testing this feature but I am impressed.

Late breaking news – Just been told that this is an April Fools posting and should not be used as part of a functional specification

About these ads

Responses

  1. Now I would have run this on the Rittman Mead blog, and actually said it was a secret new feature in 11gR2…

  2. :D

  3. Joking aside, we basically have this feature, although using it for sales would be a bit… silly.

    A more practical example is what we use it for — activity costing/accounting. So, if you’re spending $400k on a data-center and you want to know what the data-center is being used for, one way you could work this out — from a decision making perspective, is pull another feed, such as a list of servers per datacenter, and weight the data-center cost out per-server based on some criteria — for example, a combination of the physical size of the server, how much power it uses, etc.

    It isn’t very much more difficult than a single statement along the lines of the above — a kind of “WEIGHT BY Server.Size+Server.Power/1500″ type of thing. Then all the usual BI reporting and analysis is available on top of the ending model.

  4. Good one.. http://desperateindw.blogspot.com/2011/11/fooled-by-ungroup-by-clause.html

    Thanks,
    Soma


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: