Posted by: Peter Scott | March 23, 2006

ROW_NUMBER() rules!

Despite working with DW systems for some while, the use of Oracle Analytic functions on a production system seems to have passed me by. Not because they don’t work or are slow – far from it, but because the business questions I usually get asked to look at typically can be solved with ‘more traditional’ SQL.

Today a customer asked us to help write a ‘price list’ for their stores based on the products NOT stocked at each store, a sort of ‘special order’ pricelist. The problem for the customer was that price was not an attribute of product but of a lower level member of the product hierarchy, the stock unit. These stock units could represent differing pack sizes and perhaps price-marked and offer-marked stock. The customer’s business rule was to select the price of the stock unit that had the highest price for the smallest pack size for any given product (and no, this was not to maximize income!)

To my mind this seemed a great use of analytic functions and in this case ROW_NUMBER. ROW_NUMBER() like ROWNUM enumerates the rows in a result set, however unlike rownum the count resets to 1 on each change of partition key. By ordering the result set in each partition we can fulfil the customer’s business requirement and force the rows that we are interested in to have a ROW_NUMBER() of 1. An example query would be:

SELECT PRODUCT_ID, PRICE from
(
SELECT P.PRODUCT_ID, S.PRICE,
ROW_NUMBER() over (PARTITITION BY P.PRODUCT_ID
ORDER BY S.PACK_SIZE, S.PRICE DESC) RN
FROM
PRODUCT P, STOCK_UNIT S
WHERE P.PRODUCT_ID = S.PRODUCT_ID
)
WHERE RN = 1;

This code is amazingly fast to run an a lot easier to write than more conventional SQL. As someone said “Analytics Rock”

About these ads

Responses

  1. … And here is another example of row_number in action.

    Analytics Rock indeed.

  2. Nice example, but I wonder whether you could generalise a bit: what is the difference between one of these ‘analytics’ and what you earlier called ‘more traditional’ SQL? What is it about the one that distinguishes it from the other?

    I suppose my real point is: when do I know when to not try and do something in traditional SQL, and switch to using analytics instead?

    Or is there no answer to that question? (Which is OK if so).

  3. Howard,
    It’ usually very easy:
    if the old SQL does several sorts, or uses temp table transforms to do the job, then analytical functions can be a very nice, faster and way more elegant solution.

    rgds

  4. Howard,
    Perhaps the best way to think about analytics is that they work on ‘windows’ over the data.
    In concept the PARTITION BY clause is similar to a GROUP BY except it only applies to the function and not the select as a whole. Thus you can mix aggregated and non-aggregated data in the same row of a select – very useful for sub-totals and counts

    In my example the more traditional approach would be to construct a list of stock items with the smallest pack size, then filter that to get the highest price and then join that result to the product table to get the list. I would probably end up joining to my stock_unit table three times and have two group by clauses in the sub-queries.

    But to answer you question: Don’t use analytic functions to replace simple SQL (including simple aggregates) but do think about them if you have correlated sub-queries that use aggregates.

  5. Analytics allow you to analyze a row as it relates to a partition of the result set that the row resides in.

    Analytics become useful when you want to get aggregate information without squashing the results. In these situations, “standard sql” usually requires you to do at least one pass of the data to calculate the aggregate information and then another pass to match up the aggregate information with the correct details.

    In Pete’s case, “standard sql” would require him to:

    1) Select the smallest stock unit for each product.

    2) select the biggest price for each combination of product and stock unit.

    Something like:

    SELECT
    s.PRODUCT_ID,
    s.PACK_SIZE,
    MAX( s.PRICE ) PRICE
    FROM STOCK_UNITS s
    WHERE ( s.PRODUCT_ID, s.PACK_SIZE ) IN (
    SELECT
    s1.PRODUCT_ID,
    MIN( s1.PACK_SIZE )
    FROM STOCK_UNITS s1
    GROUP BY s1.PRODUCT_ID
    )
    GROUP BY s.PRODUCT_ID, s.PACK_SIZE

    For this particular problem, we could jam the pack size and price into a single string in a particular way and arrive at the same result with a single pass of the STOCK_UNITS table, but it would become less readable and less maintainable and this type of option is not always available.

  6. Thanks Bob
    In fact for the ‘real’ version of this problem there were some other attributes to be returned by the query and another table in the join. This made analytics very atractive.

  7. I spot at least one Wiki article there, if anyone would care to write it!

    All I’ve ever been able to say about analytics is that they wield fiendishly tricky syntax to achieve much more cheaply what multiple sub-selects and aggregations would normally be required to achieve -more ghassan salem’s kind of answer than anything else, I think, but not the kind of excited enthusiasm for them that inspires people to try them out, I think!

    My point in asking the question of those that actually use them for real is that I think if the “set of analytics” could be better defined, or if at least members of the set could be clearly identified; and/or if the real-world circumstances in which they can be deployed could be better described, more people would use them more often with a greater degree of confidence.

    Just my view of things from below, I guess.

    Thanks for all the comments though!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: