Pete Scott’s random notes

Indexing the unusual

Posted by: Peter Scott on: March 10, 2008

For many years I had an interest in non-standard indexing and exotic data types, that is things that weren’t NUMBER or VARCHAR2. In fact before I came in to data warehousing I was involved in indexing free text such as conversation transcripts and and narrative reports; some of this was pushing the technology of the time, but was as achievable. As Noons pointed out in passing on a response to yesterday’s piece technology moves on and we will soon have to resolve the challenge of developing new indexing techniques to cope with the grossly unstructured such as HD Video and recorded sound.

I briefly discussed this last year on my blog and probably also over at Datageekgal’s blog (and congratulations Beth on the new job!) The indexing needs of the security services, medicine and a whole host of organisations that need to index patterns within a LOB type object will spawn some pretty clever indexing methods, and hopefully some of those will become accessible through database vendors products.

In a way there are some similarities with data mining, except that a LOB could (I think would) contain bit patterns for more than one index key value. We are probably talking about non-unique indexes, as for non-archival purposes researchers are usually concerned with finding similar records.

But one good thing about the need to index LOB contents is that they are usually non-volatile, a recorded conversation or a DNA sequence is historical fact and is not going to change so perhaps index updates are not going to be important. Most of the building blocks to do this type of indexing are already available (especially if we choose to create an index of the “index” by using some form of indirect table approach) the only bit to do is to write the domain specific code to identify the keys values in the LOB… hang on that’s the hard bit!

Data Warehouses are not dead, yet

Posted by: Peter Scott on: March 9, 2008

One of the reasons that I left my old job to work with Rittman Mead Consulting was to get back to technology. Don’t get this wrong, I really did enjoy managing people and having responsibility for a BI practice, but that was at the expense of involvement in a lot of the delivery of systems; for every one day of project work, I tended to spend eight doing “other stuff”, and I was getting to miss the buzz of being at the sharp end of a project, well maybe not missing the 2:30am coding whilst eating a slice of pizza stuff (which is a young person’s game)

Recently, I have been set the challenge to grow the amount of data warehouse and data quality work I am involved in and perhaps at the same time help dispel the perception that Rittman Mead Consulting just do short engagements and training; in the three months or so I have been here I have spent just three days delivering training and 2.5 months on-site with a single customer developing a data warehouse, so that notion is a bit of a myth anyway. Maybe the new Rittman Mead web site will feature some of the longer term type of engagements we are involved in.

I suppose a good question to start with is “do people still develop data warehouses?” Mark brought up a similar idea the other week in his ‘future BI architecture’ piece which also attracted some good comments. Indirectly, I mentioned alternatives to DWs in my article on Real Time BI for the Evaluation Centre – I hope to either post the article here on the blog or perhaps on the Rittman Mead web site in the next few days. A lot of organisations have already invested in data warehouses, these will continue to need attention, either from the support perspective or for enhancement as new classes of information are added. Data warehouses still have a place in BI as way of facilitating the delivery of quality data to the reporting layer in an efficient way so I can’t quite see the death of the Data Warehouse just yet.

OLAP and summary management

Posted by: Peter Scott on: March 8, 2008

So, why is someone who normally writes about data warehouses going to talk about Oracle OLAP, and in particular, cube organized materialized views?

I could run out the argument that I work in a BI consultancy and none of us here at Rittman Mead are 100% aligned to a specific expertise area; for example I do OBIEE and OWB as well as data warehouse design and performance. And besides, in my past I did look after one of the largest Oracle Sales Analyzer systems in Europe (ROLAP reporting on a couple terabytes of relational data), so there is quite a bit of Oracle Express Server (and OLAP DML) lurking in my past. But in reality my talk is going to look at the use of OLAP Cube Organized Materialized Views in the context of summary management and that, of course, is right at the heart of what I do!

Summary management has always been one of those ‘art form’ areas of DW/BI – which summary tables to build and how many of them are needed to balance performance, maintenance time and space usage. Query rewrite was great step forward in Oracle 9.2 (OK, it was there before, but I tended not to use it) at last, I only needed to map a single fact table (the base table) into the query tool, and for some third-party tools that was a significant thing. And now with Oracle 11g, the ability to transparently rewrite a query against the base fact table to a OLAP cube is a significant advance, it means that I needn’t think about the materialized views I need create since the cube effectively contains them all!  This is a bit of a simplification of course and I will say a lot more in the 50 minutes or so of speaking slot – so if you are Collaborate 08 – and that is just a month away, you could get to hear the whole talk.

Meetings and blogs

Posted by: Peter Scott on: March 5, 2008

Mark Rittman mentions a couple of the meetings that I will be attending in the next few weeks; the first is  UKOUG BI & Performance Management event in Central London which is just a stone’s throw from an old office of mine, so at lunchtime I will be throwing stones at my old office. No, in fact I will be with all of my colleagues on the new Rittman Mead exhibition stand, so if you are at the event – (and there are still a few places left) why not drop by and say hello. Mark says I will be talking about data warehousing, but I am happy to talk about any BI topic, just as all of my colleagues can talk about DW. This event will also be an opportunity to meet up with my new colleague Borkur Steingrimsson – we spent a couple of weeks working together on a project in Iceland earlier in the year, but I suspect that it was too cold and dark for him so he went back to Belgium.

The other event Mark mentions is Collaborate 08 in Denver. This is going to be my first conference presentation in the USA, which fills me with a mixture of emotions (including fear!) – I am on first thing on the last morning. Truth be known I am looking forward to presenting, but even more so the chance to meet up with some new people, perhaps even readers of this blog – I know the majority of my readership (according to Statcounter) come from US based domains. So if you see me there come up and say “hi”.

Which brings me back to subject of the blog. I am seriously thinking of moving the technical content over to the company web site, maybe keeping the non-BI stuff and the truly random bits here, so in effect running this a personal blog. Looking at referrer stats on WordPress, most referrals come from blog rolls, feed readers and aggregators. But there also a few articles that very frequently get referred to; in number one spot is Doug Burns who linked to my small series on deadlocks, followed by Tom Kyte on test cases then Nicholas Goodman who linked to my series on data warehouse design and data modelling. So perhaps leaving the content here but putting a copy up on the company site is the best option

Other writing

Posted by: Peter Scott on: March 1, 2008

Over the past couple of months I have been away from home a lot, the sort of working in another country away from home stuff; and staying a hotel with a relatively slow Internet connection prevented regular blogging forays. Oh and the long on-site days and the need to eat at night and other writing…

I have already mentioned the piece for Collaborate 08 on cube organized materialized views; and will write more on the topic later this month, but I have also written a piece for the Evaluation Centre on real time BI see here (registration required)

Any way I am home for a while, so over the next month I will probably get some time to write a few more pieces for the blog and perhaps even get time to drop by the Oracle forums again.

Coming home

Posted by: Peter Scott on: February 23, 2008

Well the draft Collaborate white paper is written, I think I’ll take the evening off and pop around the corner for a burger, and maybe go for a swim in the morning. I am staying about 5 minutes walk from a large open air pool, and even if it is snowing it will be warm in the water. Then after a few final touches on the paper hit that submit button

To be honest, I enjoyed researching it. I used to do quite a lot of Express (that’s the old standalone OLAP server) in a past life, and 11g OLAP was a bit like coming home, but of course not like home as I was really looking at (and being impressed by) the new cube organised materialized views

And talking about coming home; the project here is drawing to a close, and Friday night I should be back home, it has been a long trip. But if you need any BI consultancy in March give me a shout!

Oracle OLAP, partitions and time

Posted by: Peter Scott on: February 20, 2008

One of the reasons I am not writing too much on the blog at present is that I am busy working with a client just about to put a new (small) data warehouse into production. Coupled with the need to concentrate on submitting a white paper to Collaborate 08 by the weekend and I am not getting too much free time. I write this whilst an OLAP cube organised materialized view is building on a VM on my laptop.

One side note from my research on 11g OLAP is about partitioning the OLAP cube along the time dimension.

Often organisations have multiple hierarchies for reporting time, financial reporting may be aligned to a 13 x 4 week period and each week starting on a Saturday and the year starting in August, and other reporting going against the civil calendar. Of course “the day” is common to both calendars and you could build both calendars as a part of a single time dimension.

But if you put both calendars into a single time dimension it may get interesting when you decide to build an OLAP cube. If both hierarchies are used in the same cube and you elect to partition on the time dimension you will only be able to partition on one of the two hierarchies. You specify a granularity for the partition, say fiscal quarter and the fiscal periods, weeks and days will be divided into partitions based on the fiscal period they belong to, fiscal years go into a sort of ‘cap’ partition, that is the one used to catch the members that don’t belong in the other lower partitions. But what of the sales for June? June is not a member of  any fiscal quarter so that too goes into the cap partition… which could be very bad for performance.

So don’t partition on time, or build separate cubes for fiscal and calendar reporting.

I have noticed the partition advisor in OLAP 11g AWM can give some odd partitioning advice. I had a two hierarchy time dimension for some tests I am doing for a presentation but elected only to aggregate on the civil calendar. But the partition advice was to partition on the fiscal (the default hierarchy, which did not have any aggregates specified for the cube

Almost started OLAP 11g…

Posted by: Peter Scott on: February 16, 2008

The proposal deadline for the upcoming Collaborate 08 was shortly before I was to leave my old employer – it was already known that I would be leaving them to join Mark and Jon (and possibly, it was known before Rittman Mead Consulting existed!) so it came as no real surprise for Mark to ask me to consider putting forward a presentation.

Moving employers meant that I could not really present some sort of “been there, done it” case study on some exciting piece of technology that I have put together for an old customer – it is just not right to pass things off as fully your own when you had a team to back you up (just as it not right to not credit the people who have helped you) So I took the brave approach, I looked at what was new in Oracle 11g for BI and found a topic that was “now, I think I could have put that to good use” and base my proposal on that. So I chose Cube Organized Materialized Views, and when my paper was accepted started to work… slowly, as I still needed to be out there earning money.

To be honest it is scary stuff to make a proposal based on something you have not even used yet, but it does focus the mind no end. I started off with my 64-bit Oracle 11g database on Oracle Linux and decided to go with the SH schema as it was quite big, then ran into a few problems and bugs, like it did not seem to like building a cube for data belonging to another owner, then the ‘A’ OLAP patch came out, but not (at that time) for 64 bit Linux, so I decided to build a 32 bit VM and go 32 bit and apply the new patch for the server side, and the new AWM client. Of course this takes a while to do. And then I still had a few problems (new ones!) which needed to be worked through. I decided to have a look at the stuff Mark wrote about in his initial glance at Oracle 11g OLAP and repeat his work with the global schema – this too was not straight forward as the schema download that I found needed a bit of 11g-erising to get the schema load scripting to work. After finding the last few remaining privileges to grant I succeed to build my first cube organised m-view. But query re-write eluded me, perhaps because the ‘A’ patch has fixed the CBO to not over favour cubes so much when straight relational is faster.

So, I have gone back to SH and managed to get the cube built, and rewrite to work too! – so more on that another time.

OWB map testing

Posted by: Peter Scott on: February 16, 2008

I am still overseas working on a OWB project for a client. We are now into that final phase of checking everything before we promote the first release to the production system. Each OWB map for the first release is redeployed on the development system and then test data run through the map – we then check that the right number of rows (and content too!) appears at the right target. This is painstaking, but necessary work. Where we have problems we track them down, resolve (redeploy if we need to alter the mapping) and then repeat the test until the problem goes. When we get all of the code in the first release working, we go through the loop again for all of the maps to see the affects of using the SCD2 functionality in the dimension load plug-in (and yes, we have applied that patch to remove the horrendous “compare effective dates as text strings” coding that somehow crept into OWB) We are looking to make sure that execution times don’t go off on the update rather than pure insert operations of the first round of testing

Which brings me to a nice OWB 10.2 feature in trouble-shooting complex maps. The ability to generate code for indivdual mapped object input or output groups; Click on the code generator icon on the map editor and click on a drop down on the code window to change the mode then click on a mapped objects grouping bar. If I loose one or two rows in a mapping (or even all of them) I just cut and paste the generated code into SQLDeveloper and check that each output is doing the right thing

Oracle Data Integration Suite

Posted by: Peter Scott on: February 5, 2008

Keeping my head down building Oracle 11g OLAP cubes for research and self-education meant the I missed yesterday’s product announcement from Oracle, but with the wonders of Blog aggregators (and in particular Beth’s) I spotted a mention on Vincent McBurney’s blog of the newly announced (and available)  Oracle Data Integration Suite.

This is one of the fruits of the recent purchases by Oracle of Hyperion (Data Relationship Manager), Tangosol (Coherence *) and Sunopsis (Data Integrator) and with a bit of Application Server, BPEL and Enterprise Service Bus thrown in and the ability to use an embed data quality and profiling product from Trillium *

* Coherence and the data quality options are add-on to the base ODI Suite

This looks interesting, I might write more on this later