Pete Scott’s random notes

One man’s meat is another man’s butchery product

Posted by: Peter Scott on: May 25, 2007

Like buses in London, you wait ages for one to arrive and then three come along at the same time. I started writing this piece a while back and, since I started, Beth wrote on data quality, Jon Mead touched on conformed dimensions in a follow-up to his posting on dimensions, and Andy Whitehurst of Xansa spoke about Master Data Management at yesterday’s UKOUG BIRT SIG meeting in London.

What I was going to write touches on all three topics but is hopefully far enough removed from them not to be seen as mere bandwagon hopping.

Quality of data has always been key to successful data warehouses. In database terms this could be related to traditional constraints: unique, to avoid duplicates; foreign key, to ensure that each child has a parent; not null, to make sure mandatory attributes are present. So far, so understandable. But there is another form of duplication we need to avoid in data warehouses – the same-item, different-name duplicate. By this I don’t mean where a product is simply renamed (such as when Marathon bars became Snickers in the UK). I mean what happens when you expand a data warehouse to take in data from other functional groups and find that part of the business uses completely different terms for its reference data, or the case of mergers and acquisitions where the two companies effectively referred to identical items in different ways, perhaps with differing hierarchies. The challenge here is to create a common data model that allows the business to understand the data but also allows the proper amount of validation to ensure that the facts in the data warehouse add up.
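As a rough illustration of the idea, here is a minimal Python sketch of a cross-reference that maps each source system's local product code onto one conformed key, and sets aside anything unmapped so the totals can still be reconciled. The source names, product codes, conformed keys and the `harmonise` function are all invented for the example; a real warehouse would hold the mapping in a table maintained by the business rather than in code.

```python
from collections import defaultdict

# Hypothetical cross-reference: (source system, local code) -> conformed key.
# Every name and code below is made up for illustration.
PRODUCT_XREF = {
    ("retail", "MEAT-MINCE-500G"): "BEEF-MINCE-500G",
    ("wholesale", "BUTCHERY-GRD-BEEF-05"): "BEEF-MINCE-500G",
    ("wholesale", "BUTCHERY-LAMB-CHOP"): "LAMB-CHOP",
}


def harmonise(facts):
    """Roll (source, code, quantity) facts up to conformed keys.

    Returns (totals, rejects): totals per conformed key, plus the
    facts whose code has no mapping, so that nothing silently vanishes.
    """
    totals = defaultdict(int)
    rejects = []
    for source, code, qty in facts:
        key = PRODUCT_XREF.get((source, code))
        if key is None:
            rejects.append((source, code, qty))  # needs a mapping decision
        else:
            totals[key] += qty
    return dict(totals), rejects


facts = [
    ("retail", "MEAT-MINCE-500G", 120),
    ("wholesale", "BUTCHERY-GRD-BEEF-05", 80),
    ("wholesale", "BUTCHERY-LAMB-CHOP", 15),
    ("wholesale", "BUTCHERY-UNKNOWN", 5),
]
totals, rejects = harmonise(facts)

# Validation: loaded quantity plus rejected quantity equals the source total.
loaded = sum(totals.values())
rejected = sum(qty for _, _, qty in rejects)
assert loaded + rejected == sum(qty for _, _, qty in facts)
```

Keeping the rejects rather than dropping them is the point of the sketch: the two names for the same item roll up to one row, and the check at the end confirms the facts still add up.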

5 Responses to "One man’s meat is another man’s butchery product"

Completely agree. I have just been doing a few days’ work with a company that has this type of problem, and I think it is going to become more and more of an issue for DW projects. They are building a warehouse that brings together data from a number of different subsidiaries from all over the globe, most of them acquisitions. One of the main challenges is maintaining some of the key dimensions and their hierarchies (product and organisation), which seem hardly ‘slow changing’. Oracle’s traditional MDM, like CDH, seemed a bit like the nut and sledgehammer analogy. A more tactical solution is sought – however, the key requirement is that the source system owners have a mechanism to manage, in a controlled fashion, data that is centrally held. Would be grateful if you could forward on Andy Whitehurst’s contact details.

Jon

Jon – I have emailed you Andy’s details.

One of the key things to think about with MDM is the scope – is this a massive project to present a common data view throughout a business (ERP, CRM, PM & BI, supply chain) or is it just a tactical fix to allow common reporting?

I agree that CDH is over the top for many data warehouses; de-duplicating customers is less of an issue for me than the (one-off) sorting out of the higher levels of the hierarchies when data comes from differing sources – but perhaps I don’t deal with enough e-commerce BI systems!

You’ve hit on a bit that (in my opinion) is becoming crucial in MDM and DW. Our group calls it “Harmonization of reference data.” In many cases it isn’t appropriate to go back and standardize the source systems, yet there are operational (in addition to reporting) requirements that create a need to map the values to a common model.

A good MDM system should allow for this. As Jon points out you need a management mechanism, which implies some level of data governance policies as well.

Whether full-on MDM is required is definitely a matter of scope. (And of course, you could be dealing with Product or Vendor master data instead of Customer.) That said, we’re seeing an increasing level of interest in Information Integrity Assessment-type engagements, all with an eye to understanding what will be needed for MDM.

Beth – I agree – we IT folks are just looking after the business people’s data. This implies that the business should own the meaning of the data. There are several tools available (with varying degrees of success and/or sophistication) that allow ownership and management of business data to go to where it belongs – and that is not with IT!

I also agree that going back and fixing the source is not always feasible, for many reasons (size of the data set involved, accountancy rules and the like), but drawing a line in the sand and harmonising from a given point onwards is often a plus when fixing up a reporting issue.

Most of the MDM problems I have come across (from practical experience) have been with the product dimension and not with customers :-)

Jon and I were discussing this earlier – I mentioned to him that Oracle recently made a podcast available (MP3 format) on their MDM initiative – interesting stuff if you’re wondering what their strategy is:

http://feeds.feedburner.com/~r/OracleAppcast/~3/114852604/5471953.mp3

regards

Mark
