Posted by: Peter Scott on: May 25, 2007
Like buses in London, you wait ages for one to arrive and then three come along at the same time. I started writing this piece a while back, and since then Beth has written on data quality, Jon Mead has touched on conformed dimensions in a follow-up to his posting on dimensions, and Andy Whitehurst of Xansa spoke about Master Data Management at yesterday’s UKOUG BIRT SIG meeting in London.
What I was going to write touches on all three topics but is hopefully far enough removed from them not to be seen as mere bandwagon hopping.
Quality of data has always been key to successful data warehouses. In database terms this could be related to traditional constraints: unique, to avoid duplicates; foreign key, to ensure that each child has a parent; not null, to make sure mandatory attributes are present. So far, so understandable. But there is another form of duplication we need to avoid in data warehouses: the "same item, different name" duplicate. By this I don’t mean where a product is simply renamed (such as when Marathon bars became Snickers in the UK). I mean the case where you expand a data warehouse to take in data from other functional groups and find that part of the business uses completely different terms for its reference data, or the case of mergers and acquisitions where the two companies effectively referred to identical items in different ways, perhaps with differing hierarchies. The challenge here is to create a common data model that allows the business to understand the data but also allows the proper amount of validation to ensure that the facts in the data warehouse add up.
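To make the "same item, different name" problem concrete, here is a minimal Python sketch of one way to handle it: a cross-reference table that maps each source system's local code to a shared (conformed) key, plus a reconciliation check that the facts still add up. The source system names, product codes, keys, and the `harmonize` function are all hypothetical, invented purely for illustration; a real warehouse would hold the mapping in a managed table rather than in code.

```python
from collections import defaultdict

# Hypothetical cross-reference: (source system, local code) -> conformed product key.
# Two systems use different names for the same physical product (P001).
PRODUCT_XREF = {
    ("uk_sales", "CHOC-NUT-BAR"): "P001",
    ("acquired_co", "SKU-0017"): "P001",
    ("acquired_co", "SKU-0022"): "P002",
}


def harmonize(facts):
    """Roll facts up to conformed keys; return (totals, unmapped rows)."""
    totals = defaultdict(int)
    unmapped = []
    for source, local_code, quantity in facts:
        key = PRODUCT_XREF.get((source, local_code))
        if key is None:
            # Never drop data silently: park it for a data steward to map.
            unmapped.append((source, local_code, quantity))
        else:
            totals[key] += quantity
    return dict(totals), unmapped


# Illustrative fact rows: (source system, local product code, quantity sold).
facts = [
    ("uk_sales", "CHOC-NUT-BAR", 120),
    ("acquired_co", "SKU-0017", 80),
    ("acquired_co", "SKU-0022", 40),
    ("acquired_co", "SKU-0099", 5),  # no mapping exists yet
]

totals, unmapped = harmonize(facts)

# Reconciliation: mapped plus unmapped quantities must equal the source total.
source_total = sum(quantity for _, _, quantity in facts)
mapped_total = sum(totals.values())
unmapped_total = sum(quantity for _, _, quantity in unmapped)
assert mapped_total + unmapped_total == source_total
```

The design choice worth noting is that unmapped codes are kept in a separate list rather than discarded or guessed at, so the totals can still be reconciled against the source while someone decides how the new code fits the common model.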
You’ve hit on a bit that (in my opinion) is becoming crucial in MDM and DW. Our group calls it “Harmonization of reference data.” In many cases it isn’t appropriate to go back and standardize the source systems, yet there are operational (in addition to reporting) requirements that create a need to map the values to a common model.
A good MDM system should allow for this. As Jon points out, you need a management mechanism, which implies some level of data governance policy as well.
Whether full-on MDM is required is definitely a matter of scope. (And of course, you could be dealing with Product or Vendor master data instead of Customer.) That said, we’re seeing an increasing level of interest in Information Integrity Assessment-type engagements, all with an eye to understanding what will be needed for MDM.
May 25, 2007 at 11:25 pm
Completely agree. I have just been doing a few days’ work with a company that has this type of problem, and I think it is going to become more and more of an issue for DW projects. They are building a warehouse that brings together data from a number of different subsidiaries from all over the globe, most of them acquisitions. One of the main challenges is maintaining some of the key dimensions and their hierarchies (product and organisation), which hardly seem ‘slow changing’. Oracle’s traditional MDM offerings, like CDH, seemed a bit like using a sledgehammer to crack a nut. A more tactical solution is sought; however, the key requirement is that the source system owners have a mechanism to manage, in a controlled fashion, data that is held centrally. Would be grateful if you could forward on Andy Whitehurst’s contact details.
Jon