I have written recently about the trend towards ever larger databases. But what should we call these VLDB systems?
A few years back a set of terms seemed to fit well: DSS, MIS, data warehouse, datamart; sometimes the distinction between them was very blurred. But by and large we were dividing the reporting from the transactional. Today things are not so clear cut: as well as the hybrid dashboards and scorecards driven from a mixture of live transactional data and historic information, we are also storing a whole class of data that does not really fit in the “data warehouse drives the reporting system” world.
Today vast tracts of raw data can actually be used, but not by traditional BI reporting applications. For example, RFID tracking, DNA sequences, weather records, telco call records, particle physics experimental data and text storage systems all produce what is essentially information that cannot be aggregated and reduced in complexity. It is also the type of data where we are interested in discrete records or possibly the interaction between records. But finding the records of interest quickly is essential if this type of application is to have any use. And if we can’t use it why store it (to paraphrase a recent comment on this blog)?


Have you actually started using live operational data in your projects? I’ve been thinking about this and the problem of data quality seems to be important and difficult to handle on real-time or near-real-time data.
By: Michael on March 12, 2007
at 8:51 am
We apply some operational data, but not the fast and furious kind that comes from some potential sources.
Data quality can be a big problem. I just hate the idea of storing bad data in a data warehouse.
I tend to validate that the dimensional keys already exist, but not check the individual fact values for sense. Anything that fails this simple test is ‘parked’ for further investigation / fix. But as the loads become more and more frequent, and the techniques to apply new data become more complex to minimise the impact of applying data, we probably get to a stage where it is not possible to catch up on missed data. But really that will be a business choice, not a technical one; if the business must have it, we must work out how to do it.
By: Peter Scott on March 12, 2007
at 7:18 pm
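The key check described in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `split_facts`, the column names and the in-memory sets of known keys are all hypothetical, and a real warehouse load would more likely do this in SQL against the dimension tables.

```python
def split_facts(facts, dimension_keys):
    """Separate incoming fact rows into loadable and 'parked' rows.

    facts          -- list of dicts, one per incoming fact row (hypothetical shape)
    dimension_keys -- dict mapping a key column name to the set of keys
                      that already exist in that dimension
    """
    loadable, parked = [], []
    for row in facts:
        # Only check that each dimensional key already exists;
        # the fact values themselves are not checked for sense.
        missing = [col for col, known in dimension_keys.items()
                   if row.get(col) not in known]
        if missing:
            # Park the row, recording which keys failed, for later investigation.
            parked.append({"row": row, "missing_dimensions": missing})
        else:
            loadable.append(row)
    return loadable, parked


# Hypothetical example data.
dims = {"product_id": {1, 2}, "store_id": {10}}
facts = [
    {"product_id": 1, "store_id": 10, "amount": -5.0},   # odd value, but keys exist
    {"product_id": 3, "store_id": 10, "amount": 9.99},   # unknown product key
]
loadable, parked = split_facts(facts, dims)
```

Note that the first row is loaded despite its suspicious amount, because only the keys are validated; the second row is parked with `product_id` recorded as the missing dimension.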
This seems to be a persistently interesting question. I wrote an article about this topic several years ago (check it out here –> http://www.craigsmullins.com/zjdp_010.htm)
By: Craig S. Mullins on March 19, 2007
at 1:16 am
[...] onto a ‘size’ issue, we now have Pete Scott asking ‘So what do you call large databases?’ Answers on a postcard [...]
By: Log Buffer #36: a Carnival of the Vanities for DBA’s « Lisa Dobson's blog for all things Oracle… on March 30, 2011
at 2:15 pm