Posted by: Peter Scott | March 11, 2007

So what do you call large databases?

I have written recently about the trend towards ever larger databases. But what should we call these VLDB systems?

A few years back a set of terms seemed to fit well: DSS, MIS, data warehouse, datamart; sometimes the distinctions between them were very blurred. But by and large we were dividing the reporting from the transactional. Today things are not so clear-cut. As well as the hybrid dashboards and scorecards driven from a mixture of live transactional data and historic information, we are also storing a whole class of data that does not really fit in the “data warehouse drives the reporting system” world.

Today vast tracts of raw data can actually be used, but not by traditional BI reporting applications. For example, RFID tracking, DNA sequences, weather records, telco call records, particle physics experimental data and text storage systems all produce what is essentially information that cannot be aggregated and reduced in complexity. It is also the type of data where we are interested in discrete records, or possibly the interaction between records. But finding the records of interest quickly is essential if this type of application is to have any use. And if we can’t use it, why store it (to paraphrase a recent comment on this blog)?

Responses

  1. Have you actually started using live operational data in your projects? I’ve been thinking about this and the problem of data quality seems to be important and difficult to handle on real-time or near-real-time data.

  2. We apply some operational data, but not the fast and furious kind that comes from some potential sources.
    Data quality can be a big problem. I just hate the idea of storing bad data in a data warehouse.
    I tend to validate that the dimensional keys already exist, but do not check the individual fact values for sense. Anything that fails this simple test is ‘parked’ for further investigation / fix. But as the loads become more and more frequent, and the techniques to apply new data become more complex to minimise the impact of applying data, we probably get to a stage where it is not possible to catch up on missed data. But really that will be a business choice, not a technical one; if the business must have it, we must work out how to.

  3. This seems to be a persistently interesting question. I wrote an article about this topic several years ago (check it out here: http://www.craigsmullins.com/zjdp_010.htm)

  4. [...] onto a ‘size’ issue, we now have Pete Scott asking ‘So what do you call large databases?’ Answers on a postcard [...]
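
The key check described in response 2 (confirm that the dimensional keys exist, park anything that fails, and leave the fact values unchecked) could be sketched roughly as below. This is only an illustration under stated assumptions: fact rows arrive as dictionaries and the known dimension keys fit in in-memory sets. The function name and the column names are hypothetical and not taken from the commenter's system, where the lookup would more likely be done in the database itself.

```python
def split_by_known_keys(fact_rows, dimension_keys):
    """Separate loadable fact rows from rows to be parked.

    fact_rows: iterable of dicts, one per incoming fact record (assumed shape).
    dimension_keys: dict mapping a key column name to the set of key values
                    already present in that dimension (assumed shape).
    """
    loadable, parked = [], []
    for row in fact_rows:
        # A key is "missing" if the column is absent or its value is unknown.
        missing = [col for col, known in dimension_keys.items()
                   if row.get(col) not in known]
        if missing:
            # Keep the row together with the reason, for later investigation.
            parked.append({"row": row, "missing_keys": missing})
        else:
            # Fact values (e.g. qty) are deliberately not checked for sense.
            loadable.append(row)
    return loadable, parked


# Hypothetical sample data: product 3 is not yet in the product dimension.
known = {"product_id": {1, 2}, "store_id": {10}}
incoming = [
    {"product_id": 1, "store_id": 10, "qty": -5},
    {"product_id": 3, "store_id": 10, "qty": 2},
]
loadable, parked = split_by_known_keys(incoming, known)
```

Here the first row is loaded even though its quantity looks odd, because only the keys are tested; the second row is parked with `product_id` recorded as the missing key.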

