I have written recently about the trend towards ever larger databases. But what should we call these VLDB systems?
A few years back a set of terms seemed to fit well: DSS, MIS, data warehouse, datamart. Sometimes the distinction between them was very blurred, but by and large we were dividing the reporting from the transactional. Today things are not so clear-cut. As well as the hybrid dashboards and scorecards driven from a mixture of live transactional data and historic information, we are also storing a whole class of data that does not really fit in the “data warehouse drives the reporting system” world.
Today vast tracts of raw data can actually be used, but not by traditional BI reporting applications. For example, RFID tracking, DNA sequences, weather records, call records from telcos, particle physics experimental data and text storage systems all produce what is essentially information that cannot be aggregated and reduced in complexity. It is also the type of data where we are interested in discrete records, or possibly the interaction between records. But finding the records of interest quickly is essential if this type of application is to have any use. And if we can’t use it, why store it (to paraphrase a recent comment on this blog)?