Enterprises increasingly import data from external sources that they do not control. The data may be imported once, but more often it arrives in repeated, regular flows. It is easy to make unwarranted assumptions about the architecture, meaning, quality, scope, and consistency of that data. First, we will bring those assumptions out onto the table (pun intended), examine the ways in which they may be wrong, and show simple techniques for testing the data. Before agreeing to buy or import data, you must also address some additional assumptions about how you expect to use it. Often, such data is to be integrated into an internal data warehouse. When that is the intent, a number of tests must be performed to ensure the data meets expectations in meaning and architecture. This presentation is full of simple, practical examples drawn from real data, showing how, for example, two fields with the same name can have very different domains and meanings. Tests for compatibility must be made at the database, table, key, and non-key levels.
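As an illustration of the kind of non-key test described above, the following minimal Python sketch compares the value domains of a same-named field in two sources. It is not taken from the presentation: the function names, the `status` field, and the sample rows are all hypothetical, and a real check would run against profiled column values rather than in-memory lists.

```python
def domain_profile(rows, field):
    """Return the set of distinct non-null values observed in `field`."""
    return {row[field] for row in rows if row.get(field) is not None}


def compare_domains(internal_rows, external_rows, field):
    """Compare the observed domains of one field across two sources."""
    internal = domain_profile(internal_rows, field)
    external = domain_profile(external_rows, field)
    shared = internal & external
    union = internal | external
    return {
        "only_internal": internal - external,
        "only_external": external - internal,
        "shared": shared,
        # Share of all observed values that appear in both sources.
        "overlap_ratio": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical sample rows: both sources have a field called "status",
# but the internal warehouse uses codes and the external feed uses words.
warehouse = [{"status": "A"}, {"status": "I"}, {"status": "A"}]
vendor_feed = [{"status": "active"}, {"status": "closed"}, {"status": None}]

report = compare_domains(warehouse, vendor_feed, "status")
print(report)
```

An overlap ratio near zero for two fields that share a name is a signal that their domains, and possibly their meanings, differ, so the fields should not be merged without a mapping.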
Michael Scofield is a popular speaker on data management, data quality, data visualization, and semantic data integration. He is an Assistant Professor at Loma Linda University in the Department of Health Information Management. He is the recipient of the 2008 DAMA International Community Award.