Data Quality

Facilitator: Evan Waters
Notetaker: Christian Neumann

What makes for "good" data quality? What are we trying to achieve and capture?

  • What actually happened
  • Accurate data
  • Timely
  • Complete
  • Consistent
  • Legible
  • Usable, and actually used, for decision making
  • Reusable
  • Relevant
  • Necessary (only)

What are the challenges? What leads to poor data quality? What can get in the way?

  • Too much data (which itself degrades quality)
  • The inverse of each good-data-quality attribute listed above
  • Lack of understanding and training
  • Process and capacity constraints
  • Lack of standard definitions
  • Illegible
  • Machine errors
  • Incomplete and lost data
  • Transcription errors
  • Redundancies
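Redundant records are one of the easier problems to detect automatically. The sketch below is a minimal illustration, assuming hypothetical record fields `id`, `name` and `birthdate`: it groups records whose name (whitespace collapsed, case folded) and birthdate are identical. It only catches exact matches after normalization; spelling variants need fuzzier matching.

```python
from collections import defaultdict

def match_key(record):
    """Comparison key: name with whitespace collapsed and case folded, plus birthdate."""
    name = " ".join(record["name"].split()).casefold()
    return (name, record["birthdate"])

def find_redundant(records):
    """Return lists of record ids that share the same comparison key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Illustrative records; field names are hypothetical.
records = [
    {"id": 1, "name": "Mary  Achieng", "birthdate": "1990-04-12"},
    {"id": 2, "name": "mary achieng", "birthdate": "1990-04-12"},
    {"id": 3, "name": "John Otieno", "birthdate": "1985-01-30"},
]
print(find_redundant(records))  # [[1, 2]]
```

Flagged groups would still need a person to decide whether they are true duplicates before any merge.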

What are the problems and possible solutions?

  • Sensitize staff to data quality
  • Overcome transcription errors with education
  • Measure impact
  • Overcome transit errors (caused by time and distance) by reducing steps in the process, being more timely, and using a mobile platform
  • Decide who is responsible for checking data: everybody can check it, not just the data manager
  • Define categories of data quality problems and use different interventions to address the errors made by different people
  • Paper-based forms can get lost; having two forms filled out by the responsible persons provides a fallback in case the original form is lost or entered differently
  • Gaps in time and distance
  • Community data (more mobile) vs. clinical data (more static)
  • Complexity in forms
  • Depending on the environment, patients might reuse identifiers (to save costs, for the black market, for privacy, ...)
  • Problem of uniquely identifying persons, e.g. addressed with a check digit in the identifier or with barcodes
  • Matching on name spelling (Soundex), birthdate, or biometrics (fingerprints)
  • Additional identification through secret questions and answers
  • Trade-off between privacy and scaled-up national unique patient identification
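A check digit lets a system reject most mistyped identifiers at the moment they are entered. What follows is a minimal sketch of the Luhn mod N algorithm, one widely used check-character scheme; the alphabet is a parameter, and the decimal default (which reduces to classic Luhn) is an assumption for illustration, not the format of any particular OpenMRS identifier.

```python
def _luhn_sum(chars, alphabet, start_factor):
    """Weighted Luhn mod N sum, processing characters right to left.

    Raises ValueError if a character is not in the alphabet.
    """
    n = len(alphabet)
    factor = start_factor
    total = 0
    for char in reversed(chars):
        addend = factor * alphabet.index(char)
        # Sum the "digits" of the addend written in base n.
        total += addend // n + addend % n
        factor = 1 if factor == 2 else 2
    return total

def check_character(payload, alphabet="0123456789"):
    """Character to append to payload so the full identifier validates."""
    n = len(alphabet)
    remainder = _luhn_sum(payload, alphabet, start_factor=2) % n
    return alphabet[(n - remainder) % n]

def is_valid(identifier, alphabet="0123456789"):
    """True if the last character is a correct Luhn mod N check character."""
    if not identifier:
        return False
    return _luhn_sum(identifier, alphabet, start_factor=1) % len(alphabet) == 0

print(check_character("7992739871"))  # 3
print(is_valid("79927398713"))        # True
print(is_valid("79927398714"))        # False (last character mistyped)
```

Any single mistyped character fails validation, and most swaps of adjacent characters do too, which covers the most common transcription errors.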

What is "necessary data"?

  • 1. Key logistics, such as the drug box (stock): how much medication has been used and how much is left?
  • 2. M&E data; data for funders
  • Depends on the audience (e.g. government, facilities, ...); task: identify the customers
  • Minimum data set; task: decide who defines the minimum data set
  • Relationship information is crucial for prevention, e.g. the children of an HIV-positive mother
  • Coordination at the national level to harmonize data sets
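Once a minimum data set has been agreed on, checking records against it is mechanical. A minimal sketch, assuming a hypothetical minimum data set of four fields (the real list has to come from whoever owns the data set, as noted above), reports missing fields per record and the share of complete records.

```python
# Hypothetical minimum data set; the field names are illustrative only.
MINIMUM_DATA_SET = ("patient_id", "visit_date", "weight_kg", "hiv_status")

def missing_fields(record, required=MINIMUM_DATA_SET):
    """Required fields that are absent, None, or an empty string."""
    return [f for f in required if record.get(f) in (None, "")]

def completeness(records, required=MINIMUM_DATA_SET):
    """Fraction of records that contain every field of the minimum data set."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if not missing_fields(r, required))
    return complete / len(records)

records = [
    {"patient_id": "A1", "visit_date": "2010-05-01", "weight_kg": 61, "hiv_status": "negative"},
    {"patient_id": "A2", "visit_date": "", "weight_kg": 58},
]
print(missing_fields(records[1]))  # ['visit_date', 'hiv_status']
print(completeness(records))       # 0.5
```

Keeping the required fields as data rather than code makes it easier to harmonize one list across facilities.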

How do we measure "data quality"?

  • Quarterly assessments, but who actually carries out these formal measurements?
  • Validation against the paper forms, but this requires access to them
  • Look at completeness of data in the forms
  • Random sampling during data entry, e.g. re-checking 20% of the data three times a week
  • Keeping an error log for each data assistant
  • Incentives for staff whose error rate is low
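The sampling and per-assistant error log ideas combine naturally. A minimal sketch, assuming the verification step yields hypothetical `(assistant, had_error)` pairs: it draws a random 20% sample of record ids for re-checking, computes each data assistant's error rate, and lists assistants under an illustrative 2% threshold that is an assumption, not a figure from the session.

```python
import random
from collections import Counter

def draw_sample(record_ids, fraction=0.2, seed=None):
    """Randomly pick a fraction (default 20%) of entered records to re-check."""
    if not record_ids:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(record_ids) * fraction))
    return rng.sample(list(record_ids), k)

def error_rates(checked):
    """checked: (assistant, had_error) pairs from the verification step.
    Returns each data assistant's error rate among the records checked."""
    totals = Counter(assistant for assistant, _ in checked)
    errors = Counter(assistant for assistant, had_error in checked if had_error)
    return {a: errors[a] / totals[a] for a in totals}

def eligible_for_incentive(rates, threshold=0.02):
    """Hypothetical rule: assistants whose error rate is at or below threshold."""
    return sorted(a for a, rate in rates.items() if rate <= threshold)

sample = draw_sample(list(range(1, 51)), seed=7)  # 10 of 50 records
checked = [("alice", False), ("alice", True), ("bob", False), ("alice", False)]
rates = error_rates(checked)
print(len(sample), rates, eligible_for_incentive(rates))
```

Passing a `seed` makes a sample reproducible for an audit; leaving it out gives a fresh sample each run.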

What is already available in OpenMRS?

  • DoubleEntryModule for InfoPath
  • Patient Flags Module
  • Reporting tools, which can also be used to assess data quality
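The core idea of double data entry is that two people key in the same paper form independently and any disagreement is flagged for review. The sketch below is a generic illustration of that comparison step, not the DoubleEntryModule's actual implementation; the field names are hypothetical.

```python
def compare_entries(first, second):
    """Compare two independent entries of the same paper form.

    Returns {field: (first_value, second_value)} for every field whose values
    differ, including fields present in only one of the two entries.
    """
    fields = sorted(first.keys() | second.keys())
    return {f: (first.get(f), second.get(f)) for f in fields
            if first.get(f) != second.get(f)}

entry_a = {"patient_id": "A1", "weight_kg": 61, "cd4_count": 350}
entry_b = {"patient_id": "A1", "weight_kg": 16, "cd4_count": 350}
print(compare_entries(entry_a, entry_b))  # {'weight_kg': (61, 16)}
```

A transposed weight like 61 vs. 16 would pass most range checks, which is exactly the kind of transcription error double entry catches.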

Things that we would like to see?

  • Data Statistics module
  • Data Integrity module
  • "Pre-canned" rules for data quality
  • Audit trail
  • Double entry for HTML forms and XForms
  • Soundex module for fuzzy search in non-English languages
  • Idgen (ID generation module)
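Soundex encodes a name by how it sounds, so spelling variants land on the same code and can be found by fuzzy search. A minimal sketch of classic American Soundex follows; it is tuned to English pronunciation, which is why the wish list asks for variants suited to other languages, and non-English letters are simply treated like vowels here.

```python
# Classic American Soundex digit groups.
CODES = {c: d for letters, d in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                  ("l", "4"), ("mn", "5"), ("r", "6"))
         for c in letters}

def soundex(name):
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        # Skip a letter whose code repeats the previous one.
        if code and code != prev:
            result.append(code)
        # Vowels reset the previous code; 'h' and 'w' do not.
        if c not in "hw":
            prev = code
    return ("".join(result) + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Ashcraft"))                   # A261
```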

How do we continue?

Make some noise on:

  • OpenMRS Groups
  • Tickets
  • Wiki