Data Quality

Facilitator: Evan Waters
Notetaker: Christian Neumann

What makes for good data quality? What are we trying to achieve and capture?

  • What actually happened

  • Accurate data

  • Timely

  • Complete

  • Consistent

  • Legible

  • Used (usable) for decision making

  • Reusable

  • Relevant

  • Necessary (only what is needed)

What are the challenges? What leads to poor data quality? What can get in the way?

  • Too much data (which itself impacts quality)

  • The inverse of each good-quality attribute listed above

  • Lack of understanding, training

  • Process, capacity

  • Lack of standard definitions

  • Illegible

  • Machine errors

  • Incomplete and lost data

  • Transcription errors

  • Redundancies

What are the problems and possible solutions?

  • Sensitize staff to data quality (raise awareness)

  • Overcome transcription errors through education

  • Measure impact

  • Overcome transit errors (time, distance) by reducing the number of steps in the process, being more timely, and using mobile platforms

  • Decide who is responsible for checking data. Everybody, not just the data manager, can check it

  • Define categories of data quality problems and address each category with different interventions, carried out by different people

  • Paper-based forms can get lost; having two copies filled out by the responsible persons provides a fallback in case the original form is lost or entered differently

  • Gap of time and distance

  • Community data (more mobile) vs. clinical data (more static)

  • Complexity in forms

  • Depending on the environment, patients might share or reuse identifiers (to save costs, black market, privacy, ...)

  • Problem of uniquely identifying persons; approaches include check digits in identifiers and barcodes

  • Matching on name spelling (e.g. Soundex), birthdate, or biometrics (fingerprints)

  • Additional identification through secret questions and answers

  • Trade-off between privacy and scaled-up national unique patient identification
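
A check digit lets a system reject a mistyped identifier at the moment of entry. The Luhn (mod 10) algorithm is one widely used scheme; the sketch below is a generic illustration for purely numeric identifiers and is not taken from any OpenMRS module. The function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> str:
    """Return the Luhn (mod 10) check digit for a string of decimal digits."""
    if not payload.isdigit():
        raise ValueError("payload must contain only decimal digits")
    total = 0
    # Walk from the rightmost payload digit. Because the check digit will be
    # appended on the right, the rightmost payload digit is the one doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return str((10 - total % 10) % 10)


def is_valid_luhn(identifier: str) -> bool:
    """True if the last digit of `identifier` is a correct Luhn check digit."""
    if len(identifier) < 2 or not identifier.isdigit():
        return False
    return luhn_check_digit(identifier[:-1]) == identifier[-1]
```

Luhn catches every single-digit error and most swaps of adjacent digits (not 09 and 90), which covers the common transcription mistakes noted above; it does nothing against deliberate reuse of a valid identifier.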

What is "necessary data"?

  • 1. Key logistics, e.g. drug box stocks: how much medication has been used, and how much is left?

  • 2. M&E data, data for funders

  • Depends on the audience, e.g. government, facilities, ... Task: identify the customers of the data

  • Minimum data set. Task: decide who defines the minimum data set

  • Relationship information is crucial for prevention, e.g. children of an HIV-positive mother

  • Coordinating at the national level and harmonizing data sets
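
Once a minimum data set is agreed, completeness can be measured as the share of required fields that are actually filled. This is a minimal sketch assuming records are plain Python dicts; the field names in `MINIMUM_DATA_SET` are hypothetical placeholders, not an OpenMRS definition.

```python
from typing import Iterable, List, Mapping

# Hypothetical minimum data set; the real list must be agreed with the data's audience.
MINIMUM_DATA_SET = ("patient_id", "visit_date", "weight_kg", "cd4_count")


def _is_blank(value: object) -> bool:
    """Treat None and empty/whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_fields(record: Mapping[str, object],
                   required: Iterable[str] = MINIMUM_DATA_SET) -> List[str]:
    """Return the required fields that are absent or blank in one record."""
    return [f for f in required if _is_blank(record.get(f))]


def completeness(records: List[Mapping[str, object]],
                 required: Iterable[str] = MINIMUM_DATA_SET) -> float:
    """Fraction of required (record, field) cells that are filled, from 0.0 to 1.0."""
    required = tuple(required)
    total = len(records) * len(required)
    if total == 0:
        return 1.0  # nothing was required, so nothing is missing
    missing = sum(len(missing_fields(r, required)) for r in records)
    return (total - missing) / total
```

For example, two records that each lack one of the four required fields give a completeness of 0.75, and `missing_fields` names exactly which fields to follow up on.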

How do we measure "data quality"?

  • Quarterly assessments, but who is actually doing these formal measurements?

  • Validation against forms, but this needs access to the paper-based forms

  • Look at completeness of data in the forms

  • Random sampling of entered data, e.g. check 20% of the data three times a week

  • Keep a log of errors for each data assistant

  • Incentives for staff with low error rates
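
Random sampling and per-assistant error logs combine naturally: draw a sample (e.g. 20%) of entered records, check each against its paper form, and tally errors by who entered them. This sketch assumes each record carries a hypothetical `entered_by` field, and the manual paper check is represented by a caller-supplied `has_error` function.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Mapping, Optional, Sequence


def draw_audit_sample(records: Sequence[Mapping], fraction: float = 0.2,
                      seed: Optional[int] = None) -> List[Mapping]:
    """Randomly pick about `fraction` of the records (at least one if any exist)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    k = max(1, round(len(records) * fraction))
    # A seeded generator makes an audit reproducible if it must be re-checked.
    return random.Random(seed).sample(list(records), k)


def error_rates(sample: Sequence[Mapping],
                has_error: Callable[[Mapping], bool]) -> Dict[str, float]:
    """Error rate per data assistant (errors / records checked) within the sample.

    `has_error` stands in for the manual comparison against the paper form.
    """
    checked = Counter(r["entered_by"] for r in sample)
    errors = Counter(r["entered_by"] for r in sample if has_error(r))
    return {who: errors[who] / n for who, n in checked.items()}
```

Run on each audit day (e.g. three times a week), the per-assistant rates accumulate into the error log described above and give an objective basis for any incentive.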

What is already available in OpenMRS?

  • DoubleEntryModule for InfoPath

  • Patient Flags Module

  • Reporting tools, which can be used for data quality
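
Double entry means two people enter the same paper form independently, and any field where the two entries disagree is flagged for review against the original. The sketch below shows only that comparison step on two dicts; it is an illustration under that assumption and does not reflect how the DoubleEntryModule is implemented.

```python
from typing import Any, Dict, Mapping, Tuple


def _normalize(value: Any) -> Any:
    """Ignore leading/trailing whitespace in text so it is not flagged as a mismatch."""
    return value.strip() if isinstance(value, str) else value


def double_entry_discrepancies(first: Mapping[str, Any],
                               second: Mapping[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (first_value, second_value)} for fields whose entries disagree.

    A field missing from one entry is treated as None on that side.
    """
    fields = sorted(set(first) | set(second))
    return {
        f: (first.get(f), second.get(f))
        for f in fields
        if _normalize(first.get(f)) != _normalize(second.get(f))
    }
```

A swapped pair of digits such as "61" vs. "16" is flagged, while a stray trailing space is not; only flagged fields need someone to pull the paper form.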

What would we like to see?

  • Data Statistics module

  • Data Integrity module

  • "Pre-canned" rules for data quality

  • Audit trail

  • Double-entry for HTML & XForms

  • Soundex module for fuzzy search in non-English languages

  • Idgen (identifier generation module)
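
Soundex groups names that sound alike under one short code, so a search for one spelling also finds variants. The standard (American) Soundex sketched below is tuned to English spelling, which is exactly why the group asked for a module that handles non-English names; this version simply drops non-ASCII letters and is an illustration, not an existing OpenMRS module.

```python
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" if no ASCII letters."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            result.append(code)
        prev = code  # vowels give "" and so allow a repeated code after them
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")
```

For example, "Robert" and "Rupert" both map to R163, so a patient search by either spelling would return both records for review.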

How do we continue?

Make some noise on

  • OpenMRS Groups

  • Tickets

  • Wiki