De-Identified Patient Data Export

Primary mentor

Steven Githens

Backup mentor

Ben Wolfe

Assigned to

Sara Fatima


OpenMRS currently provides a set of demo data that was created by de-identifying a subset of patient data from existing implementations and duplicating it several times over. This data set was developed and curated manually and is difficult to re-create. It would be very helpful to be able to create more comprehensive and realistic data sets for developers and researchers to use.

Making patient data truly de-identified requires following some fairly stringent rules. For example, any date associated with a patient (including birthdate, visit/encounter dates, dates of observations, etc.) may only use the year (not the month or day). The timestamps on patient data would therefore need to be randomized enough to satisfy HIPAA rules, but without losing the sequence of results, so that trends in the results remain relatively realistic. Simply shifting all timestamps by the same amount would not meet HIPAA requirements, since the intervals between tests could be used to re-identify the patient. In short, creating truly de-identified data means creating a dataset that even a team of expert statisticians could not use to establish the identity of any of the patients.
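As a rough illustration of the idea of randomizing dates while keeping their sequence, the sketch below shifts a patient's earliest event by a random offset and then rescales each interval between consecutive events by a random factor. The function name `randomize_dates` and the parameters `max_shift_days` and `jitter` are hypothetical, chosen only for this example. It is written in Python for brevity and is not an OpenMRS API. Whether a scheme like this is random enough to satisfy HIPAA is a separate question that this sketch does not answer.

```python
import random
from datetime import datetime, timedelta


def randomize_dates(dates, rng, max_shift_days=180, jitter=0.5):
    """Return new datetimes in the same relative order as `dates`.

    Hypothetical sketch: the earliest event gets a random offset of up to
    `max_shift_days` in either direction, and every gap between consecutive
    events is multiplied by a random factor in [1 - jitter, 1 + jitter].
    With 0 <= jitter < 1 every factor is positive, so the order is preserved.
    """
    if not dates:
        return []
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1) to preserve ordering")

    # Indices of the events sorted chronologically.
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    result = [None] * len(dates)

    # Random offset applied to the earliest event.
    offset = timedelta(days=rng.uniform(-max_shift_days, max_shift_days))
    current = dates[order[0]] + offset
    result[order[0]] = current

    # Rescale each interval independently so gaps no longer match the source.
    for prev_i, i in zip(order, order[1:]):
        gap = dates[i] - dates[prev_i]
        current = current + gap * rng.uniform(1 - jitter, 1 + jitter)
        result[i] = current

    return result
```

The results come back in the same positions as the inputs, so an observation's new date can be matched to its record by index. Drawing a separate random factor for each interval is the point of the design: a single constant shift would leave the intervals intact, which is the weakness described above.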

This project would develop an OpenMRS module capable of transforming and exporting data that adheres to HIPAA privacy guidelines.

Project Champions

Burke Mamlin

Objectives


  • Create an OpenMRS module
  • Successfully export a patient's de-identified data (replacing demographics with suitable substitutes and randomizing dates of visits, encounters, observations, etc.)
  • Create a process that can export many/all patients' data from a system in de-identified format.
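The first two export objectives above could be prototyped along the following lines. This is only a sketch under stated assumptions: patients are plain dictionaries with hypothetical field names (`identifier`, `given_name`, `family_name`, `gender`, `birthdate`), the placeholder name lists are invented for illustration, and none of it uses the OpenMRS API. It is written in Python for brevity. Dates of visits, encounters, and observations are not handled here and would need their own randomization.

```python
import random
from datetime import date

# Invented placeholder names; a real export would likely use larger lists.
PLACEHOLDER_GIVEN = ["Alex", "Sam", "Jordan", "Casey"]
PLACEHOLDER_FAMILY = ["Doe", "Roe", "Smith", "Jones"]

# Example input record using the hypothetical field names assumed here.
SAMPLE_PATIENT = {
    "identifier": "MRN-0001",
    "given_name": "Real",
    "family_name": "Person",
    "gender": "F",
    "birthdate": date(1980, 5, 17),
}


def deidentify_patient(patient, rng, used_identifiers):
    """Return a copy of `patient` with demographics replaced by substitutes."""
    # Draw a random identifier, retrying on the rare collision.
    while True:
        new_id = "DEID-%08d" % rng.randrange(10**8)
        if new_id not in used_identifiers:
            used_identifiers.add(new_id)
            break
    return {
        "identifier": new_id,
        "given_name": rng.choice(PLACEHOLDER_GIVEN),
        "family_name": rng.choice(PLACEHOLDER_FAMILY),
        "gender": patient["gender"],
        # Keep only the year of birth, dropping month and day.
        "birth_year": patient["birthdate"].year,
    }


def export_all(patients, rng):
    """Lazily yield a de-identified record for each patient."""
    used_identifiers = set()
    for patient in patients:
        yield deidentify_patient(patient, rng, used_identifiers)
```

Because `export_all` is a generator, records can be written out one at a time instead of holding an entire system's patients in memory, which matters when exporting many or all patients.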

Extra Credit

  • Provide a mechanism for an administrator to control the destination, format, and/or resource usage for the export process
  • Provide a mechanism to easily import de-identified data from your export into another OpenMRS instance