Below is the set of rules (in plain-English pseudocode) that we apply to OpenMRS patients and their encounters to de-identify them. This is a work in progress.
For an org.openmrs.Patient and org.openmrs.Person object, we need to remove the 18 PHI identifiers:
- Names
- Remove all names (a patient can have multiple names in OpenMRS with a preferred name) and replace with a fake name (e.g., name(s) selected from a pool of fake names).
- Geographic data
- Remove all addresses (a patient can have multiple addresses) and generate a fake address.
- Remove all GPS data.
- Dates
- For birthdate, replace the month & day with random values for patients under 60 years of age. For patients 60+ years of age, adjust the year by a random amount within ±5 years.
- For all dated data (observations, encounters, etc.), replace the month & day with random values while keeping the records in their original sequence (the intervals between them will change randomly).
- For all other dates, randomly replace month & day.
- Person attributes
- Remove all person attributes, which could include telephone data, fax numbers, or other identifiable data.
Q: What attributes would come under 'other identifiable data'? How would the code come to know what other attributes are added? How should the admin be able to configure this data?
- Telephone numbers
- These are often included in a Person's extra attribute data.
- FAX numbers
- These are often included in a Person's extra attribute data.
- Email addresses
- These are often included in a Person's extra attribute data.
- National identifiers
- Remove all patient identifiers, replacing with a fake (randomly generated & unique) identifier.
- Medical record numbers
- Remove all patient identifiers, replace with a fake (randomly generated & unique) identifier.
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers including license plates
- Device identifiers and serial numbers
- Web URLs
- Internet protocol addresses
- Biometric identifiers (e.g., retinal scans, fingerprints)
- Full face photos and comparable images
- Any unique identifying number, characteristic or code
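The name, birthdate, and identifier rules above can be sketched as follows. This is only a minimal illustration on plain Python values, not the OpenMRS API: the names `FAKE_NAMES`, `pick_fake_name`, `deidentify_birthdate`, and `fake_identifier` are hypothetical, the 8-digit identifier format is an assumption, and "±5 years" is read here as a uniformly chosen whole number of years from -5 to +5.

```python
import calendar
import datetime
import random

# Hypothetical pool of fake names; a real pool would be much larger.
FAKE_NAMES = [("Alex", "Doe"), ("Sam", "Roe"), ("Kim", "Poe")]


def pick_fake_name(rng):
    """Return a (given, family) pair drawn from the fake-name pool."""
    return rng.choice(FAKE_NAMES)


def deidentify_birthdate(birthdate, today, rng):
    """Apply the birthdate rule: random month/day under 60, year shift at 60+."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    age = today.year - birthdate.year - (0 if had_birthday else 1)
    if age < 60:
        # Keep the year, pick a random day within it.
        days = 366 if calendar.isleap(birthdate.year) else 365
        jan1 = datetime.date(birthdate.year, 1, 1)
        return jan1 + datetime.timedelta(days=rng.randrange(days))
    # 60+: shift the year only (assumed range: -5..+5 whole years).
    year = birthdate.year + rng.randint(-5, 5)
    # Clamp the day so that 29 February survives a move to a non-leap year.
    day = min(birthdate.day, calendar.monthrange(year, birthdate.month)[1])
    return datetime.date(year, birthdate.month, day)


def fake_identifier(used, rng, length=8):
    """Return a random digit string not in `used`, and record it there."""
    while True:
        candidate = "".join(rng.choice("0123456789") for _ in range(length))
        if candidate not in used:
            used.add(candidate)
            return candidate
```

Passing the same `used` set for every patient in a run is what keeps the generated identifiers unique. Whether fake identifiers must also satisfy an identifier type's validator (for example a check digit) is left open here.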
For each patient's org.openmrs.Encounter object, we will need to do the following:
- Randomize the month & day of Encounter Datetime, keeping encounters in the same sequence without maintaining intervals between them.
- Assign a random location to the encounter.
- We may also need some sort of 'obs filter': a list of obs, with rules specific to the concept dictionary, that must be removed from encounters.
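One possible way to randomize month & day while keeping encounters in sequence is sketched below, again on plain Python `datetime` values rather than OpenMRS objects. The function name `randomize_encounter_dates` is hypothetical. The sketch assumes the year and the time of day are kept, and that encounters on the same original day stay together on the same new day.

```python
import calendar
import datetime
import random
from collections import defaultdict


def randomize_encounter_dates(encounter_datetimes, rng):
    """Return new datetimes with random month/day, same year, same order."""
    # Collect the distinct calendar dates that occur in each year.
    dates_by_year = defaultdict(set)
    for dt in encounter_datetimes:
        dates_by_year[dt.year].add(dt.date())

    new_date = {}
    for year, dates in dates_by_year.items():
        days_in_year = 366 if calendar.isleap(year) else 365
        # Draw one distinct random day-of-year per original date, then sort
        # so the earliest original date gets the earliest new date.
        offsets = sorted(rng.sample(range(days_in_year), len(dates)))
        jan1 = datetime.date(year, 1, 1)
        for original, offset in zip(sorted(dates), offsets):
            new_date[original] = jan1 + datetime.timedelta(days=offset)

    # Re-attach the original time of day to each new date.
    return [
        datetime.datetime.combine(new_date[dt.date()], dt.time())
        for dt in encounter_datetimes
    ]
```

Because the days are sampled without replacement and assigned in sorted order, two encounters never swap places, but the gaps between them change. To keep observations and other dated records consistent with their encounters, the same date mapping would presumably have to be shared across all of a patient's records.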
Family Data:
- Remove all relationships between persons.