Below is the set of rules (in plain English pseudocode) that we apply to OpenMRS patients and their encounters to de-identify them. This is a work in progress.
For an org.openmrs.Patient (and its underlying org.openmrs.Person) object, we need to remove the 18 PHI identifiers:
- Names
  - Remove all names (in OpenMRS a patient can have multiple names, one of which is preferred) and replace them with fake names (e.g., names selected from a pool of fake names).
- Geographic data
  - Remove all addresses (a patient can have multiple addresses) and generate a fake address.
  - Remove all GPS data.
- Dates
  - For the birthdate of patients under 60 years of age, replace the month and day with random values. For patients 60 or older, adjust the year randomly by ±5 years.
  - For all data (observations, encounters, etc.), replace the month and day with random values while keeping the data in the same sequence (the intervals between them will change randomly).
  - For all other dates, randomly replace the month and day.
- Person attributes
  - Remove all person attributes, which could include telephone numbers, fax numbers, or other identifiable data.
Q1: Which attributes count as "other identifiable data"? How would the code know which other attributes have been added? Should an administrator be able to configure, through our module, which attributes are removable? We currently do not store contact information for a person in any form; if it needs to be removed, should the administrator configure that as well?
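Regarding Q1, one option is to invert the list: rather than asking the administrator to name every removable attribute type, the module could remove every attribute whose type is not on an explicit keep-list. Unknown or newly added attribute types (a new phone or email field, say) are then removed by default, so the code never needs to know about them in advance, and an empty keep-list reproduces the "remove all person attributes" rule. This is a minimal sketch in plain Java: `PersonAttributeSketch` is a hypothetical stand-in for `org.openmrs.PersonAttribute`, and the keep-list is an assumed module setting, not an existing OpenMRS option.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for org.openmrs.PersonAttribute: just the attribute type's name and the value.
final class PersonAttributeSketch {
    final String typeName;
    final String value;

    PersonAttributeSketch(String typeName, String value) {
        this.typeName = typeName;
        this.value = value;
    }
}

final class AttributeScrubber {
    // Attribute type names an administrator has explicitly marked as safe to keep
    // (an assumed module setting). An empty set means "remove all person attributes".
    private final Set<String> safeToKeep;

    AttributeScrubber(Set<String> safeToKeep) {
        this.safeToKeep = Collections.unmodifiableSet(new HashSet<>(safeToKeep));
    }

    // Deny by default: anything whose type is not on the keep-list is dropped,
    // including attribute types created after the module was configured.
    List<PersonAttributeSketch> scrub(List<PersonAttributeSketch> attributes) {
        List<PersonAttributeSketch> kept = new ArrayList<>();
        for (PersonAttributeSketch attribute : attributes) {
            if (safeToKeep.contains(attribute.typeName)) {
                kept.add(attribute);
            }
        }
        return kept;
    }
}
```

With a keep-list of `{"Race"}`, a telephone or email attribute is dropped without either type ever being named in the configuration.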
- Telephone numbers
  - These are often included in a Person's extra attribute data.
- FAX numbers
  - These are often included in a Person's extra attribute data.
- Email addresses
  - These are often included in a Person's extra attribute data.
- National identifiers
  - Remove all patient identifiers, replacing them with fake (randomly generated and unique) identifiers.
Q2: As far as I know, the SSN counts as a national identifier, and it could be stored as a concept. Should the administrator include it among the removable attributes?
- Medical record numbers
  - Remove all patient identifiers, replacing them with fake (randomly generated and unique) identifiers.
Q2 applies to the medical record number as well.
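The "fake (randomly generated and unique) identifier" rule for national identifiers and medical record numbers could look roughly like the following. This is a hedged sketch: the class name, the `DEID-` prefix and the alphabet are illustrative choices, and uniqueness is guaranteed only against values the generator has issued itself or been given through `alreadyTaken` (for example, identifiers already present in the database).

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical generator of fake, unique patient identifiers (national IDs, MRNs, ...).
final class FakeIdentifierGenerator {
    // Illustrative alphabet: letters and digits minus the look-alikes 0/O and 1/I.
    private static final String ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
    private static final String PREFIX = "DEID-"; // illustrative prefix marking values as fake

    private final Random random;
    private final int length;
    // Every value already issued or known to be taken, so nothing is handed out twice.
    private final Set<String> used = new HashSet<>();

    FakeIdentifierGenerator(Random random, int length, Set<String> alreadyTaken) {
        this.random = random;
        this.length = length;
        this.used.addAll(alreadyTaken);
    }

    // Draws random candidates until one is unused. Assumes the space of
    // 32^length values is much larger than the number of identifiers needed.
    String next() {
        while (true) {
            StringBuilder candidate = new StringBuilder(PREFIX);
            for (int i = 0; i < length; i++) {
                candidate.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            String id = candidate.toString();
            if (used.add(id)) { // Set.add returns false when the value was already present
                return id;
            }
        }
    }
}
```

Passing a `java.security.SecureRandom` (a subclass of `Random`) avoids a predictable sequence. If an identifier type enforces a format or check digit, the generated value would also need to satisfy that validator.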
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers including license plates
- Device identifiers and serial numbers
- Web URLs
- Internet protocol addresses
- Biometric identifiers (e.g., retinal scans, fingerprints)
- Full face photos and comparable images
- Any unique identifying number, characteristic or code
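The birthdate rule under Dates could be implemented along these lines. The sketch assumes that "±5 years" means a uniformly random shift between -5 and +5 years, and that patients 60 or older keep their original month and day (the rule only mentions the year). For younger patients it also avoids producing a birthdate later than today. `BirthdateDeidentifier` is a hypothetical name, and `java.time` is used here instead of the `java.util.Date` birthdate on `org.openmrs.Person`.

```java
import java.time.LocalDate;
import java.time.Period;
import java.time.Year;
import java.util.Random;

// Hypothetical helper applying the birthdate rule to a single date.
final class BirthdateDeidentifier {
    static LocalDate deidentify(LocalDate birthdate, LocalDate today, Random random) {
        int age = Period.between(birthdate, today).getYears();
        if (age < 60) {
            // Under 60: keep the year, pick a uniformly random day of that year,
            // but never a day after today (matters for patients born this year).
            int year = birthdate.getYear();
            int lastDay = (year == today.getYear()) ? today.getDayOfYear() : Year.of(year).length();
            return LocalDate.ofYearDay(year, 1 + random.nextInt(lastDay));
        }
        // 60 or older: keep month and day, shift the year by -5..+5 (assumed reading of "±5").
        // plusYears maps 29 Feb to 28 Feb when the target year is not a leap year.
        int shift = random.nextInt(11) - 5;
        return birthdate.plusYears(shift);
    }
}
```

Note that under this reading a shift of 0 leaves a 60+ patient's real birthdate unchanged in roughly 1 case in 11, so excluding 0 from the shift may be worth considering.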
For each of a patient's org.openmrs.Encounter objects, we need to do the following:
- Randomize the month and day of the encounter datetime, keeping encounters in the same sequence without maintaining the intervals between them.
Q3: If I randomize the month and day of each encounter independently, an original sequence such as 1 Oct, 15 Oct, 30 Oct might become 1 Jan, 12 Dec, 15 Mar, and the chronological order (the flow of months) is lost. Is this the sequence you want me to preserve? Am I understanding you correctly?
- Assign a random location to the encounter.
Q4: When I randomize locations, should they be drawn from the locations stored in the OpenMRS database, or from some other random set of locations?
- We may also need some sort of "obs filter": a list of obs, with rules specific to the concept dictionary, that must be removed from encounters.
Q5: I did not understand exactly what you mean by this statement. Should we group the concepts that we want to remove from encounters?
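On the question about encounter sequence above: randomizing each encounter date independently cannot preserve order, but drawing a set of random days and sorting them can. The sketch below keeps each encounter's year and time of day, maps each distinct original date within a year to a distinct random day of that year, and assigns those random days in ascending order, so 1 Oct, 15 Oct, 30 Oct might become, for example, 12 Feb, 3 Jul, 20 Nov. It is plain Java with a hypothetical class name; real encounters would need their `encounterDatetime` (a `java.util.Date`) converted to and from `LocalDateTime`.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.Year;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeSet;

// Hypothetical helper: randomizes month and day while preserving chronological order.
final class EncounterDateRandomizer {
    // Returns new datetimes in the same positions as the input list.
    static List<LocalDateTime> randomize(List<LocalDateTime> originals, Random random) {
        // Distinct original dates, grouped by year; TreeSet keeps each group sorted.
        Map<Integer, TreeSet<LocalDate>> datesByYear = new HashMap<>();
        for (LocalDateTime dt : originals) {
            datesByYear.computeIfAbsent(dt.getYear(), y -> new TreeSet<>()).add(dt.toLocalDate());
        }
        Map<LocalDate, LocalDate> mapping = new HashMap<>();
        for (Map.Entry<Integer, TreeSet<LocalDate>> entry : datesByYear.entrySet()) {
            int year = entry.getKey();
            TreeSet<LocalDate> originalDates = entry.getValue();
            // Shuffle every day of the year and take the first k: k distinct random days.
            List<Integer> days = new ArrayList<>();
            for (int d = 1; d <= Year.of(year).length(); d++) {
                days.add(d);
            }
            Collections.shuffle(days, random);
            List<Integer> picked = new ArrayList<>(days.subList(0, originalDates.size()));
            Collections.sort(picked);
            // The i-th earliest original date receives the i-th earliest random day.
            Iterator<Integer> next = picked.iterator();
            for (LocalDate original : originalDates) {
                mapping.put(original, LocalDate.ofYearDay(year, next.next()));
            }
        }
        // Encounters sharing a date share the new date and keep their time of day.
        List<LocalDateTime> result = new ArrayList<>(originals.size());
        for (LocalDateTime dt : originals) {
            result.add(mapping.get(dt.toLocalDate()).atTime(dt.toLocalTime()));
        }
        return result;
    }
}
```

Because years are kept and days are reassigned in sorted order within each year, the intervals change but the order does not. The sketch does not stop dates in the current year from moving into the future, nor does it check them against the (also randomized) birthdate; both may need extra constraints.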
Family Data:
- Remove all relationships between persons.