Patient Matching 2.0 Project

Primary mentor	Burke Mamlin
Backup mentor	Shaun Grannis
Assigned to	Lahiru Jayathilake

Abstract

Properly identifying patients is a critical feature of any electronic medical record system. In the real world, patient identification poses many challenges. While some countries have reliable universal identifiers, most do not, so we rely on patient demographics (e.g., name, gender, date of birth) to identify patients. In many cases, proper identification is difficult (e.g., variable spelling of names, estimated or unknown dates of birth). While there are ongoing efforts to incorporate biometrics and other means of reliable identification, the reality is that many systems end up with duplicate records for patients.

Identifying and correcting duplicate records can be a painful process when performed manually. The OpenMRS Patient Matching module was created by one of the world's leading experts on patient matching techniques, Dr. Shaun Grannis, to use statistical methods to maximize the value of automated patient matching. We're very lucky to have this module; however, it has not been widely adopted because it lacks some key features needed to make it easy for implementations to use.

The goal of this project is to address the key features missing from the Patient Matching module, release a new 2.0 version that includes them, and thereby help the OpenMRS community benefit from the power of this patient matching module.

Project Champions

Skills Needed

Java
SQL
HTML/CSS

Objectives

(Primary) Refactor the Patient Matching module to perform “incremental” patient matching. Because larger implementations tend to have more duplicate patient records, they are likely to benefit most from the Patient Matching module. However, larger implementations require efficient performance. Currently, the Patient Matching module scans all records each time it is run. This approach does not scale well to larger implementations. As part of this project, you will refactor the Patient Matching module to process only those patient records that have been added, or whose identifying information has changed, since the last time patient matching was performed.
(Secondary) Avoid repeated manual review of previously reviewed matches. The match process generates a list of highly likely matches (probable duplicates) for human review. The human reviewer declares true matches and non-matches from this list. Currently, the Patient Matching module presents all likely matches again each time it runs, without taking into account what the human reviewer decided in prior runs. We propose to develop a process that maintains a list of known matches and non-matches so that they are not repeatedly presented to the human reviewer. Additionally, we propose to explore an approach to flag these known pairs so the module can, for example, exclude known matches or known non-matches on subsequent runs.
Supply a default matching strategy “out of the box.” A basic name, gender, and date of birth matching strategy should be available when the module is first installed.
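
To make the first two objectives concrete, here is a minimal sketch of how incremental selection and a reviewed-pair list could fit together. It is an illustration only, not the module's design: `IncrementalMatchSketch`, `PatientStub`, `pairKey`, `changedSince`, and `candidatePairs` are hypothetical names, and the sketch assumes that each patient record carries a timestamp of its last demographic change and that the time of the previous matching run is stored somewhere.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these types exist in the Patient Matching module.
final class IncrementalMatchSketch {

    // Minimal stand-in for a patient record: an id plus the time its
    // identifying information was created or last changed (assumed field).
    static final class PatientStub {
        final int id;
        final Instant demographicsChanged;

        PatientStub(int id, Instant demographicsChanged) {
            this.id = id;
            this.demographicsChanged = demographicsChanged;
        }
    }

    // Order-independent key for a pair of patient ids, so (3, 1) and (1, 3)
    // map to the same entry in the reviewed-pair list.
    static String pairKey(int a, int b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    // Patients added or edited after the previous run; only these need re-matching.
    static List<PatientStub> changedSince(List<PatientStub> all, Instant lastRun) {
        List<PatientStub> changed = new ArrayList<>();
        for (PatientStub p : all) {
            if (p.demographicsChanged.isAfter(lastRun)) {
                changed.add(p);
            }
        }
        return changed;
    }

    // Pairs to score: each changed patient against every other patient,
    // skipping pairs a human reviewer has already declared match or non-match.
    static Set<String> candidatePairs(List<PatientStub> all,
                                      List<PatientStub> changed,
                                      Set<String> reviewed) {
        Set<String> pairs = new HashSet<>();
        for (PatientStub c : changed) {
            for (PatientStub other : all) {
                if (c.id == other.id) {
                    continue; // never pair a record with itself
                }
                String key = pairKey(c.id, other.id);
                if (!reviewed.contains(key)) {
                    pairs.add(key);
                }
            }
        }
        return pairs;
    }
}
```

With this approach, the work per run grows with the number of changed records rather than with the square of the whole patient table. One design question the sketch leaves open is whether a previously reviewed pair should be reopened when one of its records is edited later; here reviewed pairs are always skipped. A real implementation would also apply blocking before pairing rather than comparing each changed record against every other record.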

Extra Credit

Currently, blocking schemes are not chosen using data-driven evidence but are instead selected ad hoc based on user experience. Evaluate the feasibility of generating automatic recommendations that help users create more efficient blocking schemes and further improve matching efficiency (e.g., “Using last name, first name, and gender will create 5 million potential pairs.”).
Investigate approaches for using clinical data to improve matching accuracy by augmenting the patient identifiers currently used. The first deliverable would be a written description and feasibility assessment of the proposed approach.
Plot a path for further scaling the module (e.g., re-writing in Go, using Docker containers in a cluster, using Hadoop to approach patient matching as a map-reduce problem).
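
As a rough illustration of the blocking recommendation idea, the number of potential pairs for a candidate blocking scheme can be estimated by grouping records on the blocking key and summing n(n-1)/2 over the resulting blocks. The sketch below assumes each record has already been reduced to a single blocking-key string (for example, last name and gender joined together); `BlockingEstimateSketch` and `potentialPairs` are hypothetical names, not part of the module.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of the Patient Matching module.
final class BlockingEstimateSketch {

    // Counts the candidate pairs a blocking scheme would generate.
    // Records sharing a blocking key fall into the same block, and a
    // block of n records yields n * (n - 1) / 2 pairs to compare.
    static long potentialPairs(List<String> blockingKeys) {
        Map<String, Long> blockSizes = new HashMap<>();
        for (String key : blockingKeys) {
            blockSizes.merge(key, 1L, Long::sum);
        }
        long pairs = 0;
        for (long n : blockSizes.values()) {
            pairs += n * (n - 1) / 2;
        }
        return pairs;
    }
}
```

For example, six records with keys `SMITH|F`, `SMITH|F`, `SMITH|F`, `JONES|M`, `JONES|M`, and `DOE|F` form blocks of size 3, 2, and 1, giving 3 + 1 + 0 = 4 potential pairs. Running the same count for several candidate schemes would let the module report which one keeps the pair count manageable.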

Resources

Patient Matching Module
Source on GitHub: openmrs-module-patientmatching