Patient Matching 2.0 Project

Primary mentor

Burke Mamlin

Backup mentor

Shaun Grannis

Assigned to

Lahiru Jayathilake



Properly identifying patients is a critical feature of any electronic medical record system. In the real world, there can be many challenges to patient identification. While some countries have reliable universal identifiers, most do not, and we rely on patient demographics (e.g., names, gender, date of birth) to identify patients. In many cases, proper identification is difficult (e.g., variable spelling of names, estimated or unknown date of birth). While there are ongoing efforts to incorporate biometrics and other means of reliable identification, the reality is that many systems end up with duplicate records for patients.

Identifying and correcting duplicate records can be a painful process when performed manually. The OpenMRS Patient Matching module was created by one of the world's leading experts on patient matching techniques, Dr. Shaun Grannis, to use statistical methods to maximize the value of automated patient matching. We're very lucky to have this module; however, it has not been widely adopted because it lacks some key features needed to make it easy for implementations to use.

The goal of this project is to address the key features missing in the Patient Matching module, release a new 2.0 version with those features, and thereby help the OpenMRS community benefit from the power of this patient matching module.

Project Champions

Skills Needed

  • Java
  • SQL


  • (Primary) Refactor the Patient Matching module to perform “incremental” patient matching. Because larger implementations tend to have more duplicate patient records, they are likely to benefit most from the Patient Matching module. However, larger implementations require efficient performance. Currently, the Patient Matching module scans all records each time it is run. This approach does not scale well to larger implementations. As part of this project, you will refactor the Patient Matching module to process only those patient records that have been added, or whose identifying information has changed, since the last time patient matching was performed.
  • (Secondary) Avoid repeated manual review of previously reviewed matches. The match process generates a list of highly likely matches (probable duplicates) for human review. The human reviewer declares true matches and non-matches from this list. Currently, the Patient Matching module presents all likely matches each time it runs, without taking into account what the human reviewer decided in prior runs. We propose to develop a process to maintain a list of known matches and non-matches that would not be repeatedly presented to the human reviewer. Additionally, we propose to explore a potential approach to flag these known pairs so the module can, for example, exclude known matches or known non-matches on subsequent runs.
  • Supply a default matching strategy “out of the box”: a basic name, gender, and date of birth matching strategy that is available when you first install the module.

Extra Credit

  • Currently, blocking schemes are not chosen using data-driven evidence, but are instead selected in an ad hoc fashion based on user experience. Evaluate the feasibility of generating automatic recommendations that help users create more efficient blocking schemes and further improve matching efficiency. For example: “Using last name, first name, and gender will create 5 million potential pairs.”
  • Investigate approaches for using clinical data to improve matching accuracy by augmenting currently used patient identifiers. The first deliverable would be a written description and feasibility assessment of the proposed approach.
  • Plot a path for further scaling the module (e.g., re-writing in Go, using Docker containers in a cluster, using Hadoop to approach patient matching as a map-reduce problem). 
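
The blocking-scheme recommendation in the first extra-credit item comes down to counting how many candidate pairs a scheme would produce. The following is a minimal sketch of that count, not the module's code: the class name, the method names, and the idea of passing one pre-built blocking key per patient (for example "last name|first name|gender") are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Patient Matching module.
class BlockingEstimate {

    // Number of unordered pairs among n records: n * (n - 1) / 2.
    static long pairsWithin(long n) {
        return n * (n - 1) / 2;
    }

    // Takes one blocking key per patient. Patients are only compared
    // with other patients sharing the same key, so the total is the
    // sum of the pair counts inside each block.
    static long estimatePairs(List<String> blockingKeys) {
        Map<String, Long> blockSizes = new HashMap<>();
        for (String key : blockingKeys) {
            blockSizes.merge(key, 1L, Long::sum);
        }
        long total = 0;
        for (long size : blockSizes.values()) {
            total += pairsWithin(size);
        }
        return total;
    }
}
```

Running the estimate for several candidate schemes before a match starts would let the module report the expected number of pairs for each one, which is the kind of message quoted in the item above.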


The whole project can be grouped into two categories:

  1. Incremental Patient Matching
  2. Merging and Excluding Patients from the Patient Matching Report

Incremental Patient Matching

This part addresses one of the main limitations of the Patient Matching module: every run compares all patient records with one another.

For instance, suppose the system holds 10,000 patients and we want to find duplicates by matching on first name and date of birth. Comparing every patient to every other patient takes roughly 50 million comparisons (10,000 x (10,000 - 1) / 2). If we run the same match a couple of days later, after 90 patients have been added and 10 updated, the current version repeats the full comparison over all 10,090 records, which is now about 51 million comparisons (10,090 x 10,089 / 2).

The solution in this project is for the module to compare only the records added or updated since the last run of that particular strategy. Patient Matching 2.0 handles the example above with about 1 million comparisons [(100 x 99 / 2) + (100 x 9,990)] rather than 51 million: the 100 added or updated records are compared with each other and with the 9,990 unchanged records. That saves the module roughly 50 million comparisons.
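
The arithmetic in this example can be checked with a short sketch. It is not the module's implementation: the class and method names are hypothetical, and patients are represented by plain integer ids for illustration. It computes the comparison counts for a full and an incremental run and lists the pairs an incremental run would consider.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of the Patient Matching module.
class IncrementalPairs {

    // Full run: every record against every other, n * (n - 1) / 2.
    static long fullComparisons(long total) {
        return total * (total - 1) / 2;
    }

    // Incremental run: changed records (added or updated) are compared
    // with each other and with every unchanged record.
    static long incrementalComparisons(long changed, long unchanged) {
        return changed * (changed - 1) / 2 + changed * unchanged;
    }

    // Lists the candidate pairs of an incremental run as {idA, idB} arrays.
    static List<int[]> incrementalPairs(List<Integer> changedIds,
                                        List<Integer> unchangedIds) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < changedIds.size(); i++) {
            // changed vs. changed, each unordered pair once
            for (int j = i + 1; j < changedIds.size(); j++) {
                pairs.add(new int[] {changedIds.get(i), changedIds.get(j)});
            }
            // changed vs. unchanged
            for (int unchangedId : unchangedIds) {
                pairs.add(new int[] {changedIds.get(i), unchangedId});
            }
        }
        return pairs;
    }
}
```

With the numbers from the example, `fullComparisons(10090)` gives 50,899,005 and `incrementalComparisons(100, 9990)` gives 1,003,950. Pairs of two unchanged records are never generated, since the previous report already covers them.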

User Interface

Things to be Highlighted
  • By default, the Incremental Match checkbox is checked. The user can run either an incremental match or a normal match, in which every patient is compared to every other patient.
  • There is a single report per strategy. The report is named incremental-report-[strategy_name].
  • An incremental match requires a previously generated report for that particular strategy. If none exists, the module compares all patients to all others.
  • Incremental patient matching is performed only if a single strategy is selected.
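
The rules above amount to a small decision: an incremental run needs the checkbox, exactly one selected strategy, and an existing report for that strategy. The sketch below expresses that decision under stated assumptions. The class, its method names, and the use of a set of existing report names are hypothetical, and the assumption that any other combination falls back to a full match is an interpretation of the list, not confirmed module behavior.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not part of the Patient Matching module.
class MatchModeSelector {

    // Report naming convention from the list above.
    static String reportName(String strategyName) {
        return "incremental-report-" + strategyName;
    }

    // Returns true when an incremental run is possible; otherwise the
    // caller is assumed to fall back to comparing all patients.
    static boolean useIncremental(boolean incrementalChecked,
                                  List<String> selectedStrategies,
                                  Set<String> existingReports) {
        if (!incrementalChecked || selectedStrategies.size() != 1) {
            return false;
        }
        // A previous report for this strategy must already exist.
        return existingReports.contains(reportName(selectedStrategies.get(0)));
    }
}
```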

Merging & Excluding Patients

User Interface of a Report

Merge Patients

If some of the patients in a group are in fact the same person, the user can merge them so that they do not appear again in a patient matching report. To merge patients, select a set of patients from a group and click the corresponding merge button. For example, in the image above, the three selected patients can be merged by clicking the "Merge GroupId 0" button.

Exclude Non-Matching Patients

In some scenarios the module reports patients as the same even though in real life they are entirely different people. For this case, Patient Matching 2.0 provides a way to exclude such records so that they do not repeatedly appear in patient matching reports. To exclude patients, select a set of patients from a group and click the corresponding "Exclude Match" button.
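
One way to picture the exclusion is as a set of known non-matching pairs that is consulted before a report is shown. The sketch below is only an illustration of that idea. The class, its methods, the in-memory set, and the integer patient ids are assumptions, and how the module actually stores excluded pairs is not described here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not part of the Patient Matching module.
class ExcludedPairs {

    private final Set<String> excluded = new HashSet<>();

    // Order-independent key, so (3, 7) and (7, 3) are the same pair.
    private static String key(int a, int b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    // Record that a reviewer declared these two patients a non-match.
    void exclude(int patientA, int patientB) {
        excluded.add(key(patientA, patientB));
    }

    boolean isExcluded(int patientA, int patientB) {
        return excluded.contains(key(patientA, patientB));
    }

    // Keep only the candidate pairs that have not been excluded.
    List<int[]> filter(List<int[]> candidates) {
        List<int[]> kept = new ArrayList<>();
        for (int[] pair : candidates) {
            if (!isExcluded(pair[0], pair[1])) {
                kept.add(pair);
            }
        }
        return kept;
    }
}
```

Keying on the unordered pair means a pair excluded once stays excluded regardless of the order in which a later run produces it.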