Engineering Notes page (OMOP CDM project)

Plan Outline

In progress: Mapping the OpenMRS data model to OMOP CDM entities
DONE: First, support the Person entity (because we can do a lot with that)
DONE: A way to handle concepts (because we can't just put OpenMRS terminology directly into OMOP tools)
DONE: Connect with OMOP ecosystem Data Viz tool(s): at least visualize a single entity

OMOP CDM Entities Covered so Far

Note: As of March 10, these still need a way to handle OpenMRS concepts.

Table Name            | Status
----------------------|--------
PERSON                | DONE
OBSERVATION_PERIOD    | DONE
VISIT_OCCURRENCE      | DONE
VISIT_DETAIL          | DONE
CONDITION_OCCURRENCE  | DONE
DRUG_EXPOSURE         | Pending
PROCEDURE_OCCURRENCE  | Pending
DEVICE_EXPOSURE       | Pending
MEASUREMENT           | DONE
OBSERVATION           | DONE
DEATH                 | Pending
NOTE                  | DONE
NOTE_NLP              | Pending
SPECIMEN              | Pending
FACT_RELATIONSHIP     | Pending
LOCATION              | DONE
CARE_SITE             | DONE
PROVIDER              | DONE
PAYER_PLAN_PERIOD     | Pending
COST                  | Pending
DRUG_ERA              | Pending
DOSE_ERA              | Pending
CONDITION_ERA         | Pending
EPISODE               | Pending
EPISODE_EVENT         | Pending
METADATA              | Pending
CDM_SOURCE            | PRE
CONCEPT               | PRE
VOCABULARY            | PRE
DOMAIN                | PRE
CONCEPT_CLASS         | PRE
CONCEPT_RELATIONSHIP  | PRE
RELATIONSHIP          | PRE
CONCEPT_SYNONYM       | PRE
CONCEPT_ANCESTOR      | PRE
SOURCE_TO_CONCEPT_MAP | PRE
DRUG_STRENGTH         | Pending
COHORT                | Pending
COHORT_DEFINITION     | Pending


Instructions on how to set up the visualization tools

  1. Set up WebAPI by following the instructions at https://github.com/OHDSI/WebAPI/wiki/WebAPI-Installation-Guide

  2. Load the converted data into a separate database, and remember the schema name you import it into (whether that's the public schema or another; I usually add it to a separate schema that I create and call cdm). Create two schemas, results and temp, in this database, as sketched below.
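    For example, a minimal SQL sketch (schema names as used in these notes):

      CREATE SCHEMA cdm;     -- where I load the converted OMOP data
      CREATE SCHEMA results; -- required by WebAPI/Achilles for result tables
      CREATE SCHEMA temp;    -- scratch schema used by WebAPI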

  3. Add the connection details to the source and source_daimon tables in the database set up in step 1, following the instructions in https://github.com/OHDSI/WebAPI/wiki/CDM-Configuration#source-and-source_daimon-table-setup.

    1. If you are configuring a PostgreSQL connection and are facing difficulties connecting, you may need to add OpenSourceSubProtocolOverride to your source_connection connection string.
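      For example, a JDBC URL along these lines (host and database name are illustrative):

        jdbc:postgresql://localhost:5432/omop?OpenSourceSubProtocolOverride=true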

    2. Make sure that the entries in the source_daimon table have the correct schema name for where your OMOP data lives (i.e., public, or in my case cdm); a rough sketch of both inserts follows.
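      As a sketch (all values are illustrative; the daimon_type codes, per my reading of the WebAPI wiki, are 0 = CDM, 1 = Vocabulary, 2 = Results, 5 = Temp):

        INSERT INTO webapi.source (source_id, source_name, source_key, source_connection, source_dialect)
        VALUES (2, 'OpenMRS OMOP', 'OMOP',
                'jdbc:postgresql://localhost:5432/omop?OpenSourceSubProtocolOverride=true',
                'postgresql');

        INSERT INTO webapi.source_daimon (source_daimon_id, source_id, daimon_type, table_qualifier, priority)
        VALUES (10, 2, 0, 'cdm', 0),     -- CDM data
               (11, 2, 1, 'cdm', 1),     -- vocabulary
               (12, 2, 2, 'results', 1), -- results
               (13, 2, 5, 'temp', 0);    -- temp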

    3. Get the SQL script necessary to set up the results schema by visiting the following URL (see more instructions here):

      http://<server>:<port>/WebAPI/ddl/results?dialect=<your_cdm_database_dialect>&schema=<your_results_schema>&vocabSchema=<your_vocab_schema>&tempSchema=<your_temp_schema>&initConceptHierarchy=true
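      For example, filled in with the values used in these notes (assuming WebAPI runs on localhost:8080):

        http://localhost:8080/WebAPI/ddl/results?dialect=postgresql&schema=results&vocabSchema=cdm&tempSchema=temp&initConceptHierarchy=true
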
  4. Set up the R environment by following https://ohdsi.github.io/Hades/rSetup.html

  5. Install Achilles in R by following https://ohdsi.github.io/Achilles/#installation

  6. To run Achilles on the dataset, you must first create a connection to the database

    1. Before creating a connection, you must have the JDBC driver JAR for the database you are connecting to. You can use the already installed R package DatabaseConnector to download it: https://ohdsi.github.io/DatabaseConnector/reference/downloadJdbcDrivers.html
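      For example (the download path is arbitrary):

        library(DatabaseConnector)
        downloadJdbcDrivers(dbms = 'postgresql', pathToDriver = '~/jdbc-drivers')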

    2. You can add the path where you downloaded the driver to your R environment; otherwise, you have to keep passing the path as a parameter to the createConnectionDetails function.
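      For example, in .Renviron (using the path chosen above):

        DATABASECONNECTOR_JAR_FOLDER='~/jdbc-drivers'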

    3. Create the connection details to the database

      connectionDetails <- createConnectionDetails(dbms = 'postgresql', user = 'omop', password = 'omop', server = 'localhost/omop', port = '5432', extraSettings = 'OpenSourceSubProtocolOverride=true')
    4. Run achilles

      achilles(connectionDetails = connectionDetails, cdmDatabaseSchema = 'public', resultsDatabaseSchema = 'results', sourceName = 'omop', vocabDatabaseSchema = 'public', cdmVersion = '5.4', createTable = TRUE)
    5. Run the script to create the concept_count tables. You can get the script by visiting the following URL (see more here):

      http://<server>:<port>/WebAPI/ddl/achilles?dialect=<your_cdm_database_dialect>&schema=<your_results_schema>&vocabSchema=<your_vocab_schema>
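      For example, with the values used in these notes:

        http://localhost:8080/WebAPI/ddl/achilles?dialect=postgresql&schema=results&vocabSchema=cdm
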
  7. Set up Atlas by following https://github.com/OHDSI/Atlas/wiki/Atlas-Setup-Guide and view the visualizations.

Note - WebAPI and RStudio require different versions of Java. I use sdkman (https://sdkman.io/) to manage them. I have a JAVA_HOME variable set in the .Renviron file in RStudio to point to the Java version it requires.
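A sample .Renviron entry (the path is illustrative; point it at whichever Java your setup needs):

  JAVA_HOME='/path/to/required/jdk'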

Updates

May Week 1

After some help from the folks at OHDSI, we realized that we had missed a step where we have to generate tables in the results schema using a script. We now have functioning visualizations in Atlas, with results generated by processing our converted data through their Achilles tool.

Since I had imported the vocabulary files we downloaded from Athena into a separate vocabulary schema, my next step was to stop using the external vocabulary schema and check for any missing tables in our converted data. Doing this didn't change the visualization outcome, so it looks like our converted data already contains the vocabulary tables needed for visualizations.

  • Visualizations for a single person
  • Visualizations for observation
  • Visualizations for observation period
  • Visualizations for measurement

April Week 3

We added a Postgres database that has the converted data and the vocabulary as a source in WebAPI and ran it through Achilles, but it didn't generate all the result tables, so the profile visualizations don't work. To show a profile, Atlas relies on per-person characteristics that Achilles generates into a table in the results schema; because that table doesn't exist, the request fails with a 500 error.

 

April Week 2

It turns out the visualizations failed to work in Atlas because the connection to the database was failing; that, in turn, was due to a trailing space in the connection string and, in the case of PostgreSQL connections, the need to add OpenSourceSubProtocolOverride to the connection string.

We can now see charts under the dashboard option, but not under the person option. I've asked about this in the forums (https://forums.ohdsi.org/t/help-with-visualizing-data-in-atlas/23467/2), but I haven't had a response so far.

The person chart also loads now, but we need to fix some mapping errors in the database.

 

April Week 1

Converted the following entities: notes and observation period.

To populate the OBSERVATION_PERIOD table in OMOP CDM, I used a combination of visit start/stop dates and encounter datetimes from the OpenMRS source database. This decision is based on the need to accurately capture the time span during which we can consider a patient’s medical data to be complete and reliable.

Here’s the model: https://github.com/jayasanka-sack/openmrs-to-omop/blob/de40d38bf7ba3dba535e32ed1c095eb397bd2f52/models/observation_period.sql

Why visit.date_started for observation period start?

  • The visit table in OpenMRS represents high-level patient interactions with the healthcare system.

  • date_started gives the earliest recorded point of care, often more reliable than scattered observations.

  • Using the earliest date_started per patient ensures we mark the beginning of valid, intentional medical data.

Why GREATEST(visit.date_stopped, encounter.encounter_datetime) for observation period end?

  • Some visits may not have a date_stopped, especially if not explicitly closed in OpenMRS.

  • Encounters within a visit may occur after the visit’s date_stopped (due to data entry lag or configuration).

  • To ensure we capture the latest available data, we take the greater of:

    • MAX(visit.date_stopped): expected end of clinical engagement

    • MAX(encounter.encounter_datetime): actual last recorded interaction

Additional Considerations:

  • Patients may have long gaps between visits, but for simplicity and consistency, only a single continuous observation period is generated per patient in this model (sketched below).
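Putting the rules above together, a minimal SQL sketch (table and column names follow the OpenMRS schema as described above; the actual model is at the link above):

  SELECT
    v.patient_id AS person_id,
    MIN(v.date_started) AS observation_period_start_date,
    -- COALESCE guards against a missing date_stopped or a visit with no encounters
    GREATEST(
      COALESCE(MAX(v.date_stopped), MAX(e.encounter_datetime)),
      COALESCE(MAX(e.encounter_datetime), MAX(v.date_stopped))
    ) AS observation_period_end_date
  FROM visit v
  LEFT JOIN encounter e ON e.visit_id = v.visit_id
  GROUP BY v.patient_id;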

Updates on Data Viz using OHDSI tools

Atlas is a web application that supports the design and execution of observational analyses and shows visualizations of patient-level data in the CDM format. To run Atlas locally, we first have to set up and install WebAPI, which contains all the RESTful services used by OHDSI applications.

After following the wiki to set up WebAPI and Atlas, I exported our converted data to CSVs, loaded them into a PostgreSQL database, and added the database as a source in the WebAPI records. The converted data needs to be processed by OHDSI's Achilles tool, an R package, before Atlas can visualize it. Unfortunately, WebAPI wasn't picking up the database despite numerous tries.

OHDSI has a project, Broadsea, which runs the Atlas, WebAPI, and Hades (which contains Achilles) tools in Docker containers, which means we would no longer be blocked by being unable to set up WebAPI correctly. After setting up Broadsea, the next task was to figure out how to load our data into the database that Broadsea uses.

Broadsea has a repository for a prefilled database container, atlas-db. I replaced the demo CSV files with the CSV files containing the data converted from the OpenMRS database and spun up the db Docker container. I tried to add atlas-db as a source to the local WebAPI setup, but Atlas still couldn't see the data.

I then tried spinning up the entire Broadsea application while the atlas-db container was running, but that led to errors. So I deleted the local atlas-db Docker image and volume, let Broadsea spin up, and then replaced the data in the database inside Broadsea. Unfortunately, Broadsea processes the data in the database when it spins up and saves the output in a separate results schema, which is what Atlas displays. So replacing the data after spin-up had no effect, because Atlas wasn't reading that data. Since Broadsea comes with Achilles built in, my next step is to try running Achilles on the database again after the container has spun up.

Links for data viz related tools

  1. WebAPI repository - https://github.com/OHDSI/WebAPI

  2. Atlas repository - https://github.com/OHDSI/Atlas

  3. Achilles - https://ohdsi.github.io/Achilles/

  4. Broadsea - https://github.com/OHDSI/Broadsea

  5. Broadsea atlasdb - https://github.com/OHDSI/Broadsea-Atlasdb

March Weeks 2-4

One of the key challenges in translating entities was handling concepts. Initially, I approached this by converting existing OpenMRS concepts into the OMOP concept table and then using those concepts when translating observations. However, this made the data OpenMRS-specific, which was not ideal. The better approach was to use existing OMOP concept IDs so that the transformed data remains universal, regardless of whether it originated from OpenMRS or another system.

To achieve this, I used the standardized concepts available in the OMOP vocabulary, which I downloaded from their official repository: the CONCEPT.csv file. This file contains 2,059,343 standard concepts.

Using the OpenMRS concept table, the concept_reference_map table, and the downloaded OMOP concepts CSV, I created a mapping table with the following structure (a sketch of how it can be assembled follows the table):

omrs_concept_id | omop_concept_id | relationship_id | source_vocabulary | description
----------------|-----------------|-----------------|-------------------|--------------------------------
5               | 45919733        | SAME-AS         | CIEL              | Tetra brand of tetracycline
5               | 1836948         | SAME-AS         | RxNorm            | tetracycline
5               | 4187581         | NARROWER-THAN   | SNOMED            | Tetracycline
5               | 4281393         | SAME-AS         | SNOMED            | Tetracycline-containing product
6               | 45919413        | SAME-AS         | CIEL              | ETH-Oxydose
6               | 1124957         | SAME-AS         | RxNorm            | oxycodone
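For reference, a rough sketch of how such a table can be assembled. The OpenMRS table and column names below are from the standard OpenMRS schema and should be verified against your version; omop_concept stands for the downloaded CONCEPT.csv loaded into a table:

  SELECT
    crm.concept_id   AS omrs_concept_id,
    oc.concept_id    AS omop_concept_id,
    cmt.name         AS relationship_id,   -- e.g. SAME-AS, NARROWER-THAN
    oc.vocabulary_id AS source_vocabulary,
    oc.concept_name  AS description
  FROM concept_reference_map crm
  JOIN concept_reference_term crt ON crt.concept_reference_term_id = crm.concept_reference_term_id
  JOIN concept_map_type cmt       ON cmt.concept_map_type_id = crm.concept_map_type_id
  -- joining on the reference code alone is a simplification: the OpenMRS
  -- source name must also be reconciled with the OMOP vocabulary_id
  JOIN omop_concept oc            ON oc.concept_code = crt.code;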

With this table, we can easily retrieve the relevant OMOP concept ID during the conversion. Currently, I am using the CIEL SAME-AS mapping for the conversion process, but there is flexibility to use alternate vocabularies like RxNorm or SNOMED in the future.

With concept mapping in place, observations can now be properly translated.

In OpenMRS, all observations are stored in a single obs table, whereas OMOP distributes them across multiple tables such as measurement, observation, and note. For now, I've filtered observations based on encounter type: vitals (encounter type ID 5) and lab results (ID 11) go into the measurement table, and the rest are currently placed in the observation table. The plan is to further refine and categorize these as the project progresses; a sketch of the routing follows.
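A sketch of the lookup and routing described above (concept_map is my placeholder name for the mapping table; the other column names are from the OpenMRS obs and encounter tables):

  -- observations from vitals and lab encounters become MEASUREMENT rows
  SELECT
    o.person_id,
    m.omop_concept_id AS measurement_concept_id,
    o.obs_datetime    AS measurement_datetime,
    o.value_numeric   AS value_as_number
  FROM obs o
  JOIN concept_map m ON m.omrs_concept_id = o.concept_id
                    AND m.source_vocabulary = 'CIEL'
                    AND m.relationship_id = 'SAME-AS'
  JOIN encounter e   ON e.encounter_id = o.encounter_id
  WHERE e.encounter_type IN (5, 11); -- 5 = vitals, 11 = lab results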

March Week 1

According to the SQLMesh documentation (Model Kinds), a model of kind FULL should be persisted as a table. However, for some reason, SQLMesh still uses views when populating the data.
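For reference, a minimal sketch of what such a model declaration looks like in SQLMesh (names simplified from this project's layout):

  MODEL (
    name omop_db.person,
    kind FULL
  );

  SELECT
    patient_id AS person_id
  FROM openmrs.patient;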

I just checked how the view is populated; this is what I observed:

  • SQLMesh writes translated data to a table in a separate database and then creates a view in the target database.

  • The view in the OMOP database is not querying openmrs.patient directly. Instead, it points to an intermediate managed table stored in sqlmesh__omop_db.

Why is it referring to an intermediate table?

  • SQLMesh manages versions of models by creating a unique identifier (2955369539) for each version.

  • The actual data is stored in a temporary or managed table in sqlmesh__omop_db.

  • The patient view in omop_db is pointing to this managed table instead of querying openmrs.patient directly.

Anyway, this behavior aligns with the expected functionality of model kind VIEW, but not model kind FULL. Read more: Model Kinds

Next Steps & Workaround

  • For now, I am okay with this approach because views behave like tables when querying.

  • The main issue arises when exporting the database, as views can create complications.

  • My focus for now will be on writing models and gaining a deeper understanding of SQLMesh.

  • Once I have more experience working with SQLMesh, I’ll revisit potential solutions for handling persistent tables.

The source can be found at https://github.com/jayasanka-sack/openmrs-to-omop/

DDL for postgres: https://github.com/OHDSI/CommonDataModel/blob/v5.4.0/inst/ddl/5.4/postgresql/OMOPCDM_postgresql_5.4_ddl.sql

Jan-Feb 2025

I initially explored DBT to perform a simple data transformation, but encountered a key limitation: DBT only supports a single database connection. This restriction makes it difficult to populate a PostgreSQL database for OMOP, since the OpenMRS database is in MariaDB. The DBT documentation explicitly states that connecting to multiple databases is not supported.

The source code for the DBT approach:

https://github.com/jayasanka-sack/openmrs-to-omop/tree/dbt

DBT models:

  1. Patient: https://github.com/jayasanka-sack/openmrs-to-omop/blob/dbt/dbt/models/staging/stg_patient.sql

  2. Concept: https://github.com/jayasanka-sack/openmrs-to-omop/blob/dbt/dbt/models/staging/stg_openmrs_concept.sql

Alternative Approaches

1. Using an External Loader Service (SymmetricDS)

Instead of trying to connect DBT to multiple databases, the DBT documentation suggests reconsidering the data-loading approach:

"Instead of thinking 'how do I connect my dbt project to two databases,' ask 'what loader services will best prepare our warehouse for dbt transformations.'”

Based on this, I explored SymmetricDS as a potential external loader service. However, its documentation is complex, and before committing to this approach, I am considering other alternatives.

2. Exploring SQLMesh

Another alternative I considered is SQLMesh, which supports multiple databases and engines. I created a PostgreSQL database and attempted to write transformed patient data, but found it somewhat challenging. This could be due to either a lack of understanding on my part or the fact that these tools aren't inherently designed for cross-database transformations (e.g., MariaDB → PostgreSQL).

I couldn't find any existing documentation or articles that attempt this exact workflow, but the SQLMesh documentation itself helped clarify standard transformations and model structures.

One major advantage of SQLMesh is its comprehensive documentation and built-in UI, which visualizes entity relationships—making debugging much easier.

3. Revised Approach: OMOP-like MySQL Database

Since SQLMesh allows writing to multiple databases within the same engine, a more practical approach is:

  1. Transform data into an OMOP-like MySQL database first.

  2. Later, use a service like SymmetricDS to convert it into a PostgreSQL OMOP database.

This allows us to focus on the core task, mapping OpenMRS entities to OMOP entities, without getting blocked by cross-database limitations.