Engineering Notes page (OMOP CDM project)
Plan Outline
OMOP CDM Entities Covered so Far
Note: As of March 10, these still need a way to handle OpenMRS concepts.
Table Name | Status (DONE, Pending, or PRE = pre-populated outside the ETL) |
---|---|
PERSON | DONE |
OBSERVATION_PERIOD | DONE |
VISIT_OCCURRENCE | DONE |
VISIT_DETAIL | DONE |
CONDITION_OCCURRENCE | DONE |
DRUG_EXPOSURE | Pending |
PROCEDURE_OCCURRENCE | Pending |
DEVICE_EXPOSURE | Pending |
MEASUREMENT | DONE |
OBSERVATION | DONE |
DEATH | Pending |
NOTE | DONE |
NOTE_NLP | Pending |
SPECIMEN | Pending |
FACT_RELATIONSHIP | Pending |
LOCATION | DONE |
CARE_SITE | DONE |
PROVIDER | DONE |
PAYER_PLAN_PERIOD | Pending |
COST | Pending |
DRUG_ERA | Pending |
DOSE_ERA | Pending |
CONDITION_ERA | Pending |
EPISODE | Pending |
EPISODE_EVENT | Pending |
METADATA | Pending |
CDM_SOURCE | PRE |
CONCEPT | PRE |
VOCABULARY | PRE |
DOMAIN | PRE |
CONCEPT_CLASS | PRE |
CONCEPT_RELATIONSHIP | PRE |
RELATIONSHIP | PRE |
CONCEPT_SYNONYM | PRE |
CONCEPT_ANCESTOR | PRE |
SOURCE_TO_CONCEPT_MAP | PRE |
DRUG_STRENGTH | Pending |
COHORT | Pending |
COHORT_DEFINITION | Pending |
Useful links:
OMOP CDM v5.4: https://ohdsi.github.io/CommonDataModel/cdm54.html
SQL for loading the vocabulary CSVs into the database - https://github.com/OHDSI/CommonDataModel/blob/v5.3.1/PostgreSQL/VocabImport/OMOP%20CDM%20vocabulary%20load%20-%20PostgreSQL.sql
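For reference, the linked script is essentially a series of PostgreSQL COPY statements, one per vocabulary file; a rough sketch (the path is illustrative, verify the exact options against the script itself):

```sql
-- Athena exports are tab-delimited with a header row; QUOTE E'\b' effectively
-- disables quote handling. The script repeats this for every vocabulary table.
COPY concept FROM '/path/to/vocab/CONCEPT.csv'
WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
```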
Instructions on how to set up the visualization tools
1. Set up WebAPI by following the instructions at https://github.com/OHDSI/WebAPI/wiki/WebAPI-Installation-Guide
2. Add the converted data into a separate database, and remember the schema name you are importing it into (whether it's the `public` schema or any other; I usually add it into a separate schema that I create and call `cdm`). Create two schemas, `results` and `temp`, in this database.
3. Add the connection details into the `source` and `source_daimon` tables in the database set up in step 1 by following the instructions in https://github.com/OHDSI/WebAPI/wiki/CDM-Configuration#source-and-source_daimon-table-setup (see the sketch after this list). If you are configuring a PostgreSQL connection and are facing difficulties connecting, it may require adding `OpenSourceSubProtocolOverride` to your `source_connection` connection string. Make sure that the entries in the `source_daimon` table have the correct schema name for where your OMOP data is (i.e., `public`, or in my case `cdm`).
4. Get the SQL script necessary to set up the `results` schema by running the following URL (see more instructions here): `http://<server:port>/WebAPI/ddl/results?dialect=<your_cdm_database_dialect>&schema=<your_results_schema>&vocabSchema=<your_vocab_schema>&tempSchema=<your_temp_schema>&initConceptHierarchy=true`
5. Set up the R environment by following https://ohdsi.github.io/Hades/rSetup.html
6. Install Achilles in R following https://ohdsi.github.io/Achilles/#installation
7. To run Achilles on the dataset, you must first create a connection to the database:
   - Before creating a connection, you must have the JDBC driver JAR for the database you are connecting to downloaded. You can use the already-installed `DatabaseConnector` R package to do that: https://ohdsi.github.io/DatabaseConnector/reference/downloadJdbcDrivers.html
   - You can add the path to where you downloaded the driver to your R environment; otherwise you have to keep passing the path as a parameter to the `createConnectionDetails` function.
   - Create the connection details to the database:
     connectionDetails <- createConnectionDetails(dbms = 'postgresql', user = 'omop', password = 'omop', server = 'localhost/omop', port = '5432', extraSettings = 'OpenSourceSubProtocolOverride=true')
8. Run Achilles:
   achilles(connectionDetails = connectionDetails, cdmDatabaseSchema = 'public', resultsDatabaseSchema = 'results', sourceName = 'omop', vocabDatabaseSchema = 'public', cdmVersion = '5.4', createTable = TRUE)
9. Run the script to create the `concept_count` tables. You can get the script by running the following URL (see more here): `http://<server:port>/WebAPI/ddl/achilles?dialect=<your_cdm_database_dialect>&schema=<your_results_schema>&vocabSchema=<your_vocab_schema>`
10. Set up Atlas by following https://github.com/OHDSI/Atlas/wiki/Atlas-Setup-Guide and view the visualizations.
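For step 3, here is a minimal, hypothetical sketch of the `source` / `source_daimon` rows, assuming the WebAPI tables live in a `webapi` schema, the CDM and vocabulary sit in a `cdm` schema, and daimon type codes follow the CDM Configuration wiki (0 = CDM, 1 = Vocabulary, 2 = Results, 5 = Temp); adjust IDs, names, and the connection string to your setup:

```sql
-- Hypothetical source registration for WebAPI; all values are illustrative.
INSERT INTO webapi.source (source_id, source_name, source_key, source_connection, source_dialect)
VALUES (2, 'OpenMRS OMOP', 'OMOP',
        'jdbc:postgresql://localhost:5432/omop?user=omop&password=omop', 'postgresql');

INSERT INTO webapi.source_daimon (source_daimon_id, source_id, daimon_type, table_qualifier, priority)
VALUES (4, 2, 0, 'cdm', 2),     -- 0 = CDM schema
       (5, 2, 1, 'cdm', 2),     -- 1 = Vocabulary schema
       (6, 2, 2, 'results', 2), -- 2 = Results schema
       (7, 2, 5, 'temp', 0);    -- 5 = Temp schema
```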
Note - WebAPI and RStudio require different Java versions. I use sdkman (https://sdkman.io/) to manage them. I have a `JAVA_HOME` variable set up in the `.Renviron` file in RStudio to point to the Java version that it requires.
Updates
May Week 1
After some help from the folks at OHDSI, we’ve realized that we missed a step where we have to generate tables in the `results` schema using a script. We now have functioning visualizations in Atlas, with results generated by processing our converted data through their Achilles tool.
Since I had imported the vocabulary files that we downloaded from Athena into a separate `vocabulary` schema, my next step was to stop using that external vocabulary schema and check for any tables missing from our converted data. Doing this didn’t change the visualization outcome, so it looks like our converted data includes the vocabulary tables necessary for visualizations.
April Week 3
We’ve added a Postgres database that has the converted data and the vocabulary as a source in WebAPI and ran it through Achilles, but it didn’t generate all the result tables, so the profile visualizations don’t work. To view a profile, Achilles generates characteristics for each person in a table in the results schema, and because that table doesn’t exist, the request leads to a 500 error.
April Week 2
It turns out the visualizations failed to work on Atlas because the connection to the database failed, which in turn was caused by a trailing space in the connection string, and because PostgreSQL connections require adding `OpenSourceSubProtocolOverride` to the connection string.
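As a hypothetical illustration of the fix (assuming the WebAPI tables live in a `webapi` schema and the affected source row has `source_id = 2`), the stored connection string can be corrected in place:

```sql
-- Rewrite the connection string without the trailing space and with the
-- override flag appended; source_id and schema name are illustrative.
UPDATE webapi.source
SET source_connection = 'jdbc:postgresql://localhost:5432/omop?user=omop&password=omop&OpenSourceSubProtocolOverride=true'
WHERE source_id = 2;
```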
We can now see charts under the dashboard option, but not under the `person` option. I’ve asked about this in the forums (https://forums.ohdsi.org/t/help-with-visualizing-data-in-atlas/23467/2), but I haven’t had any response so far.
The person chart also loads now, but we need to fix some mapping errors in the database.
April Week 1
Converted the following entities: notes and observation period.
To populate the `OBSERVATION_PERIOD` table in OMOP CDM, I used a combination of visit start/stop dates and encounter datetimes from the OpenMRS source database. This decision is based on the need to accurately capture the time span during which we can consider a patient’s medical data to be complete and reliable.
Here’s the model: https://github.com/jayasanka-sack/openmrs-to-omop/blob/de40d38bf7ba3dba535e32ed1c095eb397bd2f52/models/observation_period.sql
Why `visit.date_started` for the observation period start?
- The `visit` table in OpenMRS represents high-level patient interactions with the healthcare system.
- `date_started` gives the earliest recorded point of care, often more reliable than scattered observations.
- Using the earliest `date_started` per patient ensures we mark the beginning of valid, intentional medical data.
Why `GREATEST(visit.date_stopped, encounter.encounter_datetime)` for the observation period end?
- Some visits may not have a `date_stopped`, especially if not explicitly closed in OpenMRS.
- Encounters within a visit may occur after the visit’s `date_stopped` (due to data entry lag or configuration).
- To ensure we capture the latest available data, we take the greater of:
  - `MAX(visit.date_stopped)`: the expected end of clinical engagement
  - `MAX(encounter.encounter_datetime)`: the actual last recorded interaction
Additional Considerations:
Patients may have long gaps between visits, but for simplicity and consistency, only a single continuous observation period is generated per patient in this model.
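A minimal sketch of this logic, assuming standard OpenMRS table and column names in an `openmrs` schema (the actual model is linked above; the period type concept ID is an assumption to verify against your vocabulary):

```sql
-- One observation period per patient: earliest visit start to the latest of
-- visit stop / encounter datetime. GREATEST ignores NULLs in PostgreSQL, so
-- visits without date_stopped fall back to the last encounter_datetime.
SELECT
    ROW_NUMBER() OVER (ORDER BY v.patient_id)          AS observation_period_id,
    v.patient_id                                       AS person_id,
    CAST(MIN(v.date_started) AS date)                  AS observation_period_start_date,
    CAST(GREATEST(MAX(v.date_stopped),
                  MAX(e.encounter_datetime)) AS date)  AS observation_period_end_date,
    44814724 AS period_type_concept_id -- assumed: "Period covering healthcare encounters"
FROM openmrs.visit AS v
LEFT JOIN openmrs.encounter AS e ON e.visit_id = v.visit_id
GROUP BY v.patient_id;
```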
Updates on Data Viz using OHDSI tools
Atlas is a web application that supports the design and execution of observational analyses and shows visualizations of patient-level data in the CDM format. To run Atlas locally, we first have to set up and install WebAPI, which contains all the RESTful services used by OHDSI applications.
After following the wiki to set up WebAPI and Atlas, I exported our converted data into CSVs, loaded them into a PostgreSQL database, and added the database as a source in the WebAPI records. The converted data needs to be processed by OHDSI’s Achilles tool, which is an R package, for Atlas to be able to visualize the data. Unfortunately, WebAPI wasn’t picking up the database despite numerous tries.
OHDSI has a project called Broadsea, which runs the Atlas, WebAPI, and HADES (which contains Achilles) tools in a Docker container, which means we would no longer be blocked by being unable to set up WebAPI correctly. After setting up Broadsea, the next task was to figure out how to load our data into the database that Broadsea uses.
Broadsea has a repository for a prefilled database container with demo data: atlas-db. I replaced the demo CSV files with the CSV files containing the data converted from the OpenMRS database and spun up the database Docker container. I tried to add atlas-db as a source to the local WebAPI setup, but it still wasn’t able to see the data in Atlas.
I then tried spinning up the entire Broadsea application while the atlas-db container was running, but that led to errors. So I deleted the local atlas-db Docker image and volume, let Broadsea spin up, and then replaced the data in the database inside Broadsea. Unfortunately, it is at spin-up time that Broadsea processes the data in the database and saves the output to a separate `results` schema, which is what Atlas reads. So replacing the data after spinning up had no effect, because Atlas wasn’t reading that data. Since Broadsea comes with Achilles built in, my next step is to try running Achilles on the database again after the container has been spun up.
Links for data viz related tools
WebAPI repository - https://github.com/OHDSI/WebAPI
Atlas repository - https://github.com/OHDSI/Atlas
Achilles - https://ohdsi.github.io/Achilles/
Broadsea - https://github.com/OHDSI/Broadsea
Broadsea atlasdb - https://github.com/OHDSI/Broadsea-Atlasdb
March Weeks 2-4
One of the key challenges in translating entities was handling concepts. Initially, I approached this by converting existing OpenMRS concepts into the OMOP `concept` table and then using those concepts when translating observations. However, this made the data OpenMRS-specific, which was not ideal. The better approach was to use existing OMOP concept IDs so that the transformed data remains universal, regardless of whether it originated from OpenMRS or another system.
To achieve this, I used the standardized concepts available in the OMOP vocabulary, which I downloaded from their official repository: the CONCEPT.csv file. This file contains 2,059,343 standard concepts.
Using the OpenMRS `concept` table, the `concept_reference_map` table, and the downloaded OMOP concepts CSV, I created a mapping table with the following structure:
OpenMRS concept ID | OMOP concept ID | Mapping type | Vocabulary | OMOP concept name |
---|---|---|---|---|
5 | 45919733 | SAME-AS | CIEL | Tetra brand of tetracycline |
5 | 1836948 | SAME-AS | RxNorm | tetracycline |
5 | 4187581 | NARROWER-THAN | SNOMED | Tetracycline |
5 | 4281393 | SAME-AS | SNOMED | Tetracycline-containing product |
6 | 45919413 | SAME-AS | CIEL | ETH-Oxydose |
6 | 1124957 | SAME-AS | RxNorm | oxycodone |
… | … | … | … | … |
With this table, we can easily retrieve the relevant OMOP concept ID during the conversion. Currently, I am using the CIEL SAME-AS mapping for the conversion process, but there is flexibility to use alternate vocabularies like RxNorm or SNOMED in the future.
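A sketch of how such a mapping table can be built, assuming the standard OpenMRS reference tables and that the OMOP CONCEPT.csv has been loaded into `omop.concept` (schema and table names here are assumptions; verify against your databases):

```sql
-- Map OpenMRS concepts to OMOP concepts by joining the OpenMRS reference
-- tables to the loaded OMOP vocabulary on concept code and vocabulary name.
CREATE TABLE concept_map AS
SELECT
    m.concept_id    AS openmrs_concept_id,
    c.concept_id    AS omop_concept_id,
    mt.name         AS mapping_type,     -- e.g. SAME-AS, NARROWER-THAN
    c.vocabulary_id AS vocabulary,
    c.concept_name  AS omop_concept_name
FROM openmrs.concept_reference_map AS m
JOIN openmrs.concept_reference_term AS t
  ON t.concept_reference_term_id = m.concept_reference_term_id
JOIN openmrs.concept_reference_source AS s
  ON s.concept_source_id = t.concept_source_id
JOIN openmrs.concept_map_type AS mt
  ON mt.concept_map_type_id = m.concept_map_type_id
JOIN omop.concept AS c
  ON c.concept_code = t.code
 AND c.vocabulary_id = s.name;  -- assumes OpenMRS source names match OMOP vocabulary_ids
```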
With concept mapping in place, observations can now be properly translated.
In OpenMRS, all observations are stored in a single `obs` table, whereas OMOP distributes them across multiple tables such as `measurement`, `observation`, and `note`. For now, I’ve filtered observations based on encounter type: vitals (encounter type ID 5) and lab results (ID 11) go into the `measurement` table, and the rest are currently placed in the `observation` table. The plan is to further refine and categorize these as the project progresses.
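A sketch of that routing rule, again assuming standard OpenMRS names in an `openmrs` schema (the encounter type IDs 5 and 11 come from the text above):

```sql
-- Observations from vitals (5) and lab (11) encounters feed the measurement
-- model; the complementary NOT IN (5, 11) filter feeds the observation model.
SELECT o.obs_id, o.person_id, o.concept_id, o.value_numeric, o.obs_datetime
FROM openmrs.obs AS o
JOIN openmrs.encounter AS e ON e.encounter_id = o.encounter_id
WHERE e.encounter_type IN (5, 11);
```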
March Week 1
According to the SQLMesh documentation (Model Kinds), when using model `kind: FULL`, the model should be persisted as a table. However, for some reason, SQLMesh still uses views when populating data.
I just checked how the view is populated; this is what I observed:
SQLMesh writes translated data to a table in a separate database and then creates a view in the target database.
The view in the OMOP database is not querying `openmrs.patient` directly. Instead, it points to an intermediate managed table stored in `sqlmesh__omop_db`.
Why is it referring to an intermediate table?
- SQLMesh manages versions of models by creating a unique identifier (2955369539) for each version.
- The actual data is stored in a temporary or managed table in `sqlmesh__omop_db`.
- The `patient` view in `omop_db` points to this managed table instead of querying `openmrs.patient` directly.
Anyway, this behavior aligns with the expected functionality of model `kind: VIEW`, but not model `kind: FULL`. Read more: Model Kinds
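For reference, a minimal sketch of what a `FULL` model file looks like in SQLMesh (the model and column names are illustrative, not the project’s actual model):

```sql
-- models/patient.sql: declared as kind FULL, which per the SQLMesh docs
-- should be materialized as a table on each run.
MODEL (
  name omop_db.patient,
  kind FULL
);

SELECT p.patient_id AS person_id
FROM openmrs.patient AS p;
```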
Next Steps & Workaround
For now, I am okay with this approach because views behave like tables when querying.
The main issue arises when exporting the database, as views can create complications.
My focus for now will be on writing models and gaining a deeper understanding of SQLMesh.
Once I have more experience working with SQLMesh, I’ll revisit potential solutions for handling persistent tables.
The source can be found at: https://github.com/jayasanka-sack/openmrs-to-omop/
DDL for postgres: https://github.com/OHDSI/CommonDataModel/blob/v5.4.0/inst/ddl/5.4/postgresql/OMOPCDM_postgresql_5.4_ddl.sql
Jan-Feb 2025
I initially explored DBT to perform a simple data transformation, but encountered a key limitation: DBT only supports a single database connection. This restriction makes it difficult to achieve the required task of populating a PostgreSQL database for OMOP, as the OpenMRS database is in MariaDB. DBT Documentation explicitly states that connecting to multiple databases is not supported.
The source code for DBT approach:
https://github.com/jayasanka-sack/openmrs-to-omop/tree/dbt
DBT models:
Patient: https://github.com/jayasanka-sack/openmrs-to-omop/blob/dbt/dbt/models/staging/stg_patient.sql
Alternative Approaches
1. Using an External Loader Service (SymmetricDS)
Instead of trying to connect DBT to multiple databases, the DBT documentation suggests reconsidering the data-loading approach:
"Instead of thinking 'how do I connect my dbt project to two databases,' ask 'what loader services will best prepare our warehouse for dbt transformations.'”
Based on this, I explored SymmetricDS as a potential external loader service. However, its documentation is complex, and before committing to this approach, I am considering other alternatives.
2. Exploring SQLMesh
Another alternative I considered is SQLMesh, which supports multiple databases and engines. I created a PostgreSQL database and attempted to write transformed patient data, but found it somewhat challenging. This could be due to either a lack of understanding on my part or the fact that these tools aren't inherently designed for cross-database transformations (e.g., MariaDB → PostgreSQL).
I couldn't find any existing documentation or articles that attempt this exact workflow, but SQLMesh's own documentation helped clarify standard transformations and model structures.
One major advantage of SQLMesh is its comprehensive documentation and built-in UI, which visualizes entity relationships, making debugging much easier.
3. Revised Approach: OMOP-like MySQL Database
Since SQLMesh allows writing to multiple databases within the same engine, a more practical approach is:
Transform data into an OMOP-like MySQL database first.
Later, use a service like SymmetricDS to convert it into a PostgreSQL OMOP database.
This allows us to focus on the core task—mapping OpenMRS entities to OMOP entities—without getting blocked by cross-database limitations.