Pentaho ETL and Dimensional Modeling (Design Page, R&D)

Primary mentor

Darius Jazayeri

Backup mentor

Assigned to

Former user (deleted)

Abstract

OpenMRS has few tools in place that make it easy to analyze concept, patient, location, encounter or visit data in an aggregated, dimensional manner. OLAP (Online Analytical Processing) is one of the technologies under the business intelligence umbrella; it provides rapid answers to multi-dimensional queries over data.

The image at right gives a simple, high-level example of what dimensional modeling looks like.

This functionality extends beyond traditional reporting in several ways:

  • OLAP tends to be more ad hoc in nature, although static queries can also be issued to the underlying engine;
  • OLAP deals specifically with querying at an aggregate level (whether that means simply counting patients who meet the requested dimensional criteria or performing complex calculations on some other measure) and with drilling into interesting intersections (or tuples) for more detailed data;
  • OLAP requires a data model significantly different from the underlying OpenMRS data model, because the dimensional models that the OLAP engine understands must be crafted from the OpenMRS data.
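To make the aggregate-and-drill idea concrete, here is a minimal Python sketch over a handful of made-up encounter facts. The field names (`patient_id`, `location`, `month`, `encounter_type`) and the functions `count_patients` and `drill_down` are hypothetical illustrations, not part of OpenMRS, Pentaho or Mondrian. The sketch counts distinct patients for each tuple of dimension members, then drills into one tuple to recover the detail rows behind that count.

```python
from collections import defaultdict

# Hypothetical, simplified fact rows: one row per patient encounter,
# each already carrying its dimension values.
FACTS = [
    {"patient_id": 1, "location": "Clinic A", "month": "2010-01", "encounter_type": "ADULT_INITIAL"},
    {"patient_id": 2, "location": "Clinic A", "month": "2010-01", "encounter_type": "ADULT_RETURN"},
    {"patient_id": 1, "location": "Clinic A", "month": "2010-02", "encounter_type": "ADULT_RETURN"},
    {"patient_id": 3, "location": "Clinic B", "month": "2010-01", "encounter_type": "ADULT_INITIAL"},
    {"patient_id": 4, "location": "Clinic B", "month": "2010-01", "encounter_type": "ADULT_INITIAL"},
]


def count_patients(facts, dimensions):
    """Aggregate: count distinct patients for each tuple of the given dimensions."""
    patients_by_tuple = defaultdict(set)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        patients_by_tuple[key].add(row["patient_id"])
    return {key: len(ids) for key, ids in patients_by_tuple.items()}


def drill_down(facts, dimensions, member_tuple):
    """Drill into one tuple: return the detail rows behind that aggregate cell."""
    return [row for row in facts if tuple(row[d] for d in dimensions) == member_tuple]


if __name__ == "__main__":
    cube = count_patients(FACTS, ["location", "month"])
    for key, count in sorted(cube.items()):
        print(key, count)
    print(drill_down(FACTS, ["location", "month"], ("Clinic B", "2010-01")))
```

Rolling the same facts up to `location` alone gives Clinic A a count of 2, not 3: distinct-patient counts are not additive across months, so the aggregation behavior of each measure has to be defined deliberately.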

The community edition of the Pentaho Business Intelligence suite includes Pentaho Analysis, an OLAP engine (specifically, a ROLAP engine) developed as the Mondrian project. The suite also includes Pentaho Data Integration, a powerful and easy-to-use ETL engine developed as the KETTLE project. The Pentaho BI platform is architected as a pluggable, componentized set of pillars (or engines) that can operate together as an integrated toolset in the Pentaho BI Server, or can be embedded individually in other applications.
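ROLAP (relational OLAP) means the engine leaves the data in relational tables, typically arranged as a star schema, and answers cube queries by generating SQL against them. The sketch below illustrates that idea with Python's built-in `sqlite3` module rather than with Mondrian itself; the table and column names (`fact_encounter`, `dim_location`, `dim_date`) and the helper functions are assumptions made purely for illustration. Rolling up from clinic to district is simply a different `GROUP BY` over the same fact table.

```python
import sqlite3


def build_star_schema(conn):
    """Create and populate a tiny, hypothetical star schema."""
    conn.executescript("""
        CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, name TEXT, district TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_encounter (
            location_key INTEGER REFERENCES dim_location(location_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            patient_id INTEGER,
            encounter_count INTEGER
        );
        INSERT INTO dim_location VALUES
            (1, 'Clinic A', 'North'), (2, 'Clinic B', 'North'), (3, 'Clinic C', 'South');
        INSERT INTO dim_date VALUES (201001, 2010, 1), (201002, 2010, 2);
        INSERT INTO fact_encounter VALUES
            (1, 201001, 10, 1), (1, 201001, 11, 1), (2, 201001, 12, 1),
            (3, 201001, 13, 1), (3, 201002, 13, 1);
    """)


def encounters_by(conn, level):
    """Roll up encounter counts to a location level ('name' or 'district') per month."""
    # Column names cannot be bound as SQL parameters, so whitelist the level.
    if level not in ("name", "district"):
        raise ValueError(f"unsupported level: {level}")
    sql = f"""
        SELECT l.{level}, d.year, d.month, SUM(f.encounter_count)
        FROM fact_encounter f
        JOIN dim_location l ON f.location_key = l.location_key
        JOIN dim_date d ON f.date_key = d.date_key
        GROUP BY l.{level}, d.year, d.month
        ORDER BY l.{level}, d.year, d.month
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    for row in encounters_by(conn, "district"):
        print(row)
```

The dimension tables hold descriptive members (clinic, district, month) while the narrow fact table holds keys and measures, which is the reshaping of OpenMRS data that the dimensional model calls for.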

The purpose of this project is to research the feasibility of integrating the Pentaho toolset with OpenMRS to provide advanced ETL and analytic modeling within an OpenMRS module. The design must account for the fact that data semantics can vary significantly between OpenMRS implementations, which makes standard ETL and modeling challenging. The module will require rule definitions and logic to guide the ETL processes in moving dimensional members and aggregating fact data. Design considerations for this logic should include using existing forms, rules engines, fuzzy matching, semantic modeling and other techniques to bridge the gap between concrete facts and the semantics of data relationships.
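One way rule definitions and fuzzy matching might combine in an ETL step is sketched below in Python, using only the standard library's `difflib`. The canonical diagnosis members, the explicit rule table and the 0.8 similarity cutoff are all hypothetical; a real module would presumably draw these from concept dictionaries and implementation-specific configuration. Explicit rules are tried first, fuzzy matching is the fallback, and anything still unmatched is set aside for manual review rather than being silently assigned to the wrong dimension member.

```python
import difflib
from collections import Counter

# Hypothetical canonical members of a shared "diagnosis" dimension.
CANONICAL_DIAGNOSES = ["MALARIA", "TUBERCULOSIS", "PNEUMONIA", "DIARRHEA"]

# Hypothetical explicit rules for one implementation: local term -> canonical member.
EXPLICIT_RULES = {
    "TB": "TUBERCULOSIS",
    "PULMONARY TB": "TUBERCULOSIS",
    "ACUTE WATERY DIARRHOEA": "DIARRHEA",
}


def map_to_member(local_name, cutoff=0.8):
    """Map a local concept name to a canonical dimension member.

    Explicit rules win; otherwise fall back to fuzzy string matching.
    Returns None when nothing is close enough, so the term can be reviewed by hand.
    """
    name = local_name.strip().upper()
    if name in EXPLICIT_RULES:
        return EXPLICIT_RULES[name]
    matches = difflib.get_close_matches(name, CANONICAL_DIAGNOSES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def aggregate_facts(observations):
    """Count observations per canonical member and collect unmapped local terms."""
    counts = Counter()
    unmapped = set()
    for local_name in observations:
        member = map_to_member(local_name)
        if member is None:
            unmapped.add(local_name)
        else:
            counts[member] += 1
    return counts, unmapped


if __name__ == "__main__":
    counts, unmapped = aggregate_facts(
        ["Malria", "malaria", "TB", "Pulmonary TB", "Diarrhoea", "Headache"]
    )
    print(dict(counts))
    print(unmapped)
```

Keeping the rule table separate from the matching code means each implementation could supply its own mappings without changing the ETL logic.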

The project will include ongoing development of a set of prototype ETL transformations and models in order to flesh out detailed requirements and validate design decisions. This requires partnering with a decent-sized OpenMRS implementation, and Andy Kanter of the Millennium Villages Project (MVP) at the Earth Institute, Columbia University, has agreed to participate.

Project Champions

Andrew Kanter

Burke Mamlin

Objectives

Two parallel sets of objectives are defined. The first set pertains specifically to the work that will be done with the MVP project. The second set pertains to the overall project, which aims to create functionality that benefits as many OpenMRS implementations as possible. This approach is necessary because we must first clearly understand the problem space and the requirements of a single implementation before we can understand the needs, and thus the design, required for the abstract case (the broader set of implementers). The objectives set for the MVP prototypes complement the objectives for the overall project.

...