Project Status Overview
The following requirements were identified during the sprint as the highest priorities for the phase 1 implementation of the MVP warehouse initiative:
- Develop a data extraction strategy that can support extracting from 14 different MVP sites.
- Capture and correct as many unexpected data scenarios as possible.
- Extract as many observations as possible to model the indicators in the attached spreadsheet, eHealth Indicators for Pentaho Dashboard 2011-10-24.xls.
- Develop a load strategy and physical database schema that can support aggregation of the 14 sites' data into a common, centralized data store.
- Develop the needed models to allow Pentaho Analysis reports to be created against the loaded star schemas in the warehouse.
MVP Tasks and Status
Task #1: Design deployment topology for the MVP ETL and data warehouse initiative.
This task is complete. The following images depict the recommended deployment structure for MVP.
At each site with an OpenMRS implementation, a Pentaho Data Integration (PDI) Carte server will be deployed. The Carte server uses the ETL engine to execute the extraction jobs and transformations that pull the desired data out of OpenMRS. The data will be written to CSV text files and compressed for eventual distribution to the central warehouse site (presumably in the United States).
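The write-and-compress step described above can be illustrated with a minimal Python sketch. This is not the PDI implementation, which performs the step through its own jobs and transformations. The function names, the column names and the choice of gzip as the compression format are all assumptions made for illustration.

```python
import csv
import gzip


def write_compressed_csv(rows, fieldnames, path):
    """Write dict rows to a gzip-compressed CSV file and return the row count.

    Hypothetical helper: stands in for the extraction output step only.
    """
    count = 0
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def read_compressed_csv(path):
    """Read a gzip-compressed CSV file back into a list of dict rows."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A site would call `write_compressed_csv` once per extract file, and the central site would call `read_compressed_csv` (or an equivalent PDI input step) after the file has been transferred.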
Once the CSV text files have been received at the central warehouse site, load transformations and jobs are executed on the central warehouse's PDI Carte server. These jobs complete the build-out of the star schemas for the OLAP cubes and load the data into the central warehouse. A Pentaho BI Server is set up at this site to serve the OLAP models and analysis reports that are built against the central warehouse star schemas. The dotted lines in the diagram depict possible machine boundaries in the deployment topology.
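As a rough illustration of the load step, the sketch below reads extracted CSV rows from several sites into one small star schema, with a site dimension and a birth fact table. It uses Python's built-in sqlite3 module purely for demonstration. The table names, column names and CSV layout are hypothetical and do not reflect the actual warehouse schema or the PDI load jobs.

```python
import csv
import io
import sqlite3

# Hypothetical star schema: one dimension table and one fact table.
SCHEMA = """
CREATE TABLE dim_site (
    site_key  INTEGER PRIMARY KEY,
    site_name TEXT UNIQUE NOT NULL
);
CREATE TABLE fact_birth (
    site_key    INTEGER NOT NULL REFERENCES dim_site(site_key),
    birth_date  TEXT    NOT NULL,
    birth_count INTEGER NOT NULL
);
"""


def create_warehouse():
    """Create an in-memory database holding the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def site_key(conn, site_name):
    """Return the surrogate key for a site, adding the site if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO dim_site (site_name) VALUES (?)", (site_name,)
    )
    row = conn.execute(
        "SELECT site_key FROM dim_site WHERE site_name = ?", (site_name,)
    ).fetchone()
    return row[0]


def load_births(conn, csv_text):
    """Load one site's CSV extract into the fact table; return rows loaded."""
    loaded = 0
    for record in csv.DictReader(io.StringIO(csv_text)):
        conn.execute(
            "INSERT INTO fact_birth (site_key, birth_date, birth_count) "
            "VALUES (?, ?, ?)",
            (
                site_key(conn, record["site"]),
                record["birth_date"],
                int(record["birth_count"]),
            ),
        )
        loaded += 1
    conn.commit()
    return loaded
```

Calling `load_births` once per received file aggregates every site's rows into the same fact table, keyed by the shared site dimension, which is the property an OLAP cube over the central store relies on.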
Task #2: Implement the ETL processes that will extract, de-identify and load OpenMRS data from the source OpenMRS instances in the field to a centralized warehouse.
The work to refactor the original prototype ETL into the ETL jobs and transformations needed to meet Columbia University requirements is in a very early stage. The extraction and load jobs and transformations for the birth and death cubes have been implemented but not validated. The most recent revision of the ETL is attached here.
No work has been started on the health indicators cube ETL since the prototype stage.
Task #3: Design and implement the OLAP schema models for births, deaths and health indicators.
The OLAP models for the birth and death cubes have been refactored and sanity-checked, but the data has not been validated. The most recent revision of the models can be found here.
No work has been started on the health indicators OLAP models since the prototype stage.
Task #4: Install and configure servers per design, both in the United States and Africa (refer to the design slides attached).
This is a future task to be addressed by the Columbia University team.
Task #5: Deploy, schedule, monitor and maintain the warehouse solution.
This is a future task to be addressed by the Columbia University team.
Sprint Objectives & Summary
See the OpenMRS Pentaho sprint page for sprint objectives and a summary of the development effort.
Attachments and Resources
Sprint Summary for Columbia University, Sprint Summary for Columbia University.pdf
eHealth Indicators from Columbia University, eHealth Indicators for Pentaho Dashboard 2011-10-24.xls