Documentation For ETL Module
Documentation
Overview
This module provide's a mechanism for performing ETL. Extract transform and loading is done between the MySQL database which is using by the OpenMRS application and the datawarehouse. Datawarehouse is here HIVE/Hadoop where we are loading the extracted data. Feature accomplished with this module latest release is:-
Select data from various table and database and as many columns you can.
Drag and Drop UI for ease of column selection.
Login to Datawarehouse and execute queries over it.
Graphical Statistics and Charts.
Save queries instantly and load it later whenever you want.
Downloading Queries in different format like Excel Sheets, CSV, and TSV.
Sleek and Interactive UI.
Prerequisite
OpenMRS version 1.8.2+
No special dependency module required.
Browser Compatibility
Google Chrome 18 +
Firefox 3.5 +
Internet Explorer 9 +
Opera
Safari
Datawarehouse Prerequisite
Hadoop
Hive
Sqoop
SSH
Quick Start Guide
There are two admin panel option: '''ETL Process''' and '''Datawarehouse Login'''.
Those with ETL Process can select the MySQL database data called the extraction procedure of the module. After the data is extracted it is transformed and loaded to datawarehouse at datawarehouse login page by completing configurations and credentials for it. Then after you moved to Hive Login page where a datawarehouse must be located. You can run your queries and show graphical statistics for the queries run.
Those with Datawarehouse login moved directly to Hive Login page where a datawarehouse must be located. You can run your queries and show graphical statistics for the queries run.
Once you have all things setup. User Guide is like a cakewalk. The user Interface and infrastructure is designed such a way that all things are very easy to handle by a normal user.
Installing the module
Download the omod file from here.
Now go to the Administrator Page and then Click on Manage Module.
Then Click on the Add or Upgrade Module button , after that a popup will appear , click the browse button then select the omod file that you downloaded and then click on upload button.
After this wait for some until the message "Module loaded and started successfully" comes.
After this you will see following section added to the admin page which means ETL Module is installed :
Getting Started
MySQL credentials
You have to fill up the following:-
MySQL Hostname: - The host over which MySQL database is running. It may be a Host name or Host IP address.
MySQL Port: - The port of the host over which MySQL database is running.
MySQL Username: - Username to login into your MySQL database. For OpenMRS by default it is “openmrs”.
MySQL Password: - Password for your MySQL database is present in openmrs runtime properties file. At the same folder from where you launch the openmrs standalone jar file.
Select Database
Select one database by clicking on it or you can choose multiple database at a time. By choosing, next page will load all the tables present in selected database.
Select Tables
Select one table by clicking on it or you can choose multiple table at a time. By choosing, next page will load all the columns present in selected tables.
Select Columns
Select columns by drag drop them from left hand side to right hand side of the web page. By choosing, next page will load the JOIN condition page if it is require one join condition or multiple join condition should provide otherwise skip it by clicking next.
Select type of join, table from the auto created dropdown. Add your ON Condition. It is must require. Then add any clause if you want. After completing this procedure click "Add Condition". This will append the JOIN Condition list made by you which is initially null. You can view the auto generated JOIN Condition Statement by clicking "Show Condition". In case you have your own complex JOIN condition which doesn't fit in this dropdown criteria, you can make your raw sql join condition using "Raw SQL JOIN Condition" text box.
Datawarehouse Login
This is the credential configuration page for datawarehouse. Which decide destination of where you want to load the extracted data. This requires ssh credentials of the remote machine.
You have to fill up the following:-
SSH Hostname: - The host over which Hive/Hadoop is running. It may be a Host name or Host IP address.
SSH Port: - The port of the host over which Hive/Hadoop is running. By default it is 22.
SSH Username: - Username to login into your SSH Remote Login.
SSH Password: - Password for your SSH Remote Login.
Loading
Wait for Loading to completed it tooks several minutes. So you have to be patient at this time. You can skip the page any time but the process will remain continue until loading is completed.
After Completion of loading this kind of page will appear.
Hive Login
You can reach this page three way. At admin page click "Datawarehouse Login". Second In "ETL Process" MySQL Login you can skip to "Hive Login" by bottom left corner link. Third is after completion of successful loading you can skip to it.
Create Queries
Create simple, complex or multiline HIVE queries using HiveQL, a SQL-Like language. You should know this language to run such queries over Hive.
Execute Queries
Click on Execute Queries will bring a new page loaded with table of resulted query. If you don’t want to load you can click on Execute and download queries. This will download a Tsv file for you of resultant query.
Download Excel Sheet, Csv, Tsv
You can download result in three format XLS (Excel Sheet), CSV (Comma Separated) and TSV (tab Separated).
Tables, Graphs and Statistics
At bottom left corner of the page, you will find a drop down contains chart type. Chart type are line, area, column, spline and pie charts. This are different mode of graphical view available. Choose any one and click “Graphs & Charts”.
Downloads
https://modules.openmrs.org/#/show/165/etl-module
YouTube Walkthrough
Release Notes
First Official release 20 July, 2014
About
This module was developer by Vineet Kumar for OpenMRS in Google Summer of Code 2014.