...
ETL via Apache Hive is implemented by connecting to the OpenMRS MySQL database, reading the table information, and writing it to a file in HDFS.
Figure 1: MySQL to HDFS Integration
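As a rough illustration of the extract step, the sketch below reads every table from a database connection and serializes its rows as delimited text. It is a minimal sketch under stated assumptions: Python's built-in `sqlite3` stands in for the MySQL OpenMRS database so the example runs without a server, the `patient` table and its rows are hypothetical sample data, and the upload of the resulting text to HDFS is not shown.

```python
import csv
import io
import sqlite3


def extract_tables(conn, delimiter=","):
    """Return {table_name: delimited text} for every table in the database."""
    # sqlite_master is SQLite-specific; against MySQL one would query
    # information_schema.tables (or run SHOW TABLES) instead.
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    dumps = {}
    for name in names:
        buffer = io.StringIO()
        writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
        # Table names come from the catalog query above, not from user input.
        writer.writerows(conn.execute(f'SELECT * FROM "{name}"'))
        dumps[name] = buffer.getvalue()
    return dumps


# Hypothetical sample table standing in for OpenMRS data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id INTEGER, gender TEXT)")
conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, "F"), (2, "M")])

dumps = extract_tables(conn)
print(dumps["patient"], end="")
```

Each value in `dumps` is the text that would be written to the table's data file; swapping the connection for a real MySQL driver would also require replacing the catalog query, as the comment notes.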
...
Each database is mapped to a separate directory, and its tables are mapped to sub-directories, within the Hive data warehouse directory. Data inserted into each table is written to text files (named datafile1.txt) in Hive/HDFS. The data can be comma-separated or use any other delimiter, which is configurable through command-line arguments.
Figure 2: Mapping between MySQL and HDFS Schema
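The mapping described above can be sketched as a pair of small helpers: one that builds the data file path from a database and table name, and one that formats a row with a configurable delimiter. This is only a sketch under assumptions: the `--warehouse-dir` and `--delimiter` flag names are hypothetical, `openmrs` and `patient` are illustrative names, and `/user/hive/warehouse` is Hive's default warehouse location. Hive's own managed layout typically names database directories `<database>.db`, which this sketch does not reproduce.

```python
import argparse
import posixpath


def data_file_path(warehouse_dir, database, table, file_name="datafile1.txt"):
    """Map a MySQL database/table pair to a file path under the warehouse."""
    return posixpath.join(warehouse_dir, database, table, file_name)


def format_row(values, delimiter=","):
    """Join one row's values into a delimited line (no quoting or escaping)."""
    return delimiter.join("" if v is None else str(v) for v in values)


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Sketch of the export options")
    # Both flags are hypothetical names chosen for this illustration.
    parser.add_argument("--warehouse-dir", default="/user/hive/warehouse")
    parser.add_argument("--delimiter", default=",")
    return parser.parse_args(argv)


args = parse_args(["--delimiter", "|"])
path = data_file_path(args.warehouse_dir, "openmrs", "patient")
line = format_row((1, "F", None), args.delimiter)
print(path)
print(line)
```

Keeping the delimiter as a plain argument means the same row formatter serves comma-separated output and any other single-character or multi-character separator.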
...
First, we perform a sample ETL from one MySQL database to another MySQL database and export the result to Hive.
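A database-to-database sample ETL of this kind might look like the following sketch. It is illustrative only: two in-memory `sqlite3` databases stand in for the source and target MySQL databases, the `patient` and `dim_patient` tables are hypothetical, the gender-code cleanup is an invented example of a transform step, and the export to Hive is not shown.

```python
import sqlite3


def run_sample_etl(source, target):
    """Copy patients from source to target, normalising the gender code."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS dim_patient (patient_id INTEGER, gender TEXT)")
    # Extract: read the rows from the source table in a stable order.
    rows = source.execute(
        "SELECT patient_id, gender FROM patient ORDER BY patient_id").fetchall()
    # Transform: trim whitespace and upper-case the gender code.
    cleaned = [(pid, (gender or "").strip().upper()) for pid, gender in rows]
    # Load: insert the cleaned rows into the target table.
    target.executemany("INSERT INTO dim_patient VALUES (?, ?)", cleaned)
    target.commit()
    return len(cleaned)


# Hypothetical source data with inconsistent gender codes.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE patient (patient_id INTEGER, gender TEXT)")
source.executemany("INSERT INTO patient VALUES (?, ?)", [(1, " f"), (2, "M")])

target = sqlite3.connect(":memory:")
loaded = run_sample_etl(source, target)
result = target.execute(
    "SELECT patient_id, gender FROM dim_patient ORDER BY patient_id").fetchall()
print(loaded, result)
```

Separating extract, transform, and load into distinct steps keeps the transform logic testable on its own before it is pointed at a real database.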
Mockups: eight UI mockup images
B. Second Stage Process
I. After the ETL is performed, the second stage requires two components:
1. EDW
2. OpenMRS Predictive Analysis Module
II. Performing predictive analysis: we can use Apache Mahout for the implementation.
Project Timeline
- 19 May - 1 June: Study and perform MySQL DB transformation in JSP
- 2 June - 8 June: Transform the mockup UI into a web interface in JSP
- 9 June - 6 July: Study Hadoop, Hive, and Apache Drift, and implement them
- 7 July - 28 July: Study Apache Mahout, implementation, and web interface
- 28 July - 18 August: Code fixes, bug fixes, and final finishing
Extra Credit
1. Adding a UI for performing ETL, similar to what Informatica offers.
2. Providing an interactive UI to analyze the predictive modeling results coming out of the DW.
...