...

ETL via Apache Hive is implemented by connecting to the OpenMRS MySQL database, reading the table data, and writing it to a file in HDFS.

 

 

Figure 1: MySQL to HDFS Integration
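
The extract step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `sqlite3` stands in for a MySQL driver (both follow the Python DB-API), the output goes to a local file rather than HDFS, and the function names `list_tables` and `extract_table` are hypothetical.

```python
import csv
import sqlite3  # assumption: stand-in for a MySQL driver; both follow the DB-API


def list_tables(conn):
    """Return the table names in the source database (SQLite-specific query)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def extract_table(conn, table, out_path, delimiter=","):
    """Read every row of `table` and write it to `out_path` as delimited text.

    Returns the number of rows written.
    """
    # Table names cannot be bound as query parameters, so pass trusted names only.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    count = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        for row in cursor:
            writer.writerow(row)
            count += 1
    return count
```

The resulting local file would still have to be copied into HDFS, for example with `hdfs dfs -put`.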

...

Databases are mapped to separate directories, and their tables are mapped to sub-directories, all within the Hive data warehouse directory. Data inserted into each table is written to text files (named, for example, datafile1.txt) in Hive / HDFS. The data can be comma-separated or use any other delimiter, which is configurable through command-line arguments.

Figure 2: Mapping between MySQL and HDFS Schema
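
The mapping described above can be illustrated with a short Python sketch. It rests on assumptions: the helper names (`table_data_path`, `parse_args`, `format_row`) and the option names `--warehouse-dir` and `--delimiter` are hypothetical, the default root is Hive's default warehouse location, and writing NULL values as empty fields is an arbitrary choice made for the example.

```python
import argparse
import posixpath


def table_data_path(warehouse_dir, database, table, filename="datafile1.txt"):
    """Map a database and table to <warehouse>/<database>/<table>/<filename>."""
    return posixpath.join(warehouse_dir, database, table, filename)


def parse_args(argv=None):
    """Parse the (hypothetical) command-line options for the export."""
    parser = argparse.ArgumentParser(description="Sketch of the export options")
    parser.add_argument("--warehouse-dir", default="/user/hive/warehouse")
    parser.add_argument("--delimiter", default=",")
    return parser.parse_args(argv)


def format_row(values, delimiter):
    """Join one row into a line of text; None becomes an empty field (assumption)."""
    return delimiter.join("" if v is None else str(v) for v in values)


if __name__ == "__main__":
    args = parse_args()
    print(table_data_path(args.warehouse_dir, "openmrs", "patient"))
    print(format_row([1, "F", None], args.delimiter))
```

Passing `--delimiter "|"` on the command line would switch the output from comma-separated to pipe-separated fields without changing the directory layout.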

...

First, we perform a sample ETL from one MySQL database to another MySQL database and then export the result to Hive.

Mockups, in order:

  • Login - Source Database
  • Selecting DB
  • Select DB Table
  • Selecting Data - Null Select
  • Select Columns
  • Joining Condition
  • Data Warehouse Login
  • Final Progress

B. Second Stage Process
I. After the ETL is performed, the second stage requires two components:
1. EDW
2. OpenMRS Predictive Analysis Module
II. Performing predictive analysis: Apache Mahout can be used for the implementation.

 

Project Timeline

  • 19 May -  1 June: Study and Perform MySQL DB Transformation in JSP
  • 2 June -  8 June: Transforming the Mockup UI into a Web Interface in JSP
  • 9 June -  6 July: Studying Hadoop, Hive, and Apache Thrift, and Implementation
  • 7 July - 28 July: Studying Apache Mahout, Implementation, and Web Interface
  • 28 July - 18 August: Code Fixing, Bug Fixing, and Final Polishing

Extra Credit

1. Adding a UI for performing ETL, similar to the one Informatica provides.
2. Providing an interactive UI to analyze the predictive modeling results coming out of the DW compliance.

...