Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Demo Video Added

...

  1. Manual Loading - Export MySQL Table as csv data and Import that csv data to Hive tables.
  2. Sqoop
  3. Tungsten Replicator

 

 

Manual via CSV

Sqoop

Tungsten Replicator

Process

Manual/ Scripted

Manual/ Scripted

Fully automated

Incremental Loading

Possible with DDL changes

Requires DDL changes

Fully supported

Latency

Full-load

Intermittent

Real-time

Extraction Requirements

Full table scan

Full and partial table scans

Low-impact binlog scan

 

 

...

I. After a ETL is performed, now for Second Stage we require two component
     1. EDW
     2. OpenMRS Predictive Analysis Module.
II. Performing Predictive Analysis - we can use Apache Mayhout for implementation.

...

Mockup UI

Mockup
Version4
Width600
NameQuery Editor

Mockup
Version1
Width600
NameTabularView

Project Timeline

  • 19 May -  22 May: Study MySQL DB Transformation using JSP
  • 22 May - 26 May: Study  Direct Web Remoting and Spring Controller
  • 26 May - 4 June. DWR implementation and back end designing using JavaScript and Java.
  • 5 June -  8 June:  Make a Web Wizard similar to mentioned Mockup UI.
  • 9 June - 12 June: jQuery Drag and Drop support for columns in table.
  • 12 June - 14 June: Hadoop, Hive, Thrift Study, Setup, Resource Collection.
  • 14 June - 15 June: Testing and Implementing Datawarehouse Login (Backend and Frontend).
  • 16 June - 22 June: Transform and Load Table from MySQL to Hive Datawarehouse. Implementing various Joins.
  • 23 June -  30 June: Add more feature like multiple table or database selection and Hive Table Editing.1 29 June: Loading data to Hive using ssh and thrift
  • 30 June - 6 July: Test and execute Hive queries using ssh, hive-jdbc and thrift in OpenMRS Module.
  • 7 July -  13 July: Get results of queries on page in interactive manner.
  • 14 July - 28 July: Studying Apache Mahout, Implementaion and Web Interface
  • 28 July - 18 August: Code Fixing, Bug Solving and Later Finishing

Demo Video

Image Added

Resources

  1. Data Warehousing : http://en.wikipedia.org/wiki/Data_warehouse
  2. ETL ( Extraction - Transformation - Loading) : http://en.wikipedia.org/wiki/Extract,_transform,_load
  3. Apache Hive : http://hive.apache.org/
  4. Apache Mahout :https://mahout.apache.org/