...
Perform ETL in a web application from MySQL to Hadoop and run predictive analysis over the loaded data.
Approach
I am dividing the whole project into a two-stage process.
A. First Stage Process -
I. ETL can be performed in the following ways -
...
3. OpenMRS ETL Tool-kit - This is a separate application that does not require a running server. It has the same functionality as the first approach (web-application integration), with some extended capability. The benefit of this method is that the ETL can also be implemented in cross-platform languages other than Java, such as C#, giving independence in language selection.
IV. Selection of EDW - Using Apache Hive. Apache Hive is probably the best way to store data in Hadoop, as it uses a table concept and has a SQL-like language, HiveQL.
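Since Hive tables mirror the relational structure of MySQL, one part of the ETL is translating MySQL table definitions into HiveQL DDL. Below is a minimal sketch of that mapping; the `patient` table, its columns, and the type map are illustrative assumptions, not the actual OpenMRS schema.

```python
# Sketch: generate a HiveQL CREATE TABLE statement from a MySQL-style
# schema description. Type map covers only a subset of MySQL types.
TYPE_MAP = {
    "int": "INT",
    "bigint": "BIGINT",
    "varchar": "STRING",
    "text": "STRING",
    "datetime": "TIMESTAMP",
    "double": "DOUBLE",
}

def hive_ddl(table, columns):
    """columns: list of (name, mysql_type) tuples."""
    cols = ",\n  ".join(
        f"{name} {TYPE_MAP.get(mysql_type, 'STRING')}"
        for name, mysql_type in columns
    )
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001'\n"
        "STORED AS TEXTFILE;"
    )

# Hypothetical example table
print(hive_ddl("patient", [("patient_id", "int"), ("gender", "varchar")]))
```

Unknown MySQL types fall back to `STRING`, which Hive accepts for any delimited text field; a production mapping would need to handle `decimal`, `date`, and blob types explicitly.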
MySQL to Hadoop ETL
...
ETL via Apache Hive is implemented by connecting to the MySQL OpenMRS database, reading the table data, and writing it into files in HDFS.
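The extract-and-write step above can be sketched as follows. Rows read from MySQL (simulated here as plain tuples) are serialized into Ctrl-A-delimited text, Hive's default row format, ready to be copied into HDFS (e.g. with `hdfs dfs -put`). The `person` table and its columns are hypothetical examples, not the real OpenMRS schema.

```python
# Sketch: serialize extracted rows in Hive's default delimited text
# format: \x01 (Ctrl-A) between fields, \N for SQL NULL, one row per line.
import io

def write_hive_rows(rows, out):
    for row in rows:
        fields = ("\\N" if v is None else str(v) for v in row)
        out.write("\x01".join(fields) + "\n")

# In the real flow these rows would come from a JDBC/MySQL cursor, e.g.
#   cursor.execute("SELECT person_id, gender, birthdate FROM person")
rows = [(1, "M", "1980-04-02"), (2, "F", None)]
buf = io.StringIO()  # stands in for a file destined for HDFS
write_hive_rows(rows, buf)
print(buf.getvalue())
```

Writing the default delimiters means the file can be loaded with a plain `LOAD DATA INPATH` into a table created with `ROW FORMAT DELIMITED`, with no extra SerDe configuration.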
...
- Data Warehousing: http://en.wikipedia.org/wiki/Data_warehouse
- ETL (Extraction - Transformation - Loading): http://en.wikipedia.org/wiki/Extract,_transform,_load
- Apache Hive: http://hive.apache.org/
- Apache Mahout: https://mahout.apache.org/
...
1. Integration in the Web Application (Dynamic) - We can perform ETL inside the OpenMRS web application at a regular time interval. This requires a JS library that is integrated into the JSP pages.
...
...
...