...

IV. Selection of EDW - Using Apache Hive. Apache Hive is probably the best way to store data in Hadoop, as it uses a table concept and has a SQL-like language, HiveQL.
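
To make the "SQL-like" point concrete, the snippet below is a minimal sketch (not part of this design) that issues a HiveQL query from Python through the PyHive client; the host, port, database, and the OpenMRS person columns are assumptions for illustration only.

```python
# Hypothetical example: running a HiveQL query against a Hive table from Python.
# Host, port, database, and table are placeholders, not project settings.
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, database="openmrs")
cursor = conn.cursor()

# HiveQL reads much like ordinary SQL over a table-structured store.
cursor.execute("SELECT person_id, gender, birthdate FROM person LIMIT 10")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```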

 

What Is Hadoop, Exactly?

  • A distributed file system
  • A method of processing massive quantities of data in parallel
  • The Cutting family's stuffed elephant
  • All of the above

 

Traditional Hadoop Deployments

  • Data Analytics
    • Single databases
    • Collective databases
    • Databases and external information
    • Non-structured data
    • Long term datastores and archiving

MySQL to Hive ETL

ETL into Apache Hive is implemented by connecting to the MySQL OpenMRS database, reading the table data, and writing it into files in HDFS.
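
As a rough illustration of that extract step, the sketch below (with placeholder credentials and the OpenMRS person table standing in for a real configuration) dumps one MySQL table into a tab-delimited file that could then be copied into HDFS and mapped to a Hive table.

```python
# Minimal sketch of the extract step: MySQL table -> tab-delimited file.
# Connection details and the table name are placeholders for illustration.
import csv
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="openmrs", password="secret", database="openmrs"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM person")

with open("person.tsv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    for row in cursor:
        writer.writerow(row)

cursor.close()
conn.close()

# The file can then be pushed to HDFS (e.g. with `hdfs dfs -put person.tsv ...`)
# and exposed to Hive via LOAD DATA INPATH or an external table.
```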

...

Figure 2: Mapping between MySQL and HDFS Schema


Exporting Tables From MySQL To Hive

  1. Manual loading - export the MySQL table as CSV data and import that CSV data into Hive tables.
  2. Sqoop (see the sketch after this list)
  3. Tungsten Replicator
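
As a concrete illustration of option 2, the sketch below assembles a typical `sqoop import --hive-import` invocation and runs it from Python; it assumes Sqoop is installed and on the PATH, and the JDBC URL, credentials, and table name are placeholders rather than project configuration.

```python
# Hypothetical Sqoop invocation: import one MySQL table straight into Hive.
# Assumes Sqoop is on the PATH and a Hive metastore is reachable.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://localhost/openmrs",  # placeholder JDBC URL
    "--username", "openmrs",
    "--password", "secret",
    "--table", "person",
    "--hive-import",          # create and load the matching Hive table
    "--hive-table", "person",
    "--num-mappers", "1",     # one mapper keeps the example simple
]

# Sqoop runs a MapReduce job that copies the table to HDFS and then loads
# the data into the named Hive table.
subprocess.run(sqoop_cmd, check=True)
```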

 

                        | Manual via CSV             | Sqoop                         | Tungsten Replicator
Process                 | Manual/Scripted            | Manual/Scripted               | Fully automated
Incremental Loading     | Possible with DDL changes  | Requires DDL changes          | Fully supported
Latency                 | Full-load                  | Intermittent                  | Real-time
Extraction Requirements | Full table scan            | Full and partial table scans  | Low-impact binlog scan

Mockup UI

First, we try to perform a sample ETL from one MySQL DB to another MySQL DB, and then export the result to Hive.
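
One way to drive that sample run is sketched below; it only covers the MySQL-to-MySQL copy, with hypothetical connection details and a deliberately simplified person_copy table. The copied table can then be exported to Hive with any of the approaches compared above.

```python
# Rough sketch: copy one table from a source MySQL DB into a second MySQL DB.
# Hosts, credentials, and the simplified schema are placeholders.
import mysql.connector

src = mysql.connector.connect(host="localhost", user="openmrs",
                              password="secret", database="openmrs")
dst = mysql.connector.connect(host="localhost", user="etl",
                              password="secret", database="openmrs_copy")
src_cur = src.cursor()
dst_cur = dst.cursor()

# Recreate the target table and copy the rows across.
dst_cur.execute("CREATE TABLE IF NOT EXISTS person_copy "
                "(person_id INT, gender VARCHAR(50), birthdate DATE)")
src_cur.execute("SELECT person_id, gender, birthdate FROM person")
dst_cur.executemany(
    "INSERT INTO person_copy (person_id, gender, birthdate) VALUES (%s, %s, %s)",
    src_cur.fetchall(),
)
dst.commit()

src.close()
dst.close()
```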

...