Data Synchronization Project

This page describes a Summer of Code 2007 project. The documentation and code have since been moved into the Sync Module for general use.

Background

Intern: Anders Gjendem
Mentor: Maros Cunderlik
Other developers: Christian Allen, Darius Jazayeri, Julie

Abstract: As described under the remote data entry and mobile data collection projects, we need to synchronize local data storage with the central servers. Local data storage may be a fully functioning OpenMRS instance or a scaled-down version of the server. A complete end-to-end synchronization solution featuring master-master replication, conflict resolution, contextual data replication, and automatic schema updates is clearly beyond the scope of a summer project, but significant contributions toward this critical feature can still be made.

In the initial phase of this effort, we would like to implement data synchronization under the assumption that the remote data entry site can add new patients and encounters but cannot modify existing records. Under that assumption there is only a single 'master' copy of the data at any given time, so conflict detection and resolution are not needed yet. Synchronization in this context can therefore be viewed as export, transfer, and import (analogous to the ETL techniques used in data warehousing). Unlike ETL, which generally deals with a one-directional transform between different E-R schemas and data semantics, this work is expected to be followed by more advanced synchronization features in OpenMRS in the near future. The proposed solution should therefore reflect an awareness of the challenges of bi-directional and contextual replication.
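The export, transfer, and import view described above can be illustrated with a small in-memory sketch. It is not the Sync Module's code: the class names (`AddOnlySyncSketch`, `ChildSite`, `ParentSite`, `NewRecord`) and the string payload are assumptions made for illustration only. The child journals additions, exports whatever has not yet been sent, and the parent inserts records whose GUID it has not seen, so re-importing a batch changes nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of add-only synchronization (export, transfer, import). */
final class AddOnlySyncSketch {

    /** A newly created record, identified across servers by a GUID. */
    static final class NewRecord {
        final String guid;
        final String type;    // e.g. "patient" or "encounter"
        final String payload; // serialized content; the real format is left open

        NewRecord(String guid, String type, String payload) {
            this.guid = guid;
            this.type = type;
            this.payload = payload;
        }
    }

    /** Child site: journals additions and exports those not yet sent. */
    static final class ChildSite {
        private final List<NewRecord> journal = new ArrayList<>();
        private int exportedUpTo = 0;

        NewRecord add(String type, String payload) {
            NewRecord record = new NewRecord(UUID.randomUUID().toString(), type, payload);
            journal.add(record);
            return record;
        }

        /** Export step: everything journaled since the previous export. */
        List<NewRecord> export() {
            List<NewRecord> batch =
                    new ArrayList<>(journal.subList(exportedUpTo, journal.size()));
            exportedUpTo = journal.size();
            return batch;
        }
    }

    /** Parent site: holds the single master copy; import only ever inserts. */
    static final class ParentSite {
        private final Map<String, NewRecord> store = new LinkedHashMap<>();

        /** Import step: returns how many records were inserted; known GUIDs are skipped. */
        int importBatch(List<NewRecord> batch) {
            int inserted = 0;
            for (NewRecord record : batch) {
                if (store.putIfAbsent(record.guid, record) == null) {
                    inserted++;
                }
            }
            return inserted;
        }

        int size() {
            return store.size();
        }
    }
}
```

Because the parent never updates an existing GUID, no conflict handling is needed in this phase; bi-directional replication would have to replace the skip in `importBatch` with real conflict detection.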

Project News

The latest source is in SVN, on the branch data_synchronization_bidirectional. We are preparing to move the branch to trunk, but this will not happen until after the 1.4 release. The branch code is running in production in Rwanda. Please see the following presentation for an overview of the feature:

[Embedded presentation: RITA_&_Sync.pptx (file not available)]

Task Backlog

Tasks to complete by 1.0 release:

  1. Darius 'one-click' process of creating a child database that’s going to sync to a given parent.

    1. status: in-progress: initial code committed, needs work

    2. This should also support the ability to 'dump and refresh' the client DB from the parent (i.e. after exporting the sync journal, rebuild the client DB from the parent)

    3. I assume we duplicate the parent database, give the child a new GUID, and that’s all?

    4. idea from Christian: it would automatically adjust global properties for the new instance, or at least prompt you for a minimal amount of changes to make.

  2. TBD Handle deletes in interceptor

  3. Maros Fix/replace the activateTransactionSerialization() behavior (doesn't work in case of session.merge() as in updateSyncImportRecord())

  4. TBD Documentation

    1. online user help: done

    2. TBD wiki detailed user docs

    3. TBD wiki detailed tech docs

  5. ALL Code/pkg clean-up.

    1. Maros Split add_sync.sql to DDL and DML; add server guid generation to it.

    2. Move several common files (e.g. constants) from org.openmrs.synchronization.engine to org.openmrs.synchronization

    3. Clean-up/consolidate the state handling of sync transmission/sync record

    4. Comment the code

    5. Collectively look at/review the bidi code (i.e. sending data in both directions): is it ready to be used?

    6. Maros Look into the bug that Ben saw

    7. Maros There seems to be a bug with syncTx serialization when exporting/processing form edits: after I edited Form properties, the corresponding syncTx failed to produce a valid XML file, most likely because of a serialization issue

  6. TBD LastRecordGuid/OriginalGuid - needs to be cleaned up/finished; as of now it is just a dup of SyncRecord.guid

  7. ALL Testing.

    1. At a minimum, create a list of scenarios that must be tested and must pass before sync is released

    2. Automated test suite. First, using the latest in-mem db testing facilities, begin creating automated 'integration'/user scenario tests (e.g. add patient, sync, verify data came across). This will probably not be fully done by the 1.0 release, but we should start
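The child-creation task at the top of this list (duplicate the parent, give the child a new GUID, adjust a minimal set of global properties) might look roughly like the sketch below. It only models the property handling, not the database copy itself, and the property keys `sync.server_guid` and `sync.parent_guid` as well as the class `ChildServerSketch` are hypothetical names chosen for illustration, not the names used in the branch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: derive a child's global properties from its parent's. */
final class ChildServerSketch {

    // Assumed property keys; the real keys in the sync code may differ.
    static final String SERVER_GUID = "sync.server_guid";
    static final String PARENT_GUID = "sync.parent_guid";

    /**
     * Copies the parent's properties, applies the site-specific overrides the
     * administrator was prompted for, then assigns the child its own identity.
     * The parent's map is left untouched.
     */
    static Map<String, String> createChildProperties(Map<String, String> parentProps,
                                                     Map<String, String> overrides) {
        Map<String, String> child = new HashMap<>(parentProps);
        child.putAll(overrides);
        // Identity is set last so an override cannot accidentally reuse a GUID.
        child.put(PARENT_GUID, parentProps.get(SERVER_GUID));
        child.put(SERVER_GUID, UUID.randomUUID().toString());
        return child;
    }
}
```

Setting the GUIDs after the overrides is a deliberate choice in this sketch: a child that kept its parent's GUID would be indistinguishable from the parent during sync.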

Tasks with unassigned priorities

  1. compression: SyncTx are getting large in a hurry

  2. journal/data archive

  3. Error handling/reporting to user in scheduled task

    1. If error occurs in scheduled sync, where should it be reported? Currently, it will throw an exception on the background thread.

  4. Consolidation of the different hardcoded domain object instantiation schemes and metadata.

    1. For example:

      1. in SyncUtil.updateOpenmrsObject, .getOpenmrsObj()

      2. in SynchronizationHistoryListController.referenceData()

      3. SyncClass

      4. HibernateSynchronizationDAO.createDatabaseForChild

    2. Priority: post 1.0 release most likely?
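For the compression item above, one option is to gzip the serialized SyncTx before transfer, since XML with many repeated element names tends to compress well. The following is a minimal sketch using the standard `java.util.zip` classes; the class name `SyncTxCompressionSketch` and the assumption that a SyncTx is available as a single XML string are illustrative, not taken from the sync code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: gzip a serialized sync transmission before sending it. */
final class SyncTxCompressionSketch {

    /** Compresses the XML text to gzip bytes (UTF-8 encoded). */
    static byte[] compress(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed at this point, so the trailer has been written.
        return buffer.toByteArray();
    }

    /** Restores the original XML text from gzip bytes. */
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The receiving side would call `decompress` before handing the XML to the existing import path, so the rest of the pipeline would not need to know about compression.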

Post 1.0 release: