Data Synchronization Project
This page refers to a Summer Of Code 2007 project. Documentation and code have been placed into the Sync Module for general use.
Background
Intern: Anders Gjendem
Mentor: Maros Cunderlik
Other developers: Christian Allen, Darius Jazayeri, Julie
Abstract: As described under the remote data entry project and mobile data collection projects, we need to synchronize local data storage with the central servers. Local data storage may entail a fully functioning OpenMRS instance or perhaps a scaled-down version of the server. While a complete end-to-end synchronization solution featuring master-master replication, conflict resolution, contextual data replication, and automatic schema updates is clearly beyond the scale of a summer project, significant contributions toward this critical feature could be made.
In the initial phase of this effort, we would like to implement data synchronization under the assumption that the remote data entry site can add new patients and encounters but cannot modify existing records. It is therefore assumed that there is only a single 'master' copy of the data at any given point, so conflict detection and resolution are not needed at this stage. Consequently, synchronization in this context can be viewed as export, transfer, and import (analogous to the ETL techniques used in data warehousing). ETL generally deals with one-directional transforms between different E-R schemas and data semantics. Here, by contrast, it is assumed and expected that advanced synchronization features will be added to OpenMRS in the near future. The proposed solution should therefore reflect broader consideration and awareness of the challenges in bi-directional and contextual replication.
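The export, transfer, and import view described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Sync Module's code: the class names (AddOnlySyncSketch, NewRecord, ChildSite, ParentSite) are hypothetical, records are held in memory, and the transfer step is simply passing a list. Because the child only adds records, the parent can import by GUID and skip anything it has already seen, so no conflict resolution is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical add-only sync sketch; none of these names come from the Sync Module. */
class AddOnlySyncSketch {

    /** A newly created row (for example a patient or an encounter), identified by a GUID. */
    static final class NewRecord {
        final String guid;
        final String type;
        final String payload;

        NewRecord(String guid, String type, String payload) {
            this.guid = guid;
            this.type = type;
            this.payload = payload;
        }
    }

    /** Child (remote) site: keeps a journal of records created since the last export. */
    static final class ChildSite {
        private final List<NewRecord> journal = new ArrayList<>();

        void add(NewRecord record) {
            journal.add(record);
        }

        /** Export step: return a copy of the journal and clear it. */
        List<NewRecord> exportJournal() {
            List<NewRecord> batch = new ArrayList<>(journal);
            journal.clear();
            return batch;
        }
    }

    /** Parent (central) site: holds the single master copy. */
    static final class ParentSite {
        private final Map<String, NewRecord> store = new LinkedHashMap<>();

        /** Import step: insert records with unseen GUIDs; returns how many were inserted. */
        int importBatch(List<NewRecord> batch) {
            int inserted = 0;
            for (NewRecord record : batch) {
                // putIfAbsent makes a repeated import of the same batch a no-op.
                if (store.putIfAbsent(record.guid, record) == null) {
                    inserted++;
                }
            }
            return inserted;
        }

        int size() {
            return store.size();
        }
    }
}
```

Keying the import on the GUID means a batch that is transferred twice does no harm, which matters when the transfer medium is unreliable.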
Project News
The latest source is in SVN, branch: data_synchronization_bidirectional. We are preparing to move the branch to trunk; however, this will not happen until after the 1.4 release. The branch code is running in production in Rwanda. Please see the feature overview presentation: RITA_&_Sync.pptx.
Task Backlog
Tasks to complete by 1.0 release:
Darius: 'one-click' process for creating a child database that will sync to a given parent.
status: in-progress: initial code committed, needs work
This should also support the ability to 'dump and refresh' the client DB from the parent (i.e. after exporting the sync journal, rebuild the client DB from the parent).
I assume we duplicate the parent database, give the child a new GUID, and that’s all?
idea from Christian: it would automatically adjust global properties for the new instance, or at least prompt you for a minimal amount of changes to make.
TBD: Handle deletes in interceptor.
Maros: Fix/replace the activateTransactionSerialization() behavior (it doesn't work in the case of session.merge(), as in updateSyncImportRecord()).
TBD: Documentation.
All: Code/package clean-up.
Maros: Split add_sync.sql into DDL and DML; add server GUID generation to it.
Move several common files (e.g. constants) from org.openmrs.synchronization.engine to org.openmrs.synchronization
Clean-up/consolidate the state handling of sync transmission/sync record
Comment the code
Collectively look at/review the bidi code (i.e. sending data in both directions): is it ready to be used?
Maros: Look into the bug that Ben saw.
Maros: There seems to be a bug with syncTx serialization when exporting/processing form edits; i.e. I edited Form properties and the corresponding syncTx failed to create a valid XML file, most likely because of serialization issues.
TBD: LastRecordGuid/OriginalGuid needs to be cleaned up/finished; as of now it is just a duplicate of SyncRecord.guid.
All: Testing.
At a minimum, create a list of scenarios that must be tested and passed before sync can be released.
Automated test suite. First, using the latest in-memory DB testing facilities, begin creating automated 'integration'/user scenario tests (i.e. add patient, sync, verify data came across). This will probably not be fully done by the 1.0 release, but we should start.
Tasks with unassigned priorities
compression: SyncTx payloads are growing large quickly
journal/data archive
Error handling/reporting to user in scheduled task
If an error occurs in a scheduled sync, where should it be reported? Currently, it throws an exception on the background thread.
Consolidation of the different hardcoded domain object instantiation schemes and metadata.
For example:
in SyncUtil.updateOpenmrsObject, .getOpenmrsObj()
in SynchronizationHistoryListController.referenceData()
SyncClass
HibernateSynchronizationDAO.createDatabaseForChild
Priority: post 1.0 release most likely?
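For the compression item in the list above, one possible approach (not a decision recorded on this page) is to gzip the serialized XML before transfer and unpack it on import. The sketch below uses only java.util.zip from the JDK; the class and method names (SyncPayloadGzipSketch, compress, decompress) are hypothetical, and it assumes the sync transmission is available as a UTF-8 XML string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; not part of the Sync Module. */
class SyncPayloadGzipSketch {

    /** Gzip a serialized XML payload (assumed to be UTF-8 text). */
    static byte[] compress(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer into the buffer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reverse of compress(): unpack the bytes back into the XML string. */
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

XML with many repeated element names tends to compress well, but the actual saving for real SyncTx payloads would need to be measured.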
Post 1.0 release: