Data Integrity Checks, Data Validation & Quality Assurance
- 1. Introduction
- 2. Data Integrity Overview
- 3. Data Validation
- 4. Data Quality Assurance (DQA)
- 5. Automated Testing in Data Quality Workflows
- 6. Using Playwright for Data Validation & Quality Assurance in OpenMRS
- 7. Integrating Automated Data Quality Checks into CI/CD
  - Best Practices
  - Example CI/CD Flow
- 8. Best Practices and Recommendations
- 9. Conclusion
1. Introduction
In modern data-driven systems, data quality is critical to application reliability, analytics accuracy, regulatory compliance, and overall business decision-making. Ensuring data is accurate, consistent, complete, and trustworthy requires a combination of data integrity checks, data validation practices, and robust quality assurance (QA) workflows, often supported by automated testing frameworks.
This document outlines best practices for implementing data integrity and validation processes, and provides guidance on how automated testing, particularly with Playwright, can be used to verify data quality across APIs, UI components, and backend systems in OpenMRS.
2. Data Integrity Overview
2.1 What is Data Integrity?
Data integrity refers to the accuracy, consistency, and reliability of data throughout its lifecycle—from creation and storage to transmission and retrieval.
Good data integrity ensures that:
Data is free of corruption
Data is consistent across systems
Data remains unchanged unless modified through controlled mechanisms
Unauthorized or accidental changes are prevented
2.2 Types of Data Integrity
a. Physical Integrity
Protection against hardware failures, data loss, and storage corruption.
b. Logical Integrity
Ensuring data follows business rules, constraints, and relationships:
Entity integrity (unique keys, primary keys)
Referential integrity (foreign keys, relationships)
Domain integrity (valid ranges, formats, enumerations)
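The three kinds of logical integrity can be illustrated with a small in-memory check. This is a minimal sketch only: the `PatientRow` and `VisitRow` shapes, the allowed gender codes, and the `findIntegrityViolations` function are hypothetical and not taken from the OpenMRS data model. In a real system most of these rules would be enforced by database constraints.

```typescript
// Hypothetical record shapes for illustration; not an actual OpenMRS schema.
interface PatientRow { id: number; gender: string; }
interface VisitRow { id: number; patientId: number; }

// Assumed set of valid values for the gender column.
const ALLOWED_GENDERS = new Set(["M", "F", "O", "U"]);

function findIntegrityViolations(patients: PatientRow[], visits: VisitRow[]): string[] {
  const violations: string[] = [];
  const seenIds = new Set<number>();

  for (const p of patients) {
    // Entity integrity: every primary key must be unique.
    if (seenIds.has(p.id)) {
      violations.push(`entity: duplicate patient id ${p.id}`);
    }
    seenIds.add(p.id);

    // Domain integrity: values must come from the allowed set.
    if (!ALLOWED_GENDERS.has(p.gender)) {
      violations.push(`domain: patient ${p.id} has gender "${p.gender}"`);
    }
  }

  for (const v of visits) {
    // Referential integrity: every foreign key must point at an existing row.
    if (!seenIds.has(v.patientId)) {
      violations.push(`referential: visit ${v.id} references missing patient ${v.patientId}`);
    }
  }
  return violations;
}
```

An empty result means no violations were found for the rules checked here; it does not prove the data is correct in other respects.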
3. Data Validation
3.1 What is Data Validation?
Data validation ensures that input or processed data adheres to defined rules before being accepted into a system.
3.2 Common Data Validation Techniques
Format validation (email, phone numbers, UUIDs)
Range and boundary checks (ages, dates, numeric thresholds)
Type validation (integer, string, boolean)
Uniqueness checks
Cross-field validation
Reference validation (matching foreign keys)
Business rule validation (domain-specific logic)
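Several of these techniques can be combined in one validation function. The sketch below covers format, type, range, and cross-field checks. The `RegistrationInput` shape, the field names, and the age limit of 130 are assumptions made for illustration, and the email pattern is deliberately simple rather than standards-complete.

```typescript
// Hypothetical registration payload; field names are assumptions for this sketch.
interface RegistrationInput {
  email: string;
  age: number;
  birthdate: string;   // ISO date, e.g. "1990-05-17"
  deathdate?: string;  // ISO date, optional
}

// Deliberately simple format check: something@something.something
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateRegistration(input: RegistrationInput): string[] {
  const errors: string[] = [];

  // Format validation
  if (!EMAIL_PATTERN.test(input.email)) {
    errors.push("email: invalid format");
  }

  // Type and range validation
  if (!Number.isInteger(input.age) || input.age < 0 || input.age > 130) {
    errors.push("age: must be an integer between 0 and 130");
  }

  const born = Date.parse(input.birthdate);
  if (Number.isNaN(born)) {
    errors.push("birthdate: not a parseable date");
  }

  // Cross-field validation: a death date may not precede the birth date.
  if (input.deathdate !== undefined) {
    const died = Date.parse(input.deathdate);
    if (Number.isNaN(died)) {
      errors.push("deathdate: not a parseable date");
    } else if (!Number.isNaN(born) && died < born) {
      errors.push("deathdate: must not precede birthdate");
    }
  }
  return errors;
}
```

Returning a list of messages instead of throwing on the first failure lets a caller report every problem in a single pass.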
3.3 When Should Validation Happen?
Client-side (UI)
API gateway / backend
Database layer
ETL pipelines
Data ingestion and transformation processes
4. Data Quality Assurance (DQA)
4.1 Goals of DQA
Ensure trustworthy analytics and reports
Prevent data corruption or loss
Maintain compliance
Maintain reliable system behavior
4.2 Components of a Strong QA Strategy
a. Data Profiling
Understanding data distributions, anomalies, and missing values.
b. Data Quality Metrics
Completeness
Accuracy
Timeliness
Uniqueness
Consistency
Validity
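Some of these metrics reduce to simple ratios over a record set. The following sketch computes completeness, uniqueness, and validity for a hypothetical `ContactRecord` type. The record shape and the rule that a valid phone number has 10 to 15 digits are assumptions for illustration.

```typescript
// Hypothetical record shape used only for this example.
interface ContactRecord { id: string; phone: string | null; }

interface QualityMetrics {
  completeness: number; // share of records with a phone value
  uniqueness: number;   // share of distinct ids among all records
  validity: number;     // share of present phone values that match the rule
}

// Assumed validity rule: 10 to 15 digits, nothing else.
const PHONE_RULE = /^\d{10,15}$/;

function measureQuality(records: ContactRecord[]): QualityMetrics {
  if (records.length === 0) {
    return { completeness: 1, uniqueness: 1, validity: 1 };
  }
  const withPhone = records.filter(r => r.phone !== null && r.phone.trim() !== "");
  const distinctIds = new Set(records.map(r => r.id));
  const validPhones = withPhone.filter(r => PHONE_RULE.test(r.phone ?? ""));

  return {
    completeness: withPhone.length / records.length,
    uniqueness: distinctIds.size / records.length,
    validity: withPhone.length === 0 ? 1 : validPhones.length / withPhone.length,
  };
}
```

Each value lies between 0 and 1, so the results can be tracked over time or compared against a threshold.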
c. Automated Quality Checks
ETL validation tests
Schema validation tests
API response validation
Front-end UI validation
d. Governance & Monitoring
Versioning of schemas
Data freshness checks
Alerting for data drift and anomalies
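Freshness and drift checks can start out as plain threshold comparisons. The sketch below is one possible shape for them; the function names `isFresh` and `hasDrifted` and the thresholds in the usage note are hypothetical, and real monitoring would usually rely on a dedicated tool.

```typescript
// Returns true when the data was updated no more than maxAgeMinutes before `now`.
// A timestamp in the future is treated as not fresh, since it suggests a clock or data error.
function isFresh(lastUpdated: Date, now: Date, maxAgeMinutes: number): boolean {
  const ageMs = now.getTime() - lastUpdated.getTime();
  return ageMs >= 0 && ageMs <= maxAgeMinutes * 60 * 1000;
}

// Returns true when a rate (for example the share of null values in a column)
// has moved away from its baseline by more than the given tolerance.
function hasDrifted(baselineRate: number, currentRate: number, tolerance: number): boolean {
  return Math.abs(currentRate - baselineRate) > tolerance;
}
```

For example, `isFresh(lastLoadTime, new Date(), 60)` could guard an hourly load, and `hasDrifted(0.02, todaysNullRate, 0.05)` could raise an alert when a null rate shifts by more than five percentage points.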
5. Automated Testing in Data Quality Workflows
Automated testing ensures that data integrity rules and validation logic work consistently across environments.
5.1 What Should Be Automated?
a. Data-Related Tests
Schema validation (JSON schema, DB schema)
Referential integrity (FK relationships)
Data type consistency checks
Business rule validation
Data transformation correctness
API response accuracy
b. System-Related Tests
Form and input validation in UIs
Data flows across systems (end-to-end)
Regression checks after deployments
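Schema validation is often the easiest of these checks to automate first. The sketch below hand-rolls a check for flat objects so that it stays self-contained; `FlatSchema` and `checkSchema` are hypothetical names, and a project would more likely use an established JSON Schema validator.

```typescript
type FieldType = "string" | "number" | "boolean";
type FlatSchema = { [field: string]: FieldType };

// Compares the runtime type of each expected field against a flat schema.
// Nested objects and arrays are out of scope for this sketch.
function checkSchema(value: unknown, schema: FlatSchema): string[] {
  if (typeof value !== "object" || value === null) {
    return ["value is not an object"];
  }
  const obj = value as { [key: string]: unknown };
  const problems: string[] = [];

  for (const field of Object.keys(schema)) {
    const expected = schema[field];
    if (!(field in obj)) {
      problems.push(`${field}: missing`);
      continue;
    }
    const actual = typeof obj[field];
    if (actual !== expected) {
      problems.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  return problems;
}
```

A test can then assert that `checkSchema(responseBody, expectedSchema)` returns an empty array.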
6. Using Playwright for Data Validation & Quality Assurance in OpenMRS
Playwright is a powerful end-to-end automation tool ideal for verifying data accuracy across the OpenMRS UI, REST API, and clinical workflows.
6.1 Why Playwright for OpenMRS?
Works seamlessly with OpenMRS 3.x (O3) frontend
Supports both UI workflows and REST API calls
Validates patient registration, program enrollments, encounters, and observations
Auto-waits for elements to become actionable and retries assertions, which suits the asynchronous data loading in O3 React components
Multi-browser support for ensuring OpenMRS runs well across environments
Easy to integrate into OpenMRS CI/CD pipelines or QA workflows
6.2 Common Data-Focused Use Cases in OpenMRS with Playwright
1. UI Data Validation (OpenMRS 3.x Patient Registration & Forms)
Ensures OpenMRS UI components display correct and validated data.
Examples
Required field validation
    await page.click('button:has-text("Submit")');
    await expect(page.locator('[name="givenName"] ~ .error'))
      .toHaveText("Given name is required");
Phone number or identifier format validation
    await page.fill('[name="phoneNumber"]', '123');
    await page.click('button:has-text("Submit")');
    await expect(page.locator('.error-message'))
      .toHaveText("Phone number must be at least 10 digits");
Reference data validation
    await expect(page.locator('#location-select')).toContainText("Outpatient Clinic");
2. API Schema & Response Validation (OpenMRS REST API)
Validate patient response
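    const response = await request.get('/ws/rest/v1/patient/uuid-of-test-patient');
    expect(response.status()).toBe(200);
    const data = await response.json();
    expect(data).toMatchObject({
      uuid: expect.any(String),
      person: {
        // Anchored so that only exactly "M" or "F" passes; extend if other codes are allowed
        gender: expect.stringMatching(/^(M|F)$/),
        birthdate: expect.any(String)
      }
    });
Validate encounter & obs datatypes
    // v=full is requested on the assumption that the default representation
    // returns obs as references without their concept and value
    const response = await request.get('/ws/rest/v1/encounter/enc-uuid?v=full');
    expect(response.status()).toBe(200);
    const encounter = await response.json();
    // WEIGHT_UUID is a placeholder for the weight concept UUID in the test data
    const weightObs = encounter.obs.find(o => o.concept.uuid === WEIGHT_UUID);
    expect(weightObs).toBeDefined();
    expect(typeof weightObs.value).toBe("number");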
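The single-observation check can be generalized into a helper that scans every obs in an encounter. This is a sketch under assumptions: `ObsLike` is a simplified stand-in for the obs objects in the REST response, and `findNonNumericObs` is a hypothetical helper, not part of Playwright or OpenMRS.

```typescript
// Simplified, assumed shape of an obs entry in an encounter response.
interface ObsLike {
  concept: { uuid: string };
  value: unknown;
}

// Returns the obs whose concept is expected to be numeric but whose value is not a usable number.
function findNonNumericObs(obs: ObsLike[], numericConceptUuids: string[]): ObsLike[] {
  const numeric = new Set(numericConceptUuids);
  return obs.filter(o =>
    numeric.has(o.concept.uuid) &&
    (typeof o.value !== "number" || Number.isNaN(o.value))
  );
}
```

Inside a test, asserting that `findNonNumericObs(encounter.obs, [WEIGHT_UUID])` has length 0 would cover every weight observation in the encounter rather than only the first one found.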
3. Cross-System Data Integrity (UI → REST API)
Verifies that data submitted through the OpenMRS UI matches what is stored server-side.
Example:
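    await page.fill('[name="givenName"]', 'John');
    await page.fill('[name="familyName"]', 'Doe');
    await page.click('button:has-text("Save")');
    // Pass the query via params so the space is URL-encoded; v=full is assumed
    // to be needed for search results to include the person object
    const apiRes = await request.get('/ws/rest/v1/patient', {
      params: { q: 'John Doe', v: 'full' }
    });
    expect(apiRes.ok()).toBeTruthy();
    const patient = await apiRes.json();
    expect(patient.results[0].person.display).toBe("John Doe");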
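When more than one field is submitted, comparing field by field gives clearer failure messages than a single equality assertion. The helper below is a sketch; `FieldMap` and `diffFields` are hypothetical names, and mapping the REST response into a flat field map is left to the caller.

```typescript
type FieldMap = { [field: string]: string };

// Lists every submitted field whose stored value is missing or different.
// Fields that exist only in the stored record are ignored.
function diffFields(submitted: FieldMap, stored: FieldMap): string[] {
  const mismatches: string[] = [];
  for (const field of Object.keys(submitted)) {
    const expected: string | undefined = submitted[field];
    const actual: string | undefined = stored[field];
    if (actual === undefined) {
      mismatches.push(`${field}: missing from stored record`);
    } else if (actual !== expected) {
      mismatches.push(`${field}: submitted "${expected}", stored "${actual}"`);
    }
  }
  return mismatches;
}
```

A test would build `submitted` from the values typed into the form, build `stored` from the API response, and expect `diffFields(submitted, stored)` to be empty.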
4. Indirect Database Validation
Playwright does not connect directly to the OpenMRS database but can verify backend behavior by:
Creating encounters from the UI
Fetching them via REST API
Ensuring obs, orders, and metadata match expected values
Confirming concepts and datatypes aren’t corrupted
7. Integrating Automated Data Quality Checks into CI/CD
Best Practices
Run schema validation tests on every pull request
Automate nightly ETL quality tests
Trigger Playwright end-to-end tests on every deploy
Use containerized test environments
Integrate test reporting (Allure, Playwright HTML Reporter)
Fail builds on data anomaly detection
Example CI/CD Flow
Developer pushes changes
Static validation (lint, schema validation)
API & business rule automated tests
Playwright end-to-end tests (UI + API)
DQA checks on ETL/warehouse
Deploy to staging
Deploy to production
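The flow above is a sequence of gates: each stage runs only if the previous one passed. The sketch below models that fail-fast behavior in code to make the ordering explicit. `PipelineStage` and `runPipeline` are hypothetical, and an actual pipeline would be defined in the configuration of the CI system in use.

```typescript
interface PipelineStage {
  label: string;
  run: () => boolean; // returns true when the stage passes
}

interface PipelineResult {
  completed: string[];
  failedAt: string | null;
}

// Runs stages in order and stops at the first failure, so later stages
// (such as deployments) never run on top of a failed check.
function runPipeline(stages: PipelineStage[]): PipelineResult {
  const completed: string[] = [];
  for (const stage of stages) {
    if (!stage.run()) {
      return { completed, failedAt: stage.label };
    }
    completed.push(stage.label);
  }
  return { completed, failedAt: null };
}
```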
8. Best Practices and Recommendations
For Data Validation
Validate at multiple layers (UI → API → DB)
Use centralized validation rules where possible
Version your schemas
For Automated Testing
Keep tests deterministic
Mock external systems only when necessary
Use data factories for test data
Prioritize E2E tests for mission-critical flows
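Determinism and data factories work well together: a factory driven by a seeded random generator produces varied but reproducible test data. The sketch below assumes a hypothetical `TestPatient` shape and name lists; the generator is a small mulberry32-style PRNG included only to keep the example self-contained.

```typescript
// Hypothetical test-data shape; not the OpenMRS patient resource.
interface TestPatient {
  givenName: string;
  familyName: string;
  gender: "M" | "F";
  birthdate: string;
}

const GIVEN_NAMES = ["Amina", "John", "Mary", "Peter"] as const;
const FAMILY_NAMES = ["Doe", "Okello", "Smith", "Mwangi"] as const;

// Small seeded PRNG (mulberry32-style): the same seed yields the same sequence.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Returns a factory whose output depends only on the seed and the overrides.
function makePatientFactory(seed: number): (overrides?: Partial<TestPatient>) => TestPatient {
  const rand = seededRandom(seed);
  function pick<T>(items: readonly T[]): T {
    return items[Math.floor(rand() * items.length)] as T;
  }
  return (overrides: Partial<TestPatient> = {}): TestPatient => ({
    givenName: pick(GIVEN_NAMES),
    familyName: pick(FAMILY_NAMES),
    gender: pick(["M", "F"] as const),
    birthdate: `${1950 + Math.floor(rand() * 60)}-01-01`,
    ...overrides, // explicit values win over generated ones
  });
}
```

A test that needs a specific value passes it as an override, for example `makePatientFactory(42)({ givenName: "John" })`, and leaves the remaining fields to the factory.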
For Data Quality Assurance
Monitor data in production
Track key data quality KPIs
Build feedback loops with engineering & analytics teams
9. Conclusion
Combining strong data integrity checks, data validation systems, and robust automated testing techniques—especially using Playwright—creates a reliable, scalable data quality framework.
This helps ensure:
Clean and trustworthy data
Reduced manual QA overhead
Faster development cycles
Improved system stability
Better business analytics and decision-making