Platform Team Meeting Notes 2024

2024-07-24

Performance improvements
- Added a visit parameter for fetching orders and observations (available only in Platform 2.7)
- Adding features to the RESTWS module
If we make changes to the Platform (e.g., in the latest version), how do we want to introduce these features to O3?
- Do we set up a “special server” to test out changes? Is it something temporary or do we want something more permanent?
- Using dev3 would also, in theory, let us leverage Jayasanka’s load testing infrastructure
OpenMRS 3.1 release planning – are there any modules that should be bumped?
- We plan to pull in the latest released version of each module as part of preparing the release

2024-07-10

QA RefApp is showing Liquibase errors
Audit logging
- Tried creating an audit log using Hibernate interceptors; however, it logs only changes to data (inserts, edits, deletes), not reads.
- We recognize there is a need for access logging (i.e., auditing who is looking at whose data). Burke gave simple example questions: (1) who has accessed this patient’s record over this time period? and (2) which patients' records has this user accessed over this time period? It’s likely implementers will have more requirements for their local security needs (e.g., being able to generate a report that summarizes access or identifies red flags).
  @dkayiwa will start a discussion to gather requirements for access logging
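To make Burke’s two example questions concrete, here is a minimal sketch of what an access-log record and its two queries could look like. Everything in it (AccessEvent, AccessLog, the field names) is hypothetical and kept in memory for illustration; since Hibernate interceptors only see writes, a real implementation would need some hook on the read path, plus persistent storage indexed by (patient, time) and (user, time).

```typescript
// Hypothetical shape of one access-log entry; not an existing OpenMRS type.
interface AccessEvent {
  userUuid: string;
  patientUuid: string;
  accessedAt: Date;
}

// Minimal in-memory access log answering the two example questions.
class AccessLog {
  private readonly events: AccessEvent[] = [];

  record(event: AccessEvent): void {
    this.events.push(event);
  }

  // Events whose timestamp falls within [from, to], inclusive.
  private within(from: Date, to: Date): AccessEvent[] {
    return this.events.filter(
      (e) => e.accessedAt.getTime() >= from.getTime() && e.accessedAt.getTime() <= to.getTime(),
    );
  }

  // (1) Who has accessed this patient's record over this time period?
  usersWhoAccessed(patientUuid: string, from: Date, to: Date): string[] {
    const users = this.within(from, to)
      .filter((e) => e.patientUuid === patientUuid)
      .map((e) => e.userUuid);
    return Array.from(new Set(users)); // de-duplicate, keeping first-seen order
  }

  // (2) Which patients' records has this user accessed over this time period?
  patientsAccessedBy(userUuid: string, from: Date, to: Date): string[] {
    const patients = this.within(from, to)
      .filter((e) => e.userUuid === userUuid)
      .map((e) => e.patientUuid);
    return Array.from(new Set(patients));
  }
}
```

Reports that summarize access or flag suspicious patterns could be built on the same records once implementers’ requirements are gathered.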
@Njidda Salifu pointed out a couple of issues:
- Docker nginx configuration limits the size of data that can be uploaded (proposed fix)
- The SDK offers no location options when logging in, because there are no locations defined in the database
@Manoj Rathnapriya shared some errors when fetching a Flag resource (http://localhost:8080/openmrs/ws/fhir2/R4/Flag/d073f2af-779d-4160-b2a4-3aa87486cf6b)
- @Ian Bacher said it looks like something isn’t getting registered properly: the FHIR2 module appears not to be listed as a dependency in the pom.xml of a new Maven submodule. Adding a dependency on the patient-flags-fhir module should fix the problem.

2024-06-26

Performance
- Visit Summary
  - @dkayiwa investigated whether diagnoses can be loaded for all encounters within a visit without loading all the other encounter data from the database, and found that this is not currently possible. However, he realized that, rather than creating a new visitsummary resource to get around this, a simpler solution could be to add encounter diagnoses as a property of the visit, i.e., asking for visit diagnoses would efficiently return all diagnoses across all encounters in the visit without having to fetch every encounter.
    @dkayiwa to link ticket for this work into these notes
- Startup performance
  - Startup currently takes about 10 minutes. Much of this appears to come from a “ridiculous” number of steps during first-time setup, while the only warnings being reported relate to the teleconsultation setup. Need to investigate the cause.

2024-06-19

Performance
- Investigating approach to improving performance of the Visits views for O3 (Visit Summary and All Encounters tabs)
- TODO: Need a ticket for this work (@dkayiwa will make a ticket for this)
Working on cybersecurity issues from recent penetration testing
- Working on XStream whitelisting
  https://openmrs.atlassian.net/browse/TRUNK-6188
- Will need to create releases for affected modules before repeat testing.
Reviewed progress for 2024 for report to funders

2024-06-12

Performance issues
- Palladium Kenya is currently deploying recent changes that Ian made
- Some calls for obs have sorting at the API level that the REST API does not use
  - There is no natural sorting, so the order of results may vary
- Some calls don’t have sorting capability at the API level (e.g., orders)
- Some calls don’t have sorting at the API level, but do include some natural sorting (e.g., visits)
- https://openmrs.atlassian.net/browse/O3-3386
  - Search term
  - Observations within a specific concept or concept set
  - Currently, in O3 Visit Summary tab, we are making a call like:
    https://o3.openmrs.org/openmrs/ws/rest/v1/visit?patient=64d1f948-2535-4788-b585-8d338b4ca0de&v=custom:(uuid,encounters:(uuid,diagnoses:(uuid,display,rank,diagnosis),form:(uuid,display),encounterDatetime,orders:full,obs:full,encounterType:(uuid,display,viewPrivilege,editPrivilege),encounterProviders:(uuid,display,encounterRole:(uuid,display),provider:(uuid,person:(uuid,display)))),visitType:(uuid,name,display),startDatetime,stopDatetime,patient,attributes:(attributeType:ref,display,uuid,value)&limit=5
  - This is a complicated query that returns a lot of data that is not initially rendered. Ideally, we would have a custom endpoint that returns only the data needed for the initial view, with references to the additional data needed when a section is expanded.
  - Approach? Create a new view or new resource?
    - One option would be to define a new view for this specific need, e.g., fetch visits with v=summary (or something similar). It would be nice to follow any existing convention for this rather than to create a one-off solution.
    - The other option, following obstree approach, would be to create a separate visitsummary resource to meet this need.
  - Currently, we feel that following the existing convention of obstree of creating a new resource to meet the specific application/business need is cleaner than creating a one-off custom view. So, we plan to create visitsummary.
  - The new visitsummary resource would return a paged array of visits similar to the custom call mentioned above; however, we would want to avoid preloading all data for all encounters, observations, etc. Figuring out what level of data is needed, and which calls can complete the views performantly in O3, will take some additional discovery and discussion. For example, considering the accompanying image, we would want all the data needed for the green parts of the view along with, perhaps, links to the API endpoints that provide the data for the blue parts of the view.
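A rough sketch of the “initial data plus links for expansion” idea for visitsummary follows. The response shape is hypothetical (the resource does not exist yet and its fields were still to be decided); it assumes visit metadata and diagnoses belong in the initial view, while each encounter is returned only as a reference with a link to its full representation. The ws/rest/v1/encounter path, v=full, and startIndex/limit are existing REST API conventions; the base path and everything else are assumptions.

```typescript
// Hypothetical, minimal input types; real REST representations have many more fields.
interface Diagnosis { uuid: string; display: string; rank: number; }
interface Encounter {
  uuid: string; encounterDatetime: string; diagnoses: Diagnosis[]; obs: unknown[]; orders: unknown[];
}
interface Visit {
  uuid: string; visitType: string; startDatetime: string; stopDatetime: string | null; encounters: Encounter[];
}

// Hypothetical summary shape: enough for the initial view, plus links for expansion.
interface Link { rel: string; uri: string; }
interface EncounterRef { uuid: string; encounterDatetime: string; links: Link[]; }
interface VisitSummary {
  uuid: string; visitType: string; startDatetime: string; stopDatetime: string | null;
  diagnoses: Diagnosis[];     // flattened across all encounters in the visit
  encounters: EncounterRef[]; // references only; obs/orders are fetched on expansion
}
interface Page<T> { results: T[]; links: Link[]; }

const BASE = "/openmrs/ws/rest/v1"; // assumed base path

function toVisitSummary(visit: Visit): VisitSummary {
  return {
    uuid: visit.uuid,
    visitType: visit.visitType,
    startDatetime: visit.startDatetime,
    stopDatetime: visit.stopDatetime,
    diagnoses: visit.encounters.reduce<Diagnosis[]>((acc, e) => acc.concat(e.diagnoses), []),
    encounters: visit.encounters.map((e) => ({
      uuid: e.uuid,
      encounterDatetime: e.encounterDatetime,
      links: [{ rel: "full", uri: `${BASE}/encounter/${e.uuid}?v=full` }],
    })),
  };
}

// Returns one page of summaries and a "next" link when more visits remain.
function summarizePage(visits: Visit[], startIndex: number, limit: number, patientUuid: string): Page<VisitSummary> {
  const results = visits.slice(startIndex, startIndex + limit).map(toVisitSummary);
  const links: Link[] = [];
  if (startIndex + limit < visits.length) {
    const next = startIndex + limit;
    links.push({ rel: "next", uri: `${BASE}/visitsummary?patient=${patientUuid}&startIndex=${next}&limit=${limit}` });
  }
  return { results, links };
}
```

On the server, the point would be to build this without loading obs and orders at all; the in-memory mapping above only shows the shape of what the client would receive.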

2024-06-05

@dkayiwa still working to find time with @Antony Ojwang to identify slow requests
Performance issues
- Most calls in O3 are using FHIR (only a few use OpenMRS custom REST API), which already supports filtering and sorting.
  - Very few (if any) take advantage of sorting.
  - Most use a large n to load 100+ results
- Will focus on adding missing support (e.g., for sorting) on existing endpoints that are being used by the frontend. If there is something that can be just as easily migrated to a FHIR endpoint, that would be preferable; however, in the many cases where supporting sorting in the OpenMRS REST API is a trivial fix, we will do that.
- We should focus on supporting as many sorting parameters as we can, but not support sorting on every property of every resource (would require too many indexes).
  - FHIR specs discuss unsupported sorting parameters, suggesting that unknown or unsupported parameters be ignored by default unless the client requests strict handling (Prefer: handling=strict)
- We’ve noticed some places in the frontend where resources are requested multiple times (e.g., the patient resource or session endpoint)
  - Per @Ian Bacher, the plan is to introduce some custom caching (for 100s of milliseconds) to avoid requesting the same resource repeatedly in a single operation
- PR for the spa module to set cache headers to reduce unnecessary downloading of code
- Still need an endpoint to receive errors/exceptions from clients
  - Would ideally be able to capture catastrophic errors on the client (things that break the Spa module) and report these errors (post the exceptions) to the server
    - Use case: people in the field not uncommonly find OpenMRS 3 gets into an unusable state and have learned to solve the problem by clearing their cache. While this may get them functional again, it requires their client to re-download all the same code again, slowing down the client and wasting bandwidth. If critical exceptions occurring in the client were logged on the server, it would increase the chance of identifying & fixing the causes so the number of times the client gets into an unusable state heads toward zero.
  - Ideally, the server could tell the client which log level to report (e.g., ERROR, WARNING, INFO)
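The lenient/strict behavior for sort parameters could look roughly like this. It parses a FHIR-style _sort value (comma-separated fields, a leading "-" meaning descending) against an allow-list of supported fields, ignoring unsupported ones by default and rejecting them only when the client asked for Prefer: handling=strict. The function names and the idea of passing the allow-list in are assumptions for illustration.

```typescript
type Handling = "strict" | "lenient";

interface SortKey { field: string; descending: boolean; }

// Parses a FHIR-style _sort value such as "-date,status".
// Unsupported fields are dropped (lenient) or rejected (strict).
function parseSort(sortParam: string, supported: ReadonlySet<string>, handling: Handling): SortKey[] {
  const keys: SortKey[] = [];
  for (const raw of sortParam.split(",")) {
    const token = raw.trim();
    if (token === "") continue;
    const descending = token.charAt(0) === "-";
    const field = descending ? token.slice(1) : token;
    if (!supported.has(field)) {
      if (handling === "strict") {
        throw new Error(`Unsupported sort parameter: ${field}`);
      }
      continue; // lenient (the default): silently ignore the parameter
    }
    keys.push({ field, descending });
  }
  return keys;
}

// Reads the handling preference from a Prefer header value, defaulting to lenient.
function handlingFromPrefer(prefer: string | undefined): Handling {
  return prefer !== undefined && /(^|[;,\s])handling=strict\b/.test(prefer) ? "strict" : "lenient";
}
```

Keeping the allow-list short (only fields backed by an index) matches the goal of supporting common sort orders without indexing every property.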
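The short-lived caching Ian described for repeated requests (e.g., the patient resource or session endpoint) can be sketched as a small wrapper around any fetch function. Concurrent callers share the in-flight promise, and a resolved result is reused for a short TTL (300 ms here, an assumed value) so that a single operation triggers only one network call. The wrapper name and TTL are hypothetical; this is not the actual esm-framework implementation.

```typescript
// Wraps a fetcher so identical requests within a short window share one network call.
function createShortLivedCache<T>(fetcher: (url: string) => Promise<T>, ttlMs = 300) {
  const entries = new Map<string, { promise: Promise<T>; expiresAt: number }>();

  return function cachedFetch(url: string): Promise<T> {
    const hit = entries.get(url);
    if (hit && hit.expiresAt > Date.now()) {
      return hit.promise; // in flight, or resolved within the last ttlMs
    }
    const promise = fetcher(url);
    // While in flight the entry never expires; the TTL starts once it resolves.
    const entry = { promise, expiresAt: Number.POSITIVE_INFINITY };
    entries.set(url, entry);
    promise.then(
      () => { entry.expiresAt = Date.now() + ttlMs; },
      // Do not cache failures, so the next caller retries.
      () => { if (entries.get(url) === entry) entries.delete(url); },
    );
    return promise;
  };
}
```

Stale entries are simply replaced on the next request for the same URL; a longer-lived version would also need eviction.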
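A client-side reporter for the proposed error endpoint could filter by a threshold the server sets, defaulting to ERROR until told otherwise. Since no endpoint exists yet, the transport is injected; in a browser it might be wired to window.addEventListener("error", ...) and navigator.sendBeacon(url, body) with a yet-to-be-defined URL. The class name, entry fields, and severity ordering are assumptions.

```typescript
type LogLevel = "ERROR" | "WARNING" | "INFO";

// Lower number = more severe.
const SEVERITY: Record<LogLevel, number> = { ERROR: 0, WARNING: 1, INFO: 2 };

interface ClientLogEntry {
  level: LogLevel;
  message: string;
  stack?: string;
  url: string;       // page where the error occurred
  timestamp: string; // ISO-8601
}

// Hypothetical reporter; sending is injected because the server endpoint is not defined yet.
class ClientErrorReporter {
  private threshold: LogLevel = "ERROR"; // until the server says otherwise
  private readonly send: (entry: ClientLogEntry) => void;

  constructor(send: (entry: ClientLogEntry) => void) {
    this.send = send;
  }

  // Called with the level the server asks the client to report.
  setThreshold(level: LogLevel): void {
    this.threshold = level;
  }

  // Returns true if the error was sent, false if it was below the threshold.
  report(level: LogLevel, error: Error, url: string): boolean {
    if (SEVERITY[level] > SEVERITY[this.threshold]) {
      return false;
    }
    this.send({ level, message: error.message, stack: error.stack, url, timestamp: new Date().toISOString() });
    return true;
  }
}
```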
Tracking & prioritizing performance issues
- Not discussed

2024-05-29

@dkayiwa working with @Antony Ojwang to identify high priority slow requests
One clear area for performance improvement is to support filtering & sorting within the REST API so the client doesn’t need to request all data from the server
- Examples of large queries that could be improved by supporting sorting & filtering within the REST API:
  - Query for observations
  - Query for Vitals & Biometrics
- Do we have tickets for these? If not, we should create 1-2 epics (e.g., Support server-side sorting, filtering, and paging of observations)
- Strategy
  1. Enumerate the endpoints needed by the client (e.g., the parameters supported for sorting, filtering, and paging) so the client can request a single page of the data needed for display rather than all data to filter/sort locally.
  2. Build/refactor the needed endpoints in the Platform
  3. Refactor the frontend client to leverage the new endpoints, relying on the server to perform filtering, sorting, and paging of data rather than doing all of it in the client
  4. Determine the extent to which these changes need to be backported to meet implementation needs
- Outstanding question: how does this affect offline mode? Either these features become unavailable in offline mode or the client would need to prefetch some data to support a scaled down version of these features.
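A sketch of the kind of server-side filter, sort, and page behavior the strategy calls for, using observations as the example. On the server this would be a database query (filter, ORDER BY, and offset/limit); it is shown in memory here only to illustrate the contract. The ObsQuery shape is hypothetical, though startIndex and limit mirror the REST API’s existing paging parameters. The uuid tiebreaker addresses the earlier concern that, without natural sorting, result order may vary and pages could overlap.

```typescript
// Hypothetical, minimal observation shape.
interface Obs { uuid: string; conceptUuid: string; obsDatetime: string; value: string; }

interface ObsQuery {
  conceptUuids?: string[];  // filter: only observations for these concepts
  sortDescending?: boolean; // sort by obsDatetime
  startIndex: number;       // paging, mirroring the REST API's startIndex/limit
  limit: number;
}

interface PageResult<T> { results: T[]; totalCount: number; hasMore: boolean; }

function queryObs(all: Obs[], q: ObsQuery): PageResult<Obs> {
  const concepts = q.conceptUuids;
  const filtered = concepts ? all.filter((o) => concepts.indexOf(o.conceptUuid) !== -1) : all.slice();

  const dir = q.sortDescending ? -1 : 1;
  filtered.sort((a, b) => {
    // ISO-8601 strings in the same format and timezone compare correctly as text.
    if (a.obsDatetime !== b.obsDatetime) {
      return a.obsDatetime < b.obsDatetime ? -dir : dir;
    }
    // Tiebreaker keeps the order deterministic across pages when timestamps are equal.
    return a.uuid < b.uuid ? -1 : a.uuid > b.uuid ? 1 : 0;
  });

  const results = filtered.slice(q.startIndex, q.startIndex + q.limit);
  return { results, totalCount: filtered.length, hasMore: q.startIndex + q.limit < filtered.length };
}
```

The client then asks only for the page it displays (e.g., startIndex=0, limit=10) instead of downloading everything and sorting locally.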
@dkayiwa to define a process/approach to track prioritization & progress on performance-related issues

2024-05-22

Performance and bandwidth issues
- @dkayiwa had discussions with Palladium Kenya and identified a number of performance issues
- Looked into specific performance issues
- Majority of bandwidth usage is for code that is unnecessarily reloaded. Caching can help, but the caching frequently needs to be cleared when pages don’t load completely.
  - If we could create an endpoint for receiving client-side errors, then it’s possible the SPA module could report errors to the server when they occur
  - @Ian Bacher & @Antony Ojwang discussed trying to find a time when they could connect while Antony is in the field to do some live troubleshooting
- Do we have to use FHIR? It sends more information than our custom REST API.
  - Investigate FHIR’s GraphQL
- When the database has a lot of data (e.g., large number of observations), some queries perform more slowly.
  - Might be able to address these by improving indexing or queries/paging
- There are multiple points in the application where full representations are unnecessarily requested, when a custom representation could perform much better (return less unnecessary information)
- Old hardware can degrade performance
  - OpenMRS could publish hardware requirements
  - Make sure the CI pipeline and developers experience the application on hardware that more closely reflects real-world conditions
- In some cases, multiple calls are made to handle a single operation where a single call would be more efficient.
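Requesting a custom representation instead of v=full is easier if the representation string can be built from a small field spec rather than hand-written. The helper below is hypothetical; it emits the same custom:(...) syntax seen in the 2024-06-12 visit query (nested fields in parentheses, and named representations such as ref or full after a colon) and always balances its parentheses.

```typescript
type RepName = "ref" | "default" | "full";
// A field is a property name, or [name, named representation | nested fields].
type Field = string | [string, RepName | Field[]];

function renderFields(fields: Field[]): string {
  return "(" + fields.map(renderField).join(",") + ")";
}

function renderField(field: Field): string {
  if (typeof field === "string") {
    return field;
  }
  const [name, rep] = field;
  return typeof rep === "string" ? `${name}:${rep}` : `${name}:${renderFields(rep)}`;
}

// Builds the value for the v= query parameter.
function customRepresentation(fields: Field[]): string {
  return "custom:" + renderFields(fields);
}
```

For example, customRepresentation(["uuid", ["encounterType", ["uuid", "display"]], ["orders", "full"]]) yields custom:(uuid,encounterType:(uuid,display),orders:full), which can then be trimmed to only what a view actually renders.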
Clustering
- Created page: O3 Cluster and Cloud Deployments

2024-05-15

Performance Issues
- @Jan Flowers - working on finding a “real-world” dataset to use in testing
  - Other possible pathways: troubleshoot in real time with Palladium (together or via VPN), or use synthetic data (pros/cons)
- @dkayiwa - will follow up with Antony to determine how to troubleshoot the issue they reported
- Tracking/Prioritizing
  - Can we make an Epic at least? Grace is tagging
    - @Burke Mamlin making O3 chattiness Epic
  - How do we track the performance issues that are being reported?
  - How do we make sure we are creating tickets for the performance issues we want to prioritize and focus on resolving (measure/track/target to resolve)?
  - E.g., the Locations thread: supposedly fixed with an indexing fix and closed, but with recent versions of Tomcat there is noticeable slowness. Is there a ticket for this, and is it assigned to be addressed?
  - It is not the case that we have no actionable performance issues: there is the Tomcat issue and the “chattiness” of O3 for Palladium
  - @Paul Biondich - can Daniel be responsible for driving the troubleshooting and resolution of OpenMRS performance issues?
    - Daniel - noted challenges in troubleshooting issues far enough to get to the point of creating epics/tickets
    - When Daniel can’t move something forward, should turn to Paul/Jan/Burke to help unblock and problem solve
    - Create momentum through shared responsibility for solving problems - holding folks to commitments for follow up, pinging when someone doesn’t follow up, etc.
Billing/Stock Management Module
- @dkayiwa - working with ___(?) to generalize the module that was harvested from Banda Health
Docker Images for recent JDKs
- @raff - JDK 11 and 17? Ready for the master build, will backport for 2.6 and 2.5 release lines
Cloud hosting architecture
- Looking into cluster containers and drafting an architecture and approach for cloud-based deployment of OpenMRS 3; started a Talk post and waiting for feedback; will start R&D on this approach next week
  - MVP definition - request for OpenMRS to run in a multi-tenant environment
    - multiple instances for multiple facilities in a cluster, via Kubernetes, with a centralized platform for deployment and monitoring
    - guidance for deployment on AWS, Azure, etc.
    - not just about scaling the API, but also the backend database; Kubernetes supports clustering of databases and instances, but more work is needed on the API
  - Goal: get to the point where this is a “best practice” approach and a straightforward recipe/lift to implement
Auto de-activation of users / timeouts - @isaiahmuli
- Reviewing code and sorting through questions for Daniel
- Needs guidance on how improvements are made at the code level, and pointers to documentation
- @Burke Mamlin - use the forums (Talk and Slack) publicly as much as possible so that others can help (not just @dkayiwa directly); this also improves the knowledge base for others getting set up. Edit documentation and point out gaps and problems as you go through things
PM support for Platform/Backend
- Can @jmwiinga spend some time helping here? Jeremiah and Jan to follow up to determine how he could help