Platform Team Meeting Notes 2024

2024-06-19

2024-06-12

  • Performance issues

    • Palladium Kenya is currently deploying recent changes that Ian made

    • Some calls for obs have sorting at the API level that the REST API does not use

      • Does not have natural sorting, so order of results may vary

    • Some calls don’t have sorting capability at the API level (e.g., orders)

    • Some calls don’t have sorting at the API level but do include some natural sorting (e.g., visits)

    • https://openmrs.atlassian.net/browse/O3-3386

      • Search term

      • Observations within a specific concept or concept set

      • Currently, in O3 Visit Summary tab, we are making a call like:

        https://o3.openmrs.org/openmrs/ws/rest/v1/visit?patient=64d1f948-2535-4788-b585-8d338b4ca0de&v=custom:(uuid,encounters:(uuid,diagnoses:(uuid,display,rank,diagnosis),form:(uuid,display),encounterDatetime,orders:full,obs:full,encounterType:(uuid,display,viewPrivilege,editPrivilege),encounterProviders:(uuid,display,encounterRole:(uuid,display),provider:(uuid,person:(uuid,display)))),visitType:(uuid,name,display),startDatetime,stopDatetime,patient,attributes:(attributeType:ref,display,uuid,value))&limit=5
      • This is a complicated query that covers a lot of data that is not initially rendered. Ideally, we would have a custom endpoint that returns only the data needed for the initial view, with references to the data needed when a section is expanded.

      • Approach? Create a new view or new resource?

        • One option would be to define a new view for this specific need, e.g., fetch visits with v=summary (or something similar). It would be nice to follow any existing convention for this rather than to create a one-off solution.

        • The other option, following the obstree approach, would be to create a separate visitsummary resource to meet this need.

      • Currently, we feel that following the existing obstree convention of creating a new resource to meet a specific application/business need is cleaner than creating a one-off custom view. So, we plan to create visitsummary.

      • The new visitsummary resource would return a paged array of visits similar to the custom call mentioned above; however, we would want to avoid preloading all data for all encounters, observations, etc. Figuring out what level of data is needed and what calls can be used to complete the views in a performant way in O3 will take some additional discovery & discussion. For example, considering the image below, we would want all the data needed for the green parts of the view along with, perhaps, links to the API endpoints to get data to fill in the blue part of the view:

        image-20240612-161130.png
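
A rough sketch of the kind of reduction discussed above, written in Python only for illustration. It takes a fully loaded visit (as returned by the custom call) and keeps just what an initial view might need, replacing obs and orders payloads with counts and a link to follow on expansion. The output shape, the field names obsCount, orderCount, and links, and the choice of linking to the encounter endpoint with v=full are all assumptions; none of this is the agreed visitsummary contract.

```python
from typing import Any

# Assumed base path, taken from the URL quoted in the notes above.
REST_BASE = "/openmrs/ws/rest/v1"


def summarize_visit(visit: dict[str, Any]) -> dict[str, Any]:
    """Reduce a fully loaded visit to a hypothetical summary shape."""
    encounters = []
    for enc in visit.get("encounters", []):
        encounters.append({
            "uuid": enc["uuid"],
            "encounterDatetime": enc.get("encounterDatetime"),
            "encounterType": (enc.get("encounterType") or {}).get("display"),
            # Counts stand in for the obs/orders payloads themselves.
            "obsCount": len(enc.get("obs", [])),
            "orderCount": len(enc.get("orders", [])),
            # Link the client could follow when the user expands the encounter.
            "links": {"full": f"{REST_BASE}/encounter/{enc['uuid']}?v=full"},
        })
    return {
        "uuid": visit["uuid"],
        "visitType": (visit.get("visitType") or {}).get("display"),
        "startDatetime": visit.get("startDatetime"),
        "stopDatetime": visit.get("stopDatetime"),
        "encounters": encounters,
    }
```

Which fields belong in the summary versus behind a link is exactly the open discovery question noted above; the sketch only shows the split between eagerly returned data and deferred data.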



2024-06-05

  • @dkayiwa still working to find time with @Antony Ojwang to identify slow requests

  • Performance issues

    • Most calls in O3 use FHIR, which already supports filtering and sorting (only a few use the OpenMRS custom REST API).

      • Very few (if any) take advantage of sorting.

      • Most use a large n to load 100+ results

    • Will focus on adding missing support (e.g., for sorting) on existing endpoints that are being used by the frontend. If there is something that can be just as easily migrated to a FHIR endpoint, that would be preferable; however, in the many cases where supporting sorting in the OpenMRS REST API is a trivial fix, we will do that.

    • We should focus on supporting as many sorting parameters as we can, but not support sorting on every property of every resource (would require too many indexes).

      • FHIR specs talk about unsupported sorting parameters, suggesting that unknown or unsupported parameters be ignored by default unless the client specifies handling=strict

    • We’ve noted some places in the frontend where resources are called multiple times (e.g., patient resource or session endpoint)

      • Per @Ian Bacher, the plan is to introduce some custom caching (for 100s of milliseconds) to avoid repeatedly requesting the same resource within a single operation

    • PR for the SPA module sets cache headers to reduce unnecessary downloading of code

    • Still need an endpoint to receive errors/exceptions from clients

      • Would ideally be able to capture catastrophic errors on the client (things that break the SPA module) and report these errors (post the exceptions) to the server

        • Use case: people in the field not uncommonly find OpenMRS 3 gets into an unusable state and have learned to solve the problem by clearing their cache. While this may get them functional again, it requires their client to re-download all the same code again, slowing down the client and wasting bandwidth. If critical exceptions occurring in the client were logged on the server, it would increase the chance of identifying & fixing the causes so the number of times the client gets into an unusable state heads toward zero.

      • Ideally, the server could tell the client which log level to report (e.g., ERROR, WARNING, INFO)

  • Tracking & prioritizing performance issues

    • Not discussed
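
The short-lived caching idea noted above under Performance issues could look roughly like the following. This is a minimal sketch in Python for illustration only, not the planned frontend implementation: the class name ShortLivedCache, the 0.3 second default, and the injectable clock are assumptions. The point is that a repeated request for the same key within the time window reuses the earlier result instead of calling the server again.

```python
import time
from typing import Any, Callable


class ShortLivedCache:
    """Reuse a fetched value for a few hundred milliseconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 0.3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (time the value was stored, value)
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            # Still fresh: skip the network call.
            return cached[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A caller would pass the request URL as the key and a function that performs the request as fetch, so that several components asking for the session or patient resource during one operation trigger a single call.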

2024-05-29

  • @dkayiwa working with @Antony Ojwang to identify high priority slow requests

  • One clear area for performance improvement is to support filtering & sorting within the REST API so the client doesn’t need to request all data from the server

    • Examples of large queries that could be improved by supporting sorting & filtering within the REST API:

      • Query for observations

      • Query for Vitals & Biometrics

    • Do we have tickets for these? If not, we should create 1-2 epics (e.g., Support server-side sorting, filtering, and paging of observations)

    • Strategy

      1. Enumerate the endpoints needed by the client (e.g., parameters supported for sorting, filtering, and paging) so the client can request a single page of the data needed for display rather than all data to filter/sort locally.

      2. Build/refactor the needed endpoints in the Platform

      3. Refactor the frontend client to leverage the new endpoints, relying on the server to perform filtering, sorting, and paging of data rather than doing all of it in the client

      4. Determine the extent to which these changes need to be backported to serve implementation needs

    • Outstanding question: how does this affect offline mode? Either these features become unavailable in offline mode or the client would need to prefetch some data to support a scaled down version of these features.

  • @dkayiwa to define a process/approach to track prioritization & progress on performance-related issues

2024-05-22

  • Performance and bandwidth issues

    • @dkayiwa had discussions with Palladium Kenya and identified a number of performance issues

    • Looked into specific performance issues

    • Majority of bandwidth usage is for code that is unnecessarily reloaded. Caching can help, but the caching frequently needs to be cleared when pages don’t load completely.

      • If we could create an endpoint for receiving client-side errors, then it’s possible the SPA module could report errors to the server when they occur

      • @Ian Bacher & @Antony Ojwang discussed trying to find a time when they could connect while Antony is in the field to do some live troubleshooting

    • Do we have to use FHIR? It sends more information than our custom REST API.

    • When the database has a lot of data (e.g., large number of observations), some queries perform more slowly.

      • Might be able to address these by improving indexing or queries/paging

    • There are multiple points in the application where full representations are unnecessarily requested, when a custom representation could perform much better (return less unnecessary information)

    • Old hardware can cause adverse performance

      • OpenMRS could publish hardware requirements

      • Make sure the CI pipeline and developers experience the application on hardware that more closely reflects real-world conditions

    • In some cases, multiple calls are made to handle a single operation where a single call would be more efficient.

  • Clustering

2024-05-15

  • Performance Issues

    • @Jan Flowers - working on finding “real-world” type data set for using in testing

      • other possible pathways - work with Palladium to work real time on troubleshooting together or via VPN, synthetic data (pros/cons)

    • @dkayiwa - will follow up with Antony to determine a pathway for troubleshooting the issue they reported

    • Tracking/Prioritizing

      • Can we make an Epic at least? Grace is tagging

        • @Burke Mamlin making O3 chattiness Epic

      • How do we track the performance issues that are being reported?

      • How do we make sure we are creating tickets for the performance issues we want to prioritize and focus on resolving (measure/track/target to resolve)?

      • E.g. Locations thread, supposedly fixed with indexing fix and closed, but with recent versions of Tomcat there is a noticeable slowness - is there a ticket for this and is it assigned to be addressed?

      • We are not in a situation where there are no actionable performance issues - Tomcat issue, and “chattiness” from O3 for Palladium

      • @Paul Biondich - can Daniel be responsible for driving the troubleshooting and resolution of OpenMRS performance issues?

        • Daniel - challenges in troubleshooting to get to the point of creating epics/tickets

        • When Daniel can’t move something forward, should turn to Paul/Jan/Burke to help unblock and problem solve

        • Create momentum through shared responsibility for solving problems - holding folks to commitments for follow up, pinging when someone doesn’t follow up, etc.

  • Billing/Stock Management Module

    • @dkayiwa - working with ___(?) to generalize module that was harvested from Banda Health

  • Docker Images for recent JDKs

    • @raff - JDK 11 and 17? Ready for the master build, will backport for 2.6 and 2.5 release lines

  • Cloud hosting architecture

    • Looking into cluster containers and drafting architecture and approach for cloud-based deployment of OpenMRS 3; started a Talk post and waiting for feedback; will start R&D on this approach next week

      • MVP definition - request for OpenMRS to be run on multi-tenant environment

        • multiple instances for multiple facilities in a cluster, via kubernetes with centralized platform for deployment with monitoring

        • advising for AWS, Azure, etc. deployment

        • not just about scaling the API, but also about the backend db - kubernetes supports the cluster of db, instances, but more work needed on the API

      • Goal: get to the point that this is a “best practice” approach and is a straightforward recipe/lift for implementing

  • Auto de-activation of users / timeouts - @isaiahmuli

    • Reviewing code and sorting through questions for Daniel

    • Need guidance on how improvements are made at code level, pointers to documentation

    • @Burke Mamlin use forums (talk and slack) as much as possible in public way so that others can help support (not just @dkayiwa directly), also improves knowledge base for others to get set up; edit documentation, point out gaps and problems, as you go through things

  • PM support for Platform/Backend

    • Can @jmwiinga spend some time helping here? Jeremiah and Jan to follow up to determine how he could help