Logic Service Design Thoughts

<html><head><title></title></head><body>This is a work in progress by Burke & Vladimir

Upcoming Features

Introducing the LogicRequest

NOTE: we're considering using LogicContext instead of LogicRequest for the name of this class, since it more aptly covers the role of this class, not only buffering at any level but also providing a context in which index dates and global arguments can be scoped.

We will be adding a LogicRequest to the Logic Service as a behind-the-scenes, lightweight object to manage the buffering, program flow, and data scoped to a single request of the Logic Service from external code. The LogicRequest will allow for several enhancements:

*Request-level Buffering: by having a lightweight object shared by all code invoked within a single request of the logic service, we can perform buffering that is specific to a single request. For example, if an external process (web page, module, etc.) needs to evaluate the HIV POSITIVE rule for a set of patients (e.g., <tt>logicService.eval(patientSet, "HIV POSITIVE")</tt>) and the HIV POSITIVE rule invokes several other rules and requests from data sources, we can buffer data loaded or rules calculated across the entire cohort quickly and efficiently.
*Index dates: the LogicRequest will provide a means for us to manage the index date used within rules or criteria applied during a single request. For example, evaluating whether the patient was HIV POSITIVE on 22-Oct-2004 by applying the same business logic (rules) while simply supplying all data as if today were 22-Oct-2004.
*Global variables scoped within the request: while we are supporting arguments passed to individual rules, having a lightweight object scoped to an individual request of the Logic Service (such that all downstream rules and data source fetches can see it) will allow us to easily define "global" arguments that can affect all components of the calculation.

Supporting Index Dates

All requests to the Logic Service need to support the notion of an index date. The index date is the date from which all criteria are applied. The default index date is today.

The purpose of the index date is to support data and rule-based queries such as "Was the patient DIABETIC on 25 October, 2004?" or "return all CD4 COUNTs within 6 months of 14 July 2006." Such queries could be (and probably should still be) supported using optional date parameters or the between criterion, e.g., "return all CD4 COUNTs between 14 January and 14 July 2006." But the between criterion is not nearly as flexible and robust as having an index date that is passed down through all child queries. If I want to know if a patient is DIABETIC on 25 October, 2004 and the DIABETIC algorithm needs to know GLUCOSE results within the past year, then we want the request for GLUCOSE results to "magically" (without the DIABETIC rule needing to do specific coding) return values within the past year relative to 25 October, 2004.
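To make the "within the past year relative to the index date" idea concrete, here is a minimal sketch. The class and method names (IndexDateWindow, within) are hypothetical and not part of the Logic Service, and the choice of an exclusive start and inclusive end for the window is an assumption.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: resolves a relative window against an index date.
class IndexDateWindow {
    // Keeps the dates that fall in the window (indexDate - span, indexDate].
    static List<LocalDate> within(List<LocalDate> dates, Period span, LocalDate indexDate) {
        LocalDate start = indexDate.minus(span);
        List<LocalDate> kept = new ArrayList<>();
        for (LocalDate d : dates) {
            if (d.isAfter(start) && !d.isAfter(indexDate)) {
                kept.add(d);
            }
        }
        return kept;
    }
}
```

With an index date of 25 October 2004 and a span of one year, a GLUCOSE result dated 15 March 2004 is kept, while results from 1 September 2003 or 1 December 2004 are dropped. The rule asking for "the past year" never needs to know which index date is in effect.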

At this time, the most natural approach to supporting an index date would be through LogicCriteria with an asOf(Date) method; however, I think a rule will need to be able to discover the index date within any context. This means that even when specific criteria are not used, we will need to pass along criteria with today as the index date by default.

Could setting an index date using an asOf() method within a criterion be used to adjust the index date within the current logic request?

For example, imagine that I want to create a rule to determine HIV RISK. As part
of this rule, I need to know if the person had an HIV EXPOSURE two weeks prior (it
takes a couple of weeks for the body to start building antibodies to the infection
and it's the antibodies that are detected by the test, so any test done within two
weeks of exposure is unreliable). So, my HIV RISK rule needs to eval HIV EXPOSURE
"as of" two weeks ago. Since HIV EXPOSURE calls two dozen rules, performs another
dozen data fetches, and then performs a complicated algorithm on the results to
reach an answer, we can't expect the rule author to code it so that it can provide
the answer for any date. Instead, we expect the author of HIV EXPOSURE to
properly determine the answer given today's data and then we surreptitiously
supply it data that are two weeks old, as if the rule were being run two weeks
earlier: the magic of the index date. But now picture that I want to determine
HIV RISK as of 22-Oct-2006. That means that I need to set an index date of 22-Oct-2006 for the initial
request, and the request will then need to adjust the index date to 8-Oct-2006 when evaluating
HIV EXPOSURE. So, not only do we want to support an index date that applies to the
current eval() request, but we also want to allow the index date to be changed
midstream and then "pop" it back to its original value for parallel requests.

Imagine HIV RISK needs to know gender, age, HIV exposure, and promiscuous
behavior to reach its answer. HIV EXPOSURE needs age and whether the person had
unprotected sex within 6 weeks. So, here's the series of calculations with the
effective index date for each calculation:

  • eval(HIV RISK) as of 22-Oct
  • HIV RISK:
    • 22-Oct: eval(GENDER)
    • 22-Oct: eval(AGE)
    • 22-Oct: eval(HIV EXPOSURE) two weeks prior
      • HIV EXPOSURE
        • 8-Oct: eval(AGE)
        • 8-Oct: eval(UNPROTECTED SEX).within(Weeks(6))
    • 22-Oct: eval(PROMISCUOUS BEHAVIOR)

So for the final calculation, index date returns to its previous value – i.e.,
index date is scoped to a particular eval() request such that all evals or data
fetches performed to reach the result for the particular eval() use the same index
date.
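The push-and-pop behavior described above can be sketched with a small stack of index dates. This is only an illustration under assumed names: IndexDateScope, its asOf() helper, and IndexDateDemo are hypothetical, and real rule evaluation is replaced by log lines.

```java
import java.time.LocalDate;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: a stack of index dates scoped to one request.
class IndexDateScope {
    private final Deque<LocalDate> stack = new ArrayDeque<>();

    IndexDateScope(LocalDate initial) {
        stack.push(initial);
    }

    // The index date that applies to the evaluation currently running.
    LocalDate current() {
        return stack.peek();
    }

    // Runs body with a temporary index date, then restores the previous one.
    <T> T asOf(LocalDate date, Supplier<T> body) {
        stack.push(date);
        try {
            return body.get();
        } finally {
            stack.pop();
        }
    }
}

class IndexDateDemo {
    // Records "<date>: eval(<token>)" lines in the order of the outline above.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        IndexDateScope scope = new IndexDateScope(LocalDate.of(2006, 10, 22));
        log.add(scope.current() + ": eval(GENDER)");
        log.add(scope.current() + ": eval(AGE)");
        // HIV EXPOSURE is evaluated as of two weeks before the current index date.
        scope.asOf(scope.current().minusWeeks(2), () -> {
            log.add(scope.current() + ": eval(AGE)");
            log.add(scope.current() + ": eval(UNPROTECTED SEX)");
            return null;
        });
        log.add(scope.current() + ": eval(PROMISCUOUS BEHAVIOR)");
        return log;
    }
}
```

The try/finally in asOf() is what guarantees the "pop": even if the nested evaluation throws, the outer evaluation continues with its own index date.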

If we had our lightweight request object...

  // In the calling code (indexDate is a Date for 11-Oct-2006)
  Result r = Context.getLogicService().eval(patients,
    new LogicCriteria("SOMETOKEN").asOf(indexDate));

  // In Logic Service Impl
  LogicRequest req = new LogicRequest();
  Rule rule = RuleFactory.getRule(criteria);
  Result r = new Result();
  for (Patient patient : patients) {
    r.add(rule.eval(req, patient, criteria));
  }
  return r;
 
  // In the rule
  ...
  Result r1 = req.eval(patient, "AGE");
  LogicDataSource ds = dataService.getPersonDataSource();
  Result r2 = req.getData(patient, ds, "GENDER");
  Result r3 = req.eval(patient,
    new LogicCriteria("HIV EXPOSURE").asOf(Duration.weeks(2)));
  ...
 
  // In the LogicRequest
  public Result eval(patient, criteria) {
    if (buffer.contains(patient, criteria))
      return buffer.getResult(...);
    this.setIndexDate(criteria);
    Result r = RuleFactory.getRule(criteria)
      .eval(this, patient, criteria);
    buffer.add(...);
    return r;
  }
 
  public Result getData(patient, datasource, keyObject) {
    if (buffer.contains(patient, datasource, keyObject))
      return buffer.getResult(...);
    Result r = datasource.getData(patient, keyObject);
    buffer.add(...);
    return r;
  }

The LogicRequest would need to provide methods for getting results from rules or
data sources, but could then do per-request buffering – so data fetches are done
for all patients in a single db call – and could even detect & adjust for the
index date changes.

Supporting Token Tags

The Logic Service needs to support 0..n string "tag(s)" to be associated with any given token. We will also need methods to search for tokens by tag.

Examples

Here's what the additional methods for LogicService to support n-tags-per-token may look like:

void addRule(String token, String[] tags, Rule rule);

void addRule(String token, String[] tags, Class rule);

void addTokenTag(String token, String tag);

void removeTokenTag(String token, String tag);

String[] getTagsByToken(String token);

String[] getTokensByTag(String tag);

// Partial match search for token tags among all known tokens
String[] findTokenTags(String q);

Token tags don't necessarily have to be clinical categories. Tags might be used to organize tokens into groupings like "Patient Demographics" or "Concepts" or "Rules written by Vladimir." For testing, you could use tags like "Patient Demographics" that would include things like name, birthdate, age, gender, address, etc. A "Lab Tests" group could include any concepts that are defined as lab tests (e.g., CD4 COUNT, CREATININE, VIRAL LOAD, etc.). An "HIV TESTS" tag group could contain CD4 COUNT, VIRAL LOAD, and your sample HIV POSITIVE rule. A "Vitals" tag group could contain PULSE, WEIGHT (KG), RESPIRATORY RATE, SYSTOLIC BLOOD PRESSURE and DIASTOLIC BLOOD PRESSURE.
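As a rough illustration of how the tag methods listed above might behave, here is a minimal in-memory sketch. TokenTagRegistry is a hypothetical class, not the Logic Service implementation, and the case-insensitive substring matching in findTokenTags() is an assumption about what "partial match" means.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory registry backing the tag methods listed above.
class TokenTagRegistry {
    private final Map<String, Set<String>> tagsByToken = new HashMap<>();

    void addTokenTag(String token, String tag) {
        tagsByToken.computeIfAbsent(token, t -> new TreeSet<>()).add(tag);
    }

    void removeTokenTag(String token, String tag) {
        Set<String> tags = tagsByToken.get(token);
        if (tags != null) {
            tags.remove(tag);
        }
    }

    String[] getTagsByToken(String token) {
        Set<String> tags = tagsByToken.get(token);
        return tags == null ? new String[0] : tags.toArray(new String[0]);
    }

    String[] getTokensByTag(String tag) {
        Set<String> tokens = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : tagsByToken.entrySet()) {
            if (e.getValue().contains(tag)) {
                tokens.add(e.getKey());
            }
        }
        return tokens.toArray(new String[0]);
    }

    // Case-insensitive partial match over all known tags (matching rule is an assumption).
    String[] findTokenTags(String q) {
        String needle = q.toLowerCase();
        Set<String> found = new TreeSet<>();
        for (Set<String> tags : tagsByToken.values()) {
            for (String tag : tags) {
                if (tag.toLowerCase().contains(needle)) {
                    found.add(tag);
                }
            }
        }
        return found.toArray(new String[0]);
    }
}
```

Sorted sets are used only so that results come back in a predictable order; a real implementation might persist tags alongside the token registration instead.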

Supporting TTL for Results

Both rules and data sources should be able to specify a TTL for results that is honored by any buffering mechanism. This would allow the AGE rule to indicate its result is valid for a day while the ACTIVE ORDERS rule might want to always recalculate its value (result expires immediately). The observation data source may want to indicate its results should be re-calculated after 30 minutes while the person demographics data source lets its results live for up to 4 hours in the buffer. These TTL requirements could be accomplished with int Rule.getDuration() and int DataSource.getDuration() methods (both returning TTL in seconds).
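A buffer that honors these TTLs could look roughly like the sketch below. TtlBuffer is a hypothetical class; the ttlSeconds argument stands in for whatever Rule.getDuration() or DataSource.getDuration() would return, and the current time is passed in explicitly only to keep the example easy to test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical buffer that honors a per-entry TTL given in seconds.
class TtlBuffer {
    private static final class Entry {
        final Object value;
        final long expiresAtMillis;

        Entry(Object value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // ttlSeconds mirrors the value a rule or data source would report as its duration.
    void put(String key, Object value, int ttlSeconds, long nowMillis) {
        entries.put(key, new Entry(value, nowMillis + ttlSeconds * 1000L));
    }

    // Returns the buffered value, or null if it is missing or has expired.
    Object get(String key, long nowMillis) {
        Entry e = entries.get(key);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }
}
```

With this shape, a TTL of 0 means the result expires immediately (the ACTIVE ORDERS case), while a TTL of 86400 keeps an AGE result for a day.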

Logic Datatypes

The different flavors of Result have more to do with affecting the default behavior (i.e., operator methods) for results rather than specifying new datatypes. For example, I don't think there should be an OBS datatype. Observation values are either numeric, dates, text, Boolean, or coded. Now, I have been wanting us to support both Person and Location as observation types as well...but Paul has resisted up to this point. Even if we remove Obs as a datatype, I would still not want to try and match the remaining Result extensions with datatypes, because other systems will need to code against these datatypes, so they need to represent the core, fairly static datatypes. All of the basic datatypes should be in this list (numeric, text, datetime, concept, Boolean). We can debate whether numeric should be split into real & integer types or datetime should be split into datetime, date, and time types. Both person and location are soft spots for me, since I'd like to see these eventually; however, I think it makes more sense for us to align with the datatypes currently provided in the data model (i.e., very close to the value types seen in the obs table).

Rather than making every module or service that consumes Logic Service results deal with all the possible domain objects (Person, Program, Obs, Encounter, Order, etc.), I was imagining that we would force all datatypes to the simple types listed above and then allow an object reference to be attached to each result. In this way, most code could deal with the simple datatypes; however, when someone retrieved a list of CD4 COUNT and knew what they were doing, they could get to the actual Obs objects through the reference, i.e., with something like (Obs)result.getObject(). In the same vein, we could make an ACTIVE PROGRAMS rule that fetched the list of active programs (studies/treatment programs) that a patient was enrolled in...the results returned could be a text datatype with the program name or a numeric datatype with the program ID, which could be passed through any code that was oblivious to what a Program object was, but with the final result, we'd have the option of getting the actual Program for any result with a (Program)result.getObject() call. This would basically come down to a contract between the rule designer and the rule consumer that would not be enforced by the Logic Service except to pass along these references when they were attached to results. For us, this would just mean supporting an extra result property that's a reference to an Object.
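A minimal sketch of a result that carries a simple datatype plus an optional object reference might look like this. SimpleResult and ProgramStub are hypothetical names used only for illustration; the datatype list is the one given above, and ProgramStub is a stand-in for a domain object, not a real domain class.

```java
// Hypothetical sketch: a result with a simple datatype and an optional object reference.
class SimpleResult {
    enum Datatype { NUMERIC, TEXT, DATETIME, CONCEPT, BOOLEAN }

    private final Datatype datatype;
    private final Object value;
    private final Object reference; // e.g., the underlying Obs or Program; may be null

    SimpleResult(Datatype datatype, Object value, Object reference) {
        this.datatype = datatype;
        this.value = value;
        this.reference = reference;
    }

    Datatype getDatatype() {
        return datatype;
    }

    Object getValue() {
        return value;
    }

    // The service would only pass this along; interpreting it is up to the consumer.
    Object getObject() {
        return reference;
    }
}

// Stand-in for a domain object such as Program (illustrative only).
class ProgramStub {
    final int id;
    final String name;

    ProgramStub(int id, String name) {
        this.id = id;
        this.name = name;
    }
}
```

Code that is oblivious to programs reads getValue() and sees plain text, while a consumer that knows the contract can cast the reference, e.g. (ProgramStub) result.getObject().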

Now what about the PersonResult, ObsResult, etc? I would not think of these as datatypes themselves, but as a means to change the behavior of the Result methods in a way that makes these types of objects behave intuitively. Eventually, we may want each of our domain objects to extend Result (or change it to an interface if necessary) so that all of our domain objects (Obs, Person, Program, Order, etc.) could be returned as Result objects with no extra effort. But I didn't want to take this step until the Logic Service was well defined...so for now, the extra step of converting an Obs to an ObsResult is a fine tradeoff. So, the purpose of these Result extensions is to make sure that Result methods (e.g., contains(), exists(), before(), gt(), etc.) behave properly. If two types of objects would respond to these Result methods the same, then there's no need for two types of Results.

Public Access to LogicDataService

Direct access to the logic data service layer will allow programmers to make direct calls to the logic data sources, for example:

// Assume we know demographic data source has a GENDER key
String gender = logicDataService.getDemographicDataSource().eval(person, "GENDER").toString();

This could be done as a peer of LogicService:

Context.getLogicDataService()

or be reached through the logic service:

Context.getLogicService().getDataService()

Rule References

Rule references provide a way of directly specifying a data source and a key in a simple string-based notation. This provides two primary benefits:

  1. Programmers can reference data directly, e.g.:
       Context.getLogicService().eval(patient, "${DEMOGRAPHICS::GENDER}");
  2. Simple configuration files can be used to map data to tokens, e.g.:
       GENDER = DEMOGRAPHICS::GENDER
       WEIGHT = OBSERVATION::WEIGHT (KG)

We could use just about any notation to represent the data source and key within a string; however, we need to avoid potential confusion with existing concept names. We would also benefit from a notation to clearly distinguish references from simple tokens. To this end, I would propose something like <tt>${<Data Source Name>::<Key as String>}</tt> when passing references where tokens are used. When references are assumed, then simply <tt><Data Source Name>::<Key as String></tt> could be used. I'm suggesting the double colon (::) separator instead of periods, since periods are commonly used within concept names and could be used in the keys of other data sources.

A single ReferenceRule class could be used to parse these references. Each logic data source would need to be "registered" with a specific string name (or perhaps provide a LogicDataSource.getReference() that returns a String). When a token of the form <tt>${foo::bar}</tt> is requested, the RuleFactory could return a <tt>new ReferenceRule("foo::bar")</tt>. Likewise, when registering a token with the logic service with a reference (String) as the rule, e.g., <tt>logicService.add("GENDER", "DEMOGRAPHICS::GENDER");</tt>, RuleFactory would map the "GENDER" token to the string "DEMOGRAPHICS::GENDER", only replacing this with <tt>new ReferenceRule("DEMOGRAPHICS::GENDER")</tt> when the token is evaluated for the first time (so registering a hundred tokens to data references would only result in a String-to-String map within RuleFactory, to reduce time and memory usage).
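The parsing and lazy instantiation described above could be sketched as follows. This is one possible shape, not the actual design: the fields on ReferenceRule, the LazyReferenceRegistry class (standing in for the relevant part of RuleFactory), and the error handling are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for references of the form ${SOURCE::KEY} or SOURCE::KEY.
class ReferenceRule {
    final String dataSourceName;
    final String key;

    ReferenceRule(String reference) {
        String body = reference.trim();
        if (body.startsWith("${") && body.endsWith("}")) {
            body = body.substring(2, body.length() - 1);
        }
        int sep = body.indexOf("::");
        if (sep <= 0 || sep + 2 >= body.length()) {
            throw new IllegalArgumentException("Not a reference: " + reference);
        }
        dataSourceName = body.substring(0, sep);
        key = body.substring(sep + 2);
    }
}

// Hypothetical registry that stores references as strings and builds rules lazily.
class LazyReferenceRegistry {
    private final Map<String, String> referencesByToken = new HashMap<>();
    private final Map<String, ReferenceRule> rulesByToken = new HashMap<>();

    void add(String token, String reference) {
        referencesByToken.put(token, reference);
    }

    // Builds the ReferenceRule on first use; returns null for unknown tokens.
    ReferenceRule getRule(String token) {
        ReferenceRule rule = rulesByToken.get(token);
        if (rule == null) {
            String reference = token.startsWith("${") ? token : referencesByToken.get(token);
            if (reference == null) {
                return null;
            }
            rule = new ReferenceRule(reference);
            rulesByToken.put(token, rule);
        }
        return rule;
    }

    int instantiatedRuleCount() {
        return rulesByToken.size();
    }
}
```

Registering a token costs only one String-to-String map entry; a ReferenceRule object is created the first time that token (or an inline ${...} reference) is actually evaluated.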

Design Concerns

Distinguishing Between Tokens and Data Source Keys

I worry about using tokens directly to look up concepts by name (e.g. HibernateObsDAO#206).

Where do data source methods go?

In the end, with a layered design, the location of data source methods doesn't really matter, because it can evolve over time without affecting the upper layers of the logic service. The choice is whether to incorporate logic-specific methods into existing services versus creating separate DAO interfaces/implementations under the logic packages. Either is reasonable at this point; however, if multiple methods are needed or the code grows significantly, then there may be advantages to consolidating this code within the logic service packages.

Overloading Tokens

I hadn't considered the possibility of mapping more than one token to a single rule. The fact that the token is provided at the root of the LogicCriteria makes it possible for a rule to know which token was used to invoke it. This is an interesting possibility...I'll put some more thought into it and include my conclusions here.

Efficiently Supporting Rules For Cohorts

In the earlier designs for the Logic Service, I imagined that all rules would need to pass their child rule or data requests through a LogicContext that would ensure that data requests across a cohort would be done in one call to the database layer instead of making separate database calls for each patient. While the existing buffering mechanisms might help avoid duplicate calls to the database layer, a single buffer for all calls to the logic service may introduce more overhead than necessary without maximizing the benefits.

For example, imagine Person A wants to calculate the HIV POSITIVE rule for 8000 patients and, at nearly the same time on a different client, Person B wants to calculate the maximum CD4 COUNT on 14-July-2004 for 10000 other patients. Since all of the calculations for Person B's request use an index date of 14-July-2004, the LogicCriteria differ even when CD4 COUNT is calculated for Person A's HIV POSITIVE rule, so results cannot be shared within the buffer. A central buffering service would need to do a lot of unnecessary lookups as these queries were processed to check for potential buffered results. On the other hand, if we had a buffer per Logic Service request (i.e., two buffer contexts in this example), then the buffers would be specific to the cohort and the Logic Service would only need to check whether any particular data request had been buffered, without needing to check on a per-patient basis. For example, when fetching the CD4 COUNT for Person B's 10000 patients, the central buffering service would need to check for buffered results for each of those 10000 patients among the buffered values from Person A's 8000-patient request. In a buffer-per-request model, the service would only need to know whether it had previously fetched CD4 COUNT for the current request (one check instead of thousands).

Applying Business Logic to Cohorts

When applying the business logic of a rule to a cohort of patients, I would rather have the Logic Service simply loop over the cohort and evaluate the rule for each member than ask this of every rule. That is, eliminate the eval(PatientSet, ...) method for Rules and simply loop over the eval(Patient, ...) method from outside of the rule. Of course, to do this efficiently would require that all data fetches for rules (whenever a rule needs to dip into the data source pool) be passed through a filter that could perform the database fetch across the entire cohort, if it had not already been done, before passing the individual patient's result back to the rule.
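The "filter" idea can be sketched as a per-request buffer that loads a key for the whole cohort the first time any patient asks for it. Everything here is hypothetical: CohortBuffer, its bulkFetch callback (standing in for a single database call), the use of integer patient ids and String values, and the fetchCount field, which exists only to make the behavior visible.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical per-request buffer: one bulk fetch per key for the whole cohort.
class CohortBuffer {
    private final List<Integer> cohort; // patient ids covered by this request
    private final BiFunction<Collection<Integer>, String, Map<Integer, String>> bulkFetch;
    private final Map<String, Map<Integer, String>> byKey = new HashMap<>();
    int fetchCount = 0; // exposed for illustration only

    CohortBuffer(List<Integer> cohort,
                 BiFunction<Collection<Integer>, String, Map<Integer, String>> bulkFetch) {
        this.cohort = cohort;
        this.bulkFetch = bulkFetch;
    }

    // Returns one patient's value; the first call for a key loads the whole cohort.
    String getData(int patientId, String key) {
        Map<Integer, String> values = byKey.get(key);
        if (values == null) { // one check per key, not per patient
            values = bulkFetch.apply(cohort, key);
            fetchCount++;
            byKey.put(key, values);
        }
        return values.get(patientId);
    }
}
```

The Logic Service can then loop over eval(Patient, ...) one patient at a time: the first rule evaluation that asks for CD4 COUNT triggers the cohort-wide fetch, and every later patient is served from the buffer.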

TODO

Quick Hits

  1. Make sure Result is as lightweight as possible; e.g., maps, arrays, and other internal properties should not be initialized until needed. For example, instead of using <tt>myMap.get(...)</tt>, use <tt>getMyMap().get(...)</tt>, where the private getMyMap() method initializes the map if it is null.
  2. Remove the eval(PatientSet, LogicCriteria) method from the Rule interface and, instead, process patient sets by looping over eval(Patient, LogicCriteria)
  3. Convert Duration to int for Rule and LogicDataSource getDuration() methods
  4. Provide public access to LogicDataService (see discussion above)
  5. Implement Rule references (see discussion above)

Bigger Projects

  1. Build a LogicRequest object to handle buffering and context-specific settings (index date and global arguments) at any level — consider using LogicContext instead of LogicRequest, since "context" is more appropriate for the role of this object
  2. Make sure LogicRequest is as lightweight as possible — e.g., maps, arrays, and other internal properties should not be initialized until needed. For example, instead of using <tt>myMap.get(...)</tt>, use <tt>getMyMap().get(...)</tt> where the private getMyMap() method initializes the map if it is null.
  3. Add indexDate property to LogicRequest
  4. Add globalArguments as a <tt>Map<String,Object></tt> within LogicRequest to allow for arguments to be assigned globally (scoped to the current request)
  5. Make a constructor for LogicRequest that takes another LogicRequest. We can use this "stacking" or "wrapping" of requests to handle changes to the index date during evaluation of a rule.
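
The items in this list could fit together roughly as in the sketch below. It is not a definitive design: LogicRequestSketch and all of its method names are hypothetical, and falling back to the wrapped request for any unset index date or global argument is an assumed way to implement the "stacking" behavior.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a lightweight, stackable request object.
class LogicRequestSketch {
    private final LogicRequestSketch parent;     // null for the outermost request
    private LocalDate indexDate;                 // null means "inherit from parent"
    private Map<String, Object> globalArguments; // created only when first written

    LogicRequestSketch(LocalDate indexDate) {
        this.parent = null;
        this.indexDate = Objects.requireNonNull(indexDate);
    }

    // Wraps another request; anything not overridden here falls through to it.
    LogicRequestSketch(LogicRequestSketch parent) {
        this.parent = Objects.requireNonNull(parent);
    }

    LogicRequestSketch asOf(LocalDate date) {
        this.indexDate = Objects.requireNonNull(date);
        return this;
    }

    LocalDate getIndexDate() {
        return indexDate != null ? indexDate : parent.getIndexDate();
    }

    // Lazy initialization: the map is allocated on first use.
    private Map<String, Object> getGlobalArguments() {
        if (globalArguments == null) {
            globalArguments = new HashMap<>();
        }
        return globalArguments;
    }

    void setGlobalArgument(String name, Object value) {
        getGlobalArguments().put(name, value);
    }

    Object getGlobalArgument(String name) {
        if (globalArguments != null && globalArguments.containsKey(name)) {
            return globalArguments.get(name);
        }
        return parent == null ? null : parent.getGlobalArgument(name);
    }
}
```

Wrapping the outer request with a new index date leaves the outer request untouched, so sibling evaluations that run afterwards still see the original date without any explicit "pop".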

    </body></html>