
It seems that there are three big themes to be picked up next in this space:

  1. Cleaning up tasks from the ID and Atlassian suite migration (ITSMOLD-4325)

    1. ITSMOLD-4328: a requirement to sunset

    2. ITSMOLD-4324: if LDAP (OpenLDAP) is chosen to be kept, we will need to upgrade it

    3. ITSMOLD-4231: if LDAP (OpenLDAP) is chosen to stay, we need certificate renewals to automatically restart the container in a way that actually picks up the new certificate. We may go with a cron task, which may be easier than the LetsEncrypt hook. Potentially the LDAP upgrade will help here

    4. Verify the future of login for https://atlas.openmrs.org, which used to use our old ID. This system may be considered for sunset as well

    5. Delete older machines. I expect jira, confluence, id and crowd not to be needed anymore. That also includes their databases and database storage, as well as removing Ansible code, archiving the ID repo, the lot

    6. Update documentation related to how we do our ID now: OpenMRS ID

    7. Improve monitoring for this area

  2. Bamboo stability

    1. Bamboo is likely due for an upgrade! It may be a huge one

    2. We may look at whether any configuration change on Predator could make builds more stable: https://marketplace.atlassian.com/apps/1212736/predator-plugin-for-bamboo?tab=overview&hosting=server

    3. ITSMOLD-4322: Bamboo server and agents seem to run out of disk every so often due to logs. We can look at logrotate carefully

    4. ITSMOLD-4316: Bamboo backups may not be working as desired

    5. Check monitoring for those instances. Do they need more resources? Check build waiting time: do we need more agents? If we are willing to pay AWS, we could have elastic agents (not sure if desired)

    6. Any other build improvements needed for reliability

  3. Monitoring love

    1. Datadog seems to be sending notifications non-stop. Do some machines need more memory/CPU/disk? Do we need to do some cleanup?

    2. ITSMOLD-4149: having Datadog monitoring for unhealthy containers could be rather useful. For unhealthy containers, we may want to automatically restart them or something

    3. ITSMOLD-4319: are all machines that need backups deploying them as expected? Do we have good monitoring for them?

    4. ITSMOLD-4228: Pingdom is a paid integration. If our Datadog isn’t as noisy anymore, we could potentially replace it altogether with Slack, unless we actually want any folks on call
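
For the unhealthy-containers idea (ITSMOLD-4149), here is a minimal sketch of what an automatic restart could look like on a single host. It is not the Datadog integration: it is a local stand-in that asks Docker directly, using the `health=unhealthy` filter of `docker ps`. It only sees containers that define a health check. The function names are hypothetical, and restarts are dry-run by default.

```python
import subprocess


def parse_container_names(ps_output: str) -> list[str]:
    """Turn the one-name-per-line output of `docker ps` into a list of names."""
    return [line.strip() for line in ps_output.splitlines() if line.strip()]


def unhealthy_containers() -> list[str]:
    """Ask the local Docker daemon for containers whose health check is failing."""
    result = subprocess.run(
        ["docker", "ps", "--filter", "health=unhealthy", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_container_names(result.stdout)


def restart(names: list[str], dry_run: bool = True) -> list[list[str]]:
    """Build one `docker restart` command per container; run them unless dry_run."""
    commands = [["docker", "restart", name] for name in names]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `restart(unhealthy_containers())` only returns the commands it would run; passing `dry_run=False` executes them. Whether a blind restart is the right reaction, rather than just an alert, is still an open question in the item above.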

...

  • ITSMOLD-4317: this is worth testing now that Jetstream has been upgraded. It may not be relevant anymore and could potentially be reverted

  • ITSMOLD-4144: potentially Etherpad isn’t used anymore. In that case, archive the card and change the docs to reflect that

  • ITSMOLD-4143: potentially not used anymore. In that case, archive the card and change the docs to reflect that: OpenMRSBot IRC Bots / https://github.com/openmrs/openmrs-contrib-itsmresources/wiki/Service-Chat-bots

  • ITSMOLD-4318: it’s always necessary

  • Upgrading Ansible

  • ITSMOLD-4075: not every DNS entry is in Terraform, which is fine. But as you find more of those, you can add them to the IaC

  • Reach out to centralised log services (e.g. Splunk, Sumo Logic, Datadog) and ask if they’d be willing to provide us with an open source licence (preparation for future ITSMOLD-3930)

Suggested

  • Upgrading Terraform (our infra was built with 0.12.31; the current version is 1.9.3), lest we hit a point where functionality starts breaking and we’re forced to upgrade quickly.

  • Making our SSO more robust (we’ve managed to get Keycloak working, but I’m not sure we’re fully divested from the old OpenMRS ID code, we don’t have clear/easy mechanisms for granting permissions, and we don’t have an easy, (semi-)automated way to mark accounts as spam)

  • Better supporting our dev and CI processes (e.g., make sure devs reliably have the environments they need, fix issues with permissions getting messed up via docker on CI, etc.)

And there’s a long tail of other nice-to-haves (e.g., fixing things that are occasionally breaking like LetsEncrypt upgrades), though some of those might get fixed with upgrades.
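
One of those occasionally breaking things is a renewed certificate not being picked up by a running container (the LDAP item, ITSMOLD-4231). Below is a minimal sketch of the cron-task idea, under these assumptions: the certificate is a file on the host, the service runs as a Docker container, and a restart is enough for it to load the new certificate. The paths, container name and function names are hypothetical.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable


def fingerprint(cert_path: Path) -> str:
    """SHA-256 of the certificate file contents."""
    return hashlib.sha256(cert_path.read_bytes()).hexdigest()


def docker_restart(container: str) -> None:
    """Restart a container through the Docker CLI."""
    subprocess.run(["docker", "restart", container], check=True)


def restart_if_renewed(
    cert_path: Path,
    state_path: Path,
    container: str,
    restart: Callable[[str], None] = docker_restart,
) -> bool:
    """Restart the container when the certificate differs from the last recorded one.

    The fingerprint is recorded only after the restart succeeds, so a failed
    restart is retried on the next run. The very first run has no recorded
    fingerprint and therefore restarts once.
    """
    current = fingerprint(cert_path)
    previous = state_path.read_text().strip() if state_path.exists() else None
    if current == previous:
        return False
    restart(container)
    state_path.write_text(current + "\n")
    return True
```

Run periodically (for example from cron), this does nothing on most runs and restarts the container once after each renewal. Comparing file contents rather than timestamps avoids restarts when the file is rewritten without actually changing.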

Resources:

...