App was unavailable
Incident Report for Asset Management for Jira
Postmortem

Incident summary

On March 7th, 2022, Asset Management for Jira was unavailable for the majority of customers between 10 am and 3:30 pm UTC.

Users who tried to access the product encountered a loading screen and were unable to access any of the functionality. We want to share what happened, why it happened, and what we're doing to ensure it never happens again.

What happened?

We deployed a bug fix to production on March 7th, 2022 at 2 am UTC.

The change itself was small and did not cause the issue. However, changes to Atlassian's Connect framework intended to improve security practices had recently gone live. As a result, new authentication tokens were generated for every customer who tried to access the app.

The Connect change wasn't communicated to us, so we didn't save the new authentication tokens. We were still relying on the older tokens to verify customers, which caused the app to fail whenever a customer tried to access it.
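The failure mode above, keeping a stale token after a new one has been issued, can be illustrated with a minimal sketch. It assumes the new tokens arrive in an install-style callback carrying `clientKey` and `sharedSecret` fields, as in Connect's installed lifecycle payload; the report does not describe the exact mechanism. The store and the function names `handle_install_callback` and `current_secret` are hypothetical and are not our production code.

```python
from typing import Dict

# Hypothetical in-memory store: tenant identifier -> current shared secret.
# A real app would persist this in a database; a dict keeps the sketch short.
_secrets: Dict[str, str] = {}


def handle_install_callback(payload: dict) -> None:
    """Save the secret from every callback, not only from the first install.

    Overwriting unconditionally means a regenerated secret replaces the
    older one instead of being dropped.
    """
    client_key = payload["clientKey"]
    shared_secret = payload["sharedSecret"]
    _secrets[client_key] = shared_secret


def current_secret(client_key: str) -> str:
    """Return the most recently saved secret for a tenant."""
    return _secrets[client_key]


# First install, then a later callback that carries a regenerated secret.
handle_install_callback({"clientKey": "tenant-a", "sharedSecret": "old-secret"})
handle_install_callback({"clientKey": "tenant-a", "sharedSecret": "new-secret"})
```

After both callbacks, `current_secret("tenant-a")` returns the regenerated value, so later verification uses the token the customer actually holds.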

Our testing and alerting systems also missed the bug. We learned of it only when customers reported it through the ticketing system.

Resolution

The bug was identified and a fix was pushed at 3 pm UTC.

However, full resolution also required manual intervention on the part of customers: they had to upgrade their app so that we generated new authentication tokens that could be used to access it correctly.

Lessons learned and future prevention

  1. Our alerting systems didn't catch the problem. Because Atlassian rolls out descriptor updates in batches over the course of a day, the testing we did after deploying the change was insufficient. Going forward, we will let changes bake in our staging environment for at least one day to make sure there are no issues before promoting them to production.
  2. We currently lack automated browser-based testing for this particular scenario. We will invest in improving our test automation to make sure the authentication flow isn't broken.
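As an illustration of the kind of automated check item 2 calls for, here is a minimal sketch of a regression test for token rotation. It assumes tokens are HS256-signed JWTs verified with a per-tenant shared secret, which the report does not state explicitly. The helpers `sign_hs256`, `verify_hs256`, and `check_rotation` are hypothetical, and the sketch checks only the signature, not expiry or other claims. It is not a browser-based test.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without trailing padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: str) -> str:
    """Build a compact HS256 token for the given claims."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{body}".encode("ascii")
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"


def verify_hs256(token: str, secret: str) -> bool:
    """Check only the signature; malformed tokens are rejected."""
    try:
        header, body, sig = token.split(".")
    except ValueError:
        return False
    signing_input = f"{header}.{body}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(_b64url(expected), sig)


def check_rotation(old_secret: str, new_secret: str) -> None:
    """Regression check: a token signed with the rotated secret must verify
    against the new secret and must fail against the old one."""
    token = sign_hs256({"iss": "tenant-a"}, new_secret)
    assert verify_hs256(token, new_secret)
    assert not verify_hs256(token, old_secret)


check_rotation("old-secret", "new-secret")
```

A check like this could run in staging during the one-day bake period from item 1, alongside browser-based tests that exercise the full login flow.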

On behalf of our team, we apologize for the inconvenience this outage has caused. We have learned from this experience and are already implementing the safeguards to ensure it will not happen again.

Posted Mar 08, 2022 - 01:56 PST

Resolved
Asset Management for Jira app outage
Posted Mar 07, 2022 - 01:00 PST