On April 13, 2026, between 05:49 and 06:29 UTC, customers experienced failures when attempting to log in, sign up, reset passwords, and complete multi-factor authentication flows across Atlassian cloud products. Approximately 90% of authentication requests failed during the peak impact window, affecting users in the US East and EU regions. The incident was mitigated within 40 minutes through manual intervention, and full service was restored by 06:29 UTC.
This incident had several contributing factors that combined to produce a failure that the system could not recover from without manual intervention.
The primary cause was a recently enabled change that caused our authentication infrastructure to retry requests to a downstream identity service when those requests were slow to respond. This retry behaviour was rolled out to 100% of traffic earlier the same day. Under normal conditions this would be benign, but it meant that any slowness in the downstream service was amplified. Since multiple upstream services were also independently retrying their own failed requests, the amplification compounded further into a retry storm.
The trigger was a burst of legitimate user traffic. A pattern of many parallel link preview requests for a single user caused a concentrated load spike on a downstream identity service, pushing its response times above the retry threshold. On its own, this kind of spike had occurred many times before and always recovered. With the retry amplification now in effect, the spike instead created a runaway feedback loop: slow responses caused retries, retries increased load, increased load caused slower responses, preventing recovery.
The incident was mitigated by manually scaling up the downstream identity service to provide sufficient capacity to absorb the amplified load. Once scaled, the service recovered immediately, bringing authentication error rates to zero within one minute.
REMEDIAL ACTIONS PLAN & NEXT STEPS
We are taking the following actions designed to prevent recurrence and improve our resilience:
We apologize to customers whose services were interrupted by this incident and we are taking immediate steps to improve the platform’s reliability.
Thanks,
Atlassian Customer Support