The Jira Cloud service outage (still ongoing...)

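Atlassian seems to have recently blown up the services around Jira Cloud. I had planned to wait until things more or less settled down before looking into what happened, but then I saw a tweet saying the estimate is another two weeks, so I figured I should write it down now before I forget.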

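The status page incident "Multiple sites showing down/under maintenance" shows that the outage started on Qingming Festival (April 5 this year), and according to yesterday's update they have only restored service for 35% of the affected users: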

A small number of Atlassian customers continue to experience service outages and are unable to access their sites. Our global engineering teams are working 24/7 to make progress on this incident. At this time, we have rebuilt functionality for over 35% of the users who are impacted by the service outage, with no reported data loss. The rebuild stage is particularly complex due to several steps that are required to validate sites and verify data. These steps require extra time, but are critical to ensuring the integrity of rebuilt sites. We apologize for the length and severity of this incident and have taken steps to avoid a recurrence in the future.

Posted 19 hours ago. Apr 11, 2022 - 08:27 UTC

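So about a week after things went down, roughly a third has been restored, which suggests the official estimate of another two weeks is about right? There is also a Hacker News thread about the outage: "Atlassian products have been down for 4 days (atlassian.com)".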
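The back-of-the-envelope reasoning above (about a third restored after roughly a week) can be written out as a simple linear extrapolation. This is only a rough sketch: it assumes restoration proceeds at a constant rate, which the status update does not claim, and the function name `estimate_full_recovery` and the April 5 start date are my own assumptions for illustration.

```python
from datetime import date, timedelta


def estimate_full_recovery(start, checkpoint, fraction_restored):
    """Linearly extrapolate when restoration would reach 100%.

    Assumes a constant restoration rate between `start` and `checkpoint`,
    which is a simplification, not something Atlassian has stated.
    Returns (remaining_days, estimated_completion_date).
    """
    if not 0 < fraction_restored <= 1:
        raise ValueError("fraction_restored must be in (0, 1]")
    elapsed_days = (checkpoint - start).days
    if elapsed_days <= 0:
        raise ValueError("checkpoint must be after start")

    # If `fraction_restored` took `elapsed_days`, the whole job takes this long.
    total_days = elapsed_days / fraction_restored
    remaining_days = total_days - elapsed_days
    eta = checkpoint + timedelta(days=round(remaining_days))
    return remaining_days, eta


# Assumed dates: outage start on Qingming Festival, status update on Apr 11.
outage_start = date(2022, 4, 5)
status_update = date(2022, 4, 11)

remaining, eta = estimate_full_recovery(outage_start, status_update, 0.35)
print(f"about {remaining:.1f} more days, done around {eta}")
```

With these inputs the extrapolation lands a little under two weeks out, which is in the same ballpark as the official estimate.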

The Register also has a series of articles on this, and they reveal more than the official statements do: "Atlassian Jira, Confluence outage persists two days on", "Atlassian outage lingers, sparking data loss fears", "Day 7 of the great Atlassian outage: IT giant still struggling to restore access", and "At last, Atlassian sees an end to its outage ... in two weeks".

The subtitle of the first article mentions the cause:

'Routine maintenance script' blamed for derailed service for unlucky customers

The second article mentions that about 400 customers are affected:

We were also told that the incident affects a relatively small number of Atlassian customers: about 400. That's only 0.18 per cent of the company's 226,000 customers, which isn't much consolation to the several hundred who still can't access their data.

I'll come back later to see what this so-called routine maintenance script actually was...