We opted not to do direct upgrades on the primary database host. Instead, we would promote a MySQL 8.0 replica to primary through a graceful failover performed with Orchestrator.
MySQL supports replication from one release to the next higher release but does not explicitly support the reverse (MySQL Replication compatibility).
So all they could do was rehearse on staging, find whatever would blow up, and fix it ahead of time:
When we tested promoting an 8.0 host to primary on our staging cluster, we saw replication break on all 5.7 replicas.
On the other hand, the beginning of the article also mentions using CI to catch problems in advance:
We added MySQL 8.0 to Continuous Integration (CI) for all applications using MySQL. We ran MySQL 5.7 and 8.0 side-by-side in CI to ensure that there wouldn’t be regressions during the prolonged upgrade process. We detected a variety of bugs and incompatibilities in CI, helping us remove any unsupported configurations or features and escape any new reserved keywords.
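The "escape any new reserved keywords" part of the quote can be sketched roughly like this. It is only an illustration under my own assumptions, not GitHub's actual tooling: `quote_identifier` is a hypothetical helper, the `teams` table is made up, and the set holds just a handful of the words that became reserved in MySQL 8.0.

```python
# A few of the words that became reserved in MySQL 8.0 (deliberately incomplete).
NEW_RESERVED_IN_8_0 = {
    "cume_dist", "dense_rank", "groups", "lag", "lateral", "lead",
    "over", "rank", "recursive", "row_number", "window",
}


def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier if it collides with a new reserved word.

    Hypothetical helper for illustration only.
    """
    if name.lower() in NEW_RESERVED_IN_8_0:
        return f"`{name}`"
    return name


# Hypothetical table and columns; `rank` and `groups` parse fine as bare
# identifiers on 5.7 but need quoting on 8.0.
columns = ["id", "rank", "groups"]
query = "SELECT {} FROM teams".format(", ".join(quote_identifier(c) for c in columns))
print(query)
```

Quoting with backticks is valid on both 5.7 and 8.0, which is what makes it usable while the two versions run side by side.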
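These methods are meant to corner and surface as many problems as possible. When a problem really does show up in production, presumably they decide case by case whether to roll back to 5.7 and redo the whole thing?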
12:33:12 – Pushed the credentials to GitHub
12:34:19 – The AWSCompromisedKeyQuarantineV2 policy is attached to the IAM user test-user by AWS
12:34:32 – Various List and Describe calls are made using the leaked credentials
12:35:08 – Received an email from AWS with the subject ‘ACTION REQUIRED: Your AWS Access Key is Exposed for AWS Account 12345678’
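To make the reaction times in the timeline above easier to read, here is a small sketch that turns the timestamps into offsets from the moment of the push. The event labels are my own shorthand, and the timestamps are assumed to fall on the same day.

```python
from datetime import datetime

# Timestamps copied from the timeline above; labels are shortened paraphrases.
events = [
    ("12:33:12", "credentials pushed to GitHub"),
    ("12:34:19", "quarantine policy attached by AWS"),
    ("12:34:32", "List/Describe calls with the leaked credentials"),
    ("12:35:08", "notification email from AWS"),
]


def seconds_since_push(events):
    """Return (offset_in_seconds, label) pairs relative to the first event."""
    start = datetime.strptime(events[0][0], "%H:%M:%S")
    return [
        (int((datetime.strptime(ts, "%H:%M:%S") - start).total_seconds()), label)
        for ts, label in events
    ]


for offset, label in seconds_since_push(events):
    print(f"+{offset:>3}s  {label}")
```

By this arithmetic the quarantine policy landed 67 seconds after the push, the first calls using the leaked key came at 80 seconds, and the email arrived at 116 seconds.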
I set up a project[1] to automatically leak AWS secrets published to the Python package index, which then triggers the secret scanning process and quarantines the keys[2]
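The authors wanted to detect fake-star activity on GitHub, so they went to the black market to buy some and found two vendors: Baddhi Shop (1,000 stars for $64) and GitHub24 (€0.85 per star, about $0.91). The prices differ a lot, and so does the "quality": the expensive stars were all still there a month later, while some of the cheap ones appear to have been detected and purged by GitHub: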
A month later, all 100 GitHub24 stars still stood, but only three-quarters of the fake Baddhi Shop stars remained. We suspect the rest were purged by GitHub’s integrity teams.
The next step was a systematic analysis. The starting point is the GH Archive service, which lets you download the public events data for all of GitHub directly:
GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
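As a rough idea of what working with that data looks like: GH Archive publishes newline-delimited JSON events, and starring a repository shows up as a `WatchEvent`. The sketch below counts stars per repository from such lines. The `count_stars` helper and the sample records are my own illustration, not the authors' detection pipeline.

```python
import json
from collections import Counter


def count_stars(lines):
    """Count WatchEvent records (the event emitted when a repo is starred) per repo."""
    stars = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") == "WatchEvent":
            stars[event["repo"]["name"]] += 1
    return stars


# Tiny hand-written sample in the same one-JSON-object-per-line shape.
sample = [
    '{"type": "WatchEvent", "actor": {"login": "alice"}, "repo": {"name": "octo/demo"}}',
    '{"type": "PushEvent", "actor": {"login": "bob"}, "repo": {"name": "octo/demo"}}',
    '{"type": "WatchEvent", "actor": {"login": "carol"}, "repo": {"name": "octo/demo"}}',
]
print(count_stars(sample))
```

For a real hourly archive file, which is gzip-compressed, the same function could be fed from `gzip.open(path, "rt")` instead of the sample list.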
GitHub’s scale is truly a unique challenge. When we first deployed Elasticsearch, it took months to index all of the code on GitHub (about 8 million repositories at the time). Today, that number is north of 200 million, and that code isn’t static: it’s constantly changing and that’s quite challenging for search engines to handle. For the beta, you can currently search almost 45 million repositories, representing 115 TB of code and 15.5 billion documents.
It currently runs on 32 machines. The memory size is not specifically mentioned, nor are figures such as replication:
Code search runs on 64 core, 32 machine clusters.
And the various inverted indexes plus all the data come to only 25 TB after compression:
There are some big wins on the size of the index as well. Remember that we started with 115 TB of content that we want to search. Content deduplication and delta indexing brings that down to around 28 TB of unique content. And the index itself clocks in at just 25 TB, which includes not only all the indices (including the ngrams), but also a compressed copy of all unique content. This means our total index size including the content is roughly a quarter the size of the original data!
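A quick check of the ratios in that quote, using only the numbers it gives (the variable names are mine):

```python
# Figures quoted above, in TB.
raw_content_tb = 115     # content they want to search
unique_content_tb = 28   # after deduplication and delta indexing
index_tb = 25            # all indices plus a compressed copy of unique content

dedup_ratio = unique_content_tb / raw_content_tb
index_ratio = index_tb / raw_content_tb

print(f"unique content / raw content: {dedup_ratio:.1%}")
print(f"index / raw content:          {index_ratio:.1%}")
```

That works out to about 24.3% and 21.7%, so "roughly a quarter" is a fair description, and the finished index is even smaller than the deduplicated content it contains.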
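It mentions that this feature was originally released more or less as a joke, but after launch it unexpectedly turned out to make migrating some legacy systems easier: developers could start using Git first, while CI-type tooling did not have to change yet.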
As one of the GitHub cofounders and the brainchild of this particular feature, I want to let everyone know that this is maybe the funniest thing I've ever done.
We released this feature and published the announcing blog post, on April Fool's Day, 2010. I remember demoing it to the other GitHub guys and saying how funny it would be if we made this an April Fool's day post as though it was a big stupid joke but then it actually completely worked on every repository we had and we all thought it would be great. Until nobody believed us. Which in hindsight we should have seen coming, since that was the joke, but nobody actually tried it. Then people tried it and it worked and they thought it was a trick or something.
It was really helpful for people migrating from legacy SVN based systems to us (CI and stuff) but I'm surprised to some degree that it's still running 13 years later when nobody is really facing that issue anymore. And I'm still undecided if the joke was worth the massive confusion it caused. But if I'm pressed, I would say that I would 100% release it on April Fool's Day again.
In February we enrolled all maintainers of the top-100 packages on the npm registry in mandatory 2FA, and in March we enrolled all npm accounts in enhanced login verification. On May 31, we will be enrolling all maintainers of the top-500 packages in mandatory 2FA.