Google's SRE team shares eleven lessons from the past twenty years

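Google's SRE team has written up what it learned over the past twenty years, which I read mostly as a collection of stories: "Lessons Learned from Twenty Years of Site Reliability Engineering". There is also a discussion on Hacker News: "Lessons Learned from Twenty Years of Site Reliability Engineering (sre.google)".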

Most of the items are practices a company keeps adopting as it grows; once you are big enough, you will run into all of them.

The more interesting one is #6, the only item written entirely in capital letters:

COMMUNICATION CHANNELS! AND BACKUP CHANNELS!! AND BACKUPS FOR THOSE BACKUP CHANNELS!!!

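At Google's scale, this means planning for tools that run completely outside Google's own infrastructure. My guess is something like a traditional colocation facility (AT&T or similar) running an IRC server or the like?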

Others on Hacker News mentioned that Netflix has a similar rule: backup channels must be completely independent of AWS. The same comment also says that Reddit kept a backup IRC server in its office:

Yes! At Netflix, when we picked vendors for systems that we used during an outage, we always had to make sure they were not on AWS. At reddit we had a server in the office with a backup IRC server in case the main one we used was unavailable.

IRC is still really useful XD
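The "backups for the backup channels" idea boils down to trying independent channels in a fixed order until one of them gets through. Below is a minimal Python sketch of that fallback logic; the channel names and the `send` callables are hypothetical stand-ins, not anything Google, Netflix, or Reddit has described.

```python
from typing import Callable, Optional, Sequence, Tuple

# A channel is a (name, send) pair; send returns True if the message got through.
# All channel names below are hypothetical examples.
Channel = Tuple[str, Callable[[str], bool]]


def notify(message: str, channels: Sequence[Channel]) -> Optional[str]:
    """Try each channel in priority order; return the name of the first that works."""
    for name, send in channels:
        try:
            if send(message):
                return name
        except Exception:
            # A failing channel must not stop us from trying the next one.
            continue
    return None  # every channel failed


def primary_chat_down(msg: str) -> bool:
    raise ConnectionError("primary chat unreachable")


def office_irc(msg: str) -> bool:
    return True


def phone_bridge(msg: str) -> bool:
    return True


channels = [
    ("primary-chat", primary_chat_down),
    ("office-irc", office_irc),
    ("phone-bridge", phone_bridge),
]
print(notify("incident declared", channels))  # prints office-irc
```

The ordering only helps if each entry depends on different infrastructure; three channels that all run on the same cloud provider will fail together.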

Current alternatives to Reddit

I came across "sub.rehab · Find your next diving spot", a page that tracks where Reddit communities are moving to.

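Based on the current data, Lemmy looks like the main destination. Some communities self-host, but it seems quite a few just find an existing instance to join?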

The second most common move is to Discord, which is rather unusual...

Because Discord is a closed platform, services like "Answer Overflow - Index Your Discord Server Channels Into Google" have also appeared; they turn Discord content into HTML pages so that search engines can read it.
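Conceptually, the job is to render chat messages as plain, crawlable HTML. Here is a minimal Python sketch, assuming messages have already been exported as (author, text) pairs; the `render_thread` function and its page layout are hypothetical and are not Answer Overflow's actual implementation.

```python
import html
from typing import List, Tuple


def render_thread(title: str, messages: List[Tuple[str, str]]) -> str:
    """Render a chat thread as a static HTML page that a search crawler can index."""
    # Escape all user-supplied text so message content cannot inject markup.
    items = "\n".join(
        f"    <li><strong>{html.escape(author)}</strong>: {html.escape(text)}</li>"
        for author, text in messages
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n  <h1>{html.escape(title)}</h1>\n  <ul>\n{items}\n  </ul>\n</body></html>\n"
    )


page = render_thread("How do I reset my config?", [
    ("alice", "Delete ~/.config/app and restart."),
    ("bob", "Thanks, <that> worked!"),
])
print(page)
```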

So will Reddit's hardline approach actually work this time...

The ongoing story of Reddit's plan to sharply raise API prices

Reddit isn't used much in Taiwan, but it is a huge site in Europe and North America. It was founded in 2005, but it mainly took off after the Digg disaster of 2010.

The steep API price increase means none of the current third-party clients can afford it. Apollo, for example, would have to pay $20M per year: "Popular Reddit App Apollo Would Need to Pay $20 Million Per Year Under New API Pricing".
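The $20M figure is straightforward arithmetic. Apollo's developer said at the time that Reddit quoted $12,000 per 50 million API requests (about $0.24 per 1,000) and that Apollo had made roughly 7 billion requests in the previous month. The sketch below treats both numbers as his reported figures, not as independently verified data.

```python
# Figures as publicly reported by Apollo's developer in 2023 (not independently verified).
PRICE_PER_50M_USD = 12_000
REQUESTS_PER_MONTH = 7_000_000_000


def annual_cost(requests_per_month: int, price_per_50m: float) -> float:
    """Annual API bill in USD, assuming a steady monthly request volume."""
    monthly = requests_per_month / 50_000_000 * price_per_50m
    return monthly * 12


cost = annual_cost(REQUESTS_PER_MONTH, PRICE_PER_50M_USD)
print(f"${cost:,.0f} per year")  # prints $20,160,000 per year
```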

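This episode is still unfolding, and "Reddit#2023 API changes" has a summary. For now the CEO seems determined to charge these fees; management apparently believes that, unlike in 2010, users have nowhere else to go this time?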

Someone has compiled a list at "List of Active Reddit Alternatives v8", but I'm not sure these sites can handle this wave of traffic...

Reddit's recent layoff news is also worth reading alongside this: "Reddit to lay off about 5% of its workforce".

QR codes generated with ControlNet

Seen on Hacker News Daily: "Redditor creates working anime QR codes using Stable Diffusion". The original Reddit thread is "ControlNet for QR Code".

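The QR codes generated with ControlNet go far beyond what people usually imagine a customized QR code can look like... I'll skip the explanation and just show the images: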





Netflix's 2013-era Open Connect Appliances

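The original Reddit post is "So I got a Netflix cache server...", but the author seems to have deleted the content (perhaps after being contacted and asked to take it down?). You can still read the Internet Archive snapshot "20221026080226", as well as the write-up "How a Redditor Ended Up With an Industrial-Grade Netflix Server".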

So it is a machine Netflix retired, and it looks like it was obtained legitimately?

I work for a large ISP, and we are retiring/replacing quite a few 2013 era Netflix OCA caches, and I was offered one. Of course, I couldn't say no 😅

The data had of course been wiped:

I knew that Netflix had wiped them all in the decommissioning process, that they ran FreeBSD, that they were crammed full of drives, and that's about it.

Even by today's standards this 2013 machine is big, especially the storage: 36 HGST 7.2TB drives:

36x 7.2TB 7200RPM HGST's

Plus four 10Gbps network interfaces:

One 4x 10G SFP+ NIC
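Combining those two specs gives a sense of the scale. A quick Python calculation using the numbers from the post, assuming decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s), ignoring RAID and filesystem overhead, and assuming all four ports run at full line rate:

```python
DRIVES = 36
DRIVE_TB = 7.2      # decimal terabytes per drive, as listed in the post
PORTS = 4
PORT_GBPS = 10      # gigabits per second per SFP+ port

raw_tb = DRIVES * DRIVE_TB                # total raw capacity in TB
link_gbps = PORTS * PORT_GBPS             # aggregate network bandwidth in Gbps
bytes_per_second = link_gbps * 1e9 / 8    # convert bits/s to bytes/s
hours_to_fill = raw_tb * 1e12 / bytes_per_second / 3600

print(f"{raw_tb:.1f} TB raw, {link_gbps} Gbps, ~{hours_to_fill:.1f} h to move it all")
```

That works out to about 259.2 TB of raw disk and roughly 14.4 hours to stream the whole thing over 40 Gbps.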

The author later installed TrueNAS on it, and judging by these specs it really is well suited to being a NAS:

I expected some resistance when trying to install an OS, but it was already set to boot from USB and took a TrueNAS install like a champ!

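But I'm not sure how much power it draws, which you would have to consider before running it at home... otherwise it becomes a souvenir on a shelf.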

A former Reddit EM on the dark patterns Reddit uses

Seen on Hacker News Daily: "Reddit's disrespectful design", which covers the dark patterns Reddit uses. I think the Hacker News discussion is even better, though: "Reddit’s disrespectful design (ognjen.io)".

The discussion includes wting, who used to be the EM (engineering manager) responsible for this area at Reddit:

I was the EM for Reddit's Growth team around this time. I am responsible for / contributed to a few features like the current signup flow, AMP pages, push notifications, email digests, app download interstitials, etc.

He says he refused to implement some of these dark patterns, and after what happened in [0], he decided to switch teams:

There was a new product lead who joined with many good ideas, but some of them were dark patterns that I heavily protested. After a few months of this, it was obvious that I was going to be reigned in or let go[0]; I immediately transferred to a different org.

And [0] is that the company changed his reporting line so that he reported to the product lead XDDD

0: They changed it so I would report to the product lead, which is odd for an EM to report into a product chain and the only instance within the company ever.

He also made another point: forcing things on users works, and it works extremely well:

Now let me explain the other side of the story. 4 years later, Reddit's DAU, MAU, and revenue have all grown at ridiculous rates[1]. Yes, power users complain—and still continue using the site—but the casual user does not. These dark patterns have been normalized on other websites.

These practices are done because it works.

1: Many friends are startup founders and I've been at a few startups myself—a byproduct of being in the Bay Area—and Reddit's growth numbers are impressive. As a former employee, I am quite happy about my equity growth.

This is pretty much the consensus by now: dark patterns look evil, sure, but they work...

Startpage acquired by an advertising company

Seen on Hacker News, pointing to a Reddit post (the news seems to be a while old): "Startpage is now owned by an advertising company".

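Startpage used to be my default search engine, but I stopped using it much after running into a lot of bugs. For now my default is DuckDuckGo, and when needed I switch over to Google with press-g-to-google-duckduckgo, a tool I wrote earlier...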
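The core of such a switch is carrying the current query over to Google. Here is a hypothetical Python sketch of just the URL rewriting; the actual tool runs in the browser and this is not its code. It assumes DuckDuckGo keeps the search terms in the `q` query parameter, as its result URLs currently do.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def ddg_to_google(ddg_url: str) -> Optional[str]:
    """Turn a DuckDuckGo results URL into the equivalent Google search URL."""
    query = parse_qs(urlparse(ddg_url).query).get("q")
    if not query:
        return None  # not a search results page
    return "https://www.google.com/search?" + urlencode({"q": query[0]})


print(ddg_to_google("https://duckduckgo.com/?q=aws+console+bug&ia=web"))
# prints https://www.google.com/search?q=aws+console+bug
```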

DuckDuckGo still has search quality problems...

All kinds of complaints about the AWS Management Console...

Seen on Hacker News Daily: a Reddit thread complaining about the AWS Management Console, which has been collecting replies for about two months: "I am stupefied every day by the awfulness of the AWS web console".

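AWS development is primarily API-driven, and the AWS Management Console has always been able to do noticeably less (it looks like a separate team builds it on top of the API), and it is indeed often buggy. So complaints like this are not very surprising...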

And then the flame war started:

canadian_sysadmin 24 points 2 months ago
I see you've never used Azure...

myron-semack 18 points 2 months ago
AWS’s console sucks because they don’t give a damn about UI. They are API-first.

Azure’s console sucks because they tried to make it nice but failed.

ryantiger658 5 points 2 months ago
I was scrolling looking for this comment. Azures interface has made me appreciate AWS even more.

Azure took quite a few jabs XDDD, and then GCP got one too:

edgan 1 point 2 months ago
It could br better, but it is far better than than Azure and GCP. Azure's old one was better than their new beta interface last I saw it. GCP has some interesting ideas, but the side bar centric design doesn't function well. It also tries to do too much, and is too JavaScript-y happy.

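The more common approach is still to manage resources with AWS's own CloudFormation or the third-party Terraform (following the Infrastructure as Code idea) and treat the AWS Management Console as a supplement, since today's cloud services really are designed for you to use the API...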
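As a small taste of the Infrastructure as Code style, here is a minimal Python sketch that emits a CloudFormation template as JSON. It only builds the document; deploying it (for example with `aws cloudformation deploy`) is left out, and the logical ID `LogsBucket` and the tag values are made-up examples.

```python
import json


def bucket_template(logical_id: str, versioned: bool = True) -> dict:
    """Build a minimal CloudFormation template that declares one S3 bucket."""
    properties = {
        "Tags": [{"Key": "managed-by", "Value": "cloudformation"}],
    }
    if versioned:
        properties["VersioningConfiguration"] = {"Status": "Enabled"}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            logical_id: {"Type": "AWS::S3::Bucket", "Properties": properties},
        },
        # Ref on an S3 bucket resolves to its generated bucket name.
        "Outputs": {
            "BucketName": {"Value": {"Ref": logical_id}},
        },
    }


template = bucket_template("LogsBucket")
print(json.dumps(template, indent=2))
```

Because the template lives in version control, every change gets a history and a review, which is exactly what clicking through the console does not give you.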

What happened after the iPhone battery-replacement story reached Geekbench...

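In "iPhone 的電池與效能" (iPhone batteries and performance) I mentioned that replacing an iPhone's battery can restore its performance. Then John Poole of Geekbench (the benchmark used by the person who originally complained on Reddit) analyzed the data in Geekbench's results database, found a strange pattern, and wrote this article, which set off a string of news reports and a PR incident for Apple: "iPhone Performance and Battery Age".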

He started with the iPhone 6S data, which already looked bad:

Then the iPhone 7 data, which made it clear something was wrong:

You can see odd clusters of performance scores under iOS 10.2.1 and 11.2.0.
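Clusters like these show up when you histogram benchmark scores per OS version: unthrottled devices pile up around one typical score, while throttled devices add extra peaks at lower scores. Below is a minimal Python sketch of that grouping, roughly how such a chart could be built. It assumes results arrive as (ios_version, score) pairs; the bin width and the sample values are made up for illustration and are not Geekbench data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def score_histograms(
    results: Iterable[Tuple[str, int]], bin_width: int = 100
) -> Dict[str, Counter]:
    """Group scores by iOS version and count how many fall into each score bin."""
    hist: Dict[str, Counter] = defaultdict(Counter)
    for version, score in results:
        # Key each bin by its lower edge, e.g. 2490 -> 2400 with bin_width=100.
        hist[version][score // bin_width * bin_width] += 1
    return dict(hist)


# Illustrative values only: one version with a single peak, one with two.
sample = [
    ("10.2.0", 2510), ("10.2.0", 2530), ("10.2.0", 2490),
    ("10.2.1", 2520), ("10.2.1", 1480), ("10.2.1", 1450),
]
for version, bins in sorted(score_histograms(sample).items()):
    print(version, sorted(bins.items()))
```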

Apple later confirmed that it deliberately slows these phones down: "Apple addresses why people are saying their iPhones with older batteries are running ‘slower’".

The latest development is unsurprising: the legal fight has begun: "Days after iPhone battery fiasco, lawsuits against Apple begin to mount".

With the Christmas holidays coming up, there probably won't be much news until next year...

The outsized role of reddit and 4chan in online news

Seen in "Study finds fringe communities on Reddit and 4chan have high influence on flow of alternative news to Twitter":

After analyzing millions of posts containing mainstream and alternative news shared on Twitter, Reddit and 4chan, Jeremy Blackburn, Ph.D., and collaborators found that alt-right communities within 4chan, an image-based discussion forum where users are anonymous, and Reddit, a social news aggregator where users vote up or down on posts, have a surprisingly large influence on Twitter.

Going by this description of reddit and 4chan, their influence on Twitter reminds me of Ptt's influence on news media in Taiwan: isn't Ptt where a lot of news stories start?

"Based on our findings, these smaller, fringe communities on Reddit and 4chan serve as an incubation chamber for a lot of information," said Blackburn, assistant professor of computer science in the UAB College of Arts and Sciences. "The content and talking points are refined until they finally break free and make it to larger, more mainstream communities."

A proper study along these lines should be able to measure Ptt's influence?