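A site for watching how stories move on Hacker News: Quality News

I found this site via the post "Collecting links that were "disappeared" from Hacker News under questionable conditions". It periodically fetches the new links and the front-page links on Hacker News and compiles the results: "Quality News".

For example, "John Walker, founder of Autodesk, has died (fourmilab.ch)" is currently at top 1, and its stats page is "Hacker News Story Stats: John Walker, founder of Autodesk, has died". Here is a screenshot:

This way, when a story's behavior looks odd, you can cross-check the data and start to get a "feel" for the "manipulation" that happens on Hacker News...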

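Collecting links that were "disappeared" from Hacker News under questionable conditions

I saw this project in "Stories removed from the Hacker News Front Page, updated in real time (github.com/vitoplantamura)". The project is on GitHub: "vitoplantamura/HackerNewsRemovals".

The collection criterion is an assumption: a story that was on the first page (top 30) should not fall out of the top 90 within the next minute: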

The assumption is that a Story cannot go from the top 30 to a position higher than 90 in a single minute, without having been explicitly removed.
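The quoted assumption is easy to sketch in code. This is only a minimal illustration of the rule as stated, not the project's actual implementation: the function name, the `{story_id: rank}` snapshot format, and the choice to treat a story missing from the new snapshot as "past 90" are all my own assumptions.

```python
from typing import Dict, List

FRONT_PAGE_SIZE = 30    # "top 30" from the quoted assumption
REMOVAL_THRESHOLD = 90  # "a position higher than 90"


def find_suspected_removals(prev_ranks: Dict[int, int],
                            curr_ranks: Dict[int, int]) -> List[int]:
    """Compare two rank snapshots taken one minute apart.

    Both arguments map a story id to its 1-based rank (hypothetical format).
    A story is flagged when it was in the top 30 before and is now ranked
    past 90, or is missing from the new snapshot (an assumption of this sketch).
    """
    suspected = []
    for story_id, prev_rank in prev_ranks.items():
        if prev_rank > FRONT_PAGE_SIZE:
            continue  # was not on the first page, so the rule does not apply
        curr_rank = curr_ranks.get(story_id)
        if curr_rank is None or curr_rank > REMOVAL_THRESHOLD:
            suspected.append(story_id)
    return sorted(suspected)


# Made-up story ids, for illustration only.
previous = {1: 5, 2: 29, 3: 31, 4: 10}
current = {1: 6, 2: 120, 3: 200}
flagged = find_suspected_removals(previous, current)
```

Here stories 2 and 4 get flagged, while story 3 is ignored because it was already outside the top 30. As the reply quoted next shows, a hit only means "dropped suspiciously fast", not "removed by an admin".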

dang explains in id=39231821 that in some cases stories do get downranked straight off the page:

The first two you listed were downranked by the flamewar detector. The last one was downranked by users. Admins didn't touch any of them.

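Honestly, anyone who has spent enough time on Hacker News already has a sense of this: these two buggy mechanisms have to be buggy enough to allow manipulation. You can see it when certain topics (climate change, for example) suddenly disappear and the blame is pushed onto these two systems...

So Hacker News being called a black box is nothing new. Lobsters next door at least has a Moderation Log you can look at...

With this project you can now dig out other useful links to read. It should also be possible to turn it into something like Hacker News Daily: compile the results each day into a blog post that people can subscribe to. Maybe I'll suggest that to the author?

Hacker News is currently on Cloudflare

Hacker News seems to have moved onto Cloudflare a while ago because it was being attacked: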

;; ANSWER SECTION:
news.ycombinator.com.   1       IN      CNAME   news.ycombinator.com.cdn.cloudflare.net.
news.ycombinator.com.cdn.cloudflare.net. 300 IN A 172.67.5.232
news.ycombinator.com.cdn.cloudflare.net. 300 IN A 104.22.6.236
news.ycombinator.com.cdn.cloudflare.net. 300 IN A 104.22.7.236
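If you want to check this kind of thing mechanically, here is a rough sketch that parses the answer section above and looks for a CNAME ending in `.cdn.cloudflare.net`. The `Record` type and both function names are mine, and the suffix check is just a heuristic based on the output shown here, not an official way to detect Cloudflare.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int
    rtype: str
    value: str


# The answer section copied from the dig output above.
ANSWER = """\
;; ANSWER SECTION:
news.ycombinator.com.   1       IN      CNAME   news.ycombinator.com.cdn.cloudflare.net.
news.ycombinator.com.cdn.cloudflare.net. 300 IN A 172.67.5.232
news.ycombinator.com.cdn.cloudflare.net. 300 IN A 104.22.6.236
news.ycombinator.com.cdn.cloudflare.net. 300 IN A 104.22.7.236
"""


def parse_answer_section(text: str) -> List[Record]:
    """Parse 'name ttl class type value' lines, skipping ';' comment lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        parts = line.split()
        if len(parts) < 5:
            continue  # not a complete resource record line
        name, ttl, _rclass, rtype = parts[:4]
        records.append(Record(name, int(ttl), rtype, " ".join(parts[4:])))
    return records


def has_cloudflare_cname(records: List[Record]) -> bool:
    """Heuristic: is any CNAME target under cdn.cloudflare.net?"""
    return any(
        r.rtype == "CNAME" and r.value.rstrip(".").endswith(".cdn.cloudflare.net")
        for r in records
    )


records = parse_answer_section(ANSWER)
a_addresses = [r.value for r in records if r.rtype == "A"]
```

With the output above, `has_cloudflare_cname(records)` is true and `a_addresses` holds the three Cloudflare-side addresses.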

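You can also refer to the records in "Site report for http://news.ycombinator.com". I saved a separate copy on archive.today: "Site report for http://news.ycombinator.com".

You can see it previously pointed to 209.216.230.240. Then early this year (2024) it was apparently blackholed on the backbone because of a DDoS, and it moved to Cloudflare afterwards.

One surprise is that the report says it runs FreeBSD. I'll look into that once it switches back?

Cloudflare's WAF tends to produce false positives on tech sites. "Ask HN: Does Cloudflare block HN comments if you have code blocks in a reply?" ran into exactly this, and in the comments you can see people working out how to get around the WAF XDDD

It should go back to normal once they switch back, but while it's still on Cloudflare we'll probably keep seeing complaints...

Some background on why Hacker News was slow a while ago

In the discussion at "Ask HN: Is Hacker News slow for anyone else?", dang (a Hacker News moderator) explained in 35157344: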

All: our poor server is smoking today* so I've had to reduce the page size of comments. There are 1500+ comments in this thread but if you want to read more than a few dozen you'll need to page through them by clicking the More link at the bottom. I apologize!

Also, if you're cool with read-only access, just log out (edit: or use an incognito tab) and all will be fast again.

* yes, HN still runs on one core, at least the part that serves logged-in requests, and yes this will all get better someday...it kills me that this isn't done yet but one day you will all see

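Another notable point is that Hacker News is written in Arc (a Lisp), and it does not look like optimization was a consideration. On top of that, Reddit was also down that day, which did push the update rate on Hacker News higher...

Using GPT-3 to generate summaries of popular Hacker News stories

I came across the tool "Autosummarized HN", which is a nice combination of existing pieces: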

All summaries have been generated automatically by GPT-3. No responsibility is claimed for their contents nor its accuracy.

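It uses GPT-3 to read each story and generate a summary. The page currently has no RSS feed, but you can extract one with tools such as PolitePol and then hook it up to Slack or an RSS reader...

The root cause of Hacker News's long outage a few days ago

Hacker News was down for a long time a few days ago. Going by what was posted on Twitter, it started with this tweet at 2022/07/08 14:08 UTC:

A restore attempt failed partway through (2022/07/08 17:35 UTC):

And it finally came back (2022/07/08 20:48 UTC):

Going by Twitter, that is a bit more than six hours, which is indeed quite long for a site that should only need its database restored. So the follow-up thread "HN is up again" discussed the cause, and dang, who runs HN, mentioned there that the downtime was more than seven hours: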

8 hours of downtime, but not data loss, since there was no data to lose during the downtime.

Last post before we went down (2022-07-08 12:46:04 UTC): https://news.ycombinator.com/item?id=32026565

First post once we were back up (2022-07-08 20:30:55 UTC): https://news.ycombinator.com/item?id=32026571 (hey, that's this thread! how'd you do that, tpmx?)

So, 7h 45m of downtime. What we don't know is how many posts (or votes, etc.) happened after our last backup, and were therefore lost. The latest vote we have was at 2022-07-08 12:46:05 UTC, which is about the same as the last post.

There can't be many lost posts or votes, though, because I checked HN Search (https://hn.algolia.com/) just before we brought HN back up, and their most recent comment and story were behind ours. That means our last backup on the ill-fated server was taken after the last API update (HN Search relies on our API), and the API gets updated every 30 seconds.

I'm not saying that's a rock-solid argument, but it suggests that 30 seconds is an upper bound on how much data we lost.
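To double-check the two durations, here is a small calculation using only the timestamps quoted above. The helper names are mine, and the Twitter times are only given to the minute, so I assume zero seconds for those.

```python
from datetime import datetime, timedelta, timezone


def utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as a UTC timestamp."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as 'Xh Ym Zs'."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h {minutes}m {seconds}s"


# Timestamps from dang's comment: last post before, first post after.
hn_downtime = utc("2022-07-08 20:30:55") - utc("2022-07-08 12:46:04")

# Timestamps of the first and last tweets (seconds assumed to be :00).
twitter_window = utc("2022-07-08 20:48:00") - utc("2022-07-08 14:08:00")

print(format_duration(hn_downtime))     # 7h 44m 51s
print(format_duration(twitter_window))  # 6h 40m 0s
```

So dang's "7h 45m" is the post-to-post gap rounded to the minute, and the Twitter window is shorter simply because the first tweet went out more than an hour after the last post.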

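People also went looking for dang's replies (first-hand information, after all). A quick Ctrl-F turns up an interesting guess, and starting from 32028511 you can follow this fun exchange. First, mikiem: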

You are never going to guess how long the HN SSDs were in the servers... never ever... OK... I'll tell you: 4.5years. I am not even kidding.

Then kabdib's reply:

Let me narrow my guess: They hit 4 years, 206 days and 16 hours . . . or 40,000 hours.

And that they were sold by HP or Dell, and manufactured by SanDisk.

Do I win a prize?

(None of us win prizes on this one).
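The numbers in kabdib's guess do add up. A quick check, assuming plain 365-day years (my assumption, leap days ignored):

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: no leap days
HOURS_PER_YEAR = DAYS_PER_YEAR * HOURS_PER_DAY  # 8760

# "4 years, 206 days and 16 hours" expressed in hours.
total_hours = 4 * HOURS_PER_YEAR + 206 * HOURS_PER_DAY + 16

# And the other direction: break 40,000 hours back down.
years, leftover_hours = divmod(40_000, HOURS_PER_YEAR)
days, hours = divmod(leftover_hours, HOURS_PER_DAY)

print(total_hours)         # 40000
print(years, days, hours)  # 4 206 16
print(round(40_000 / HOURS_PER_YEAR, 2))  # 4.57
```

That last figure also lines up with the "4.5years" mikiem mentioned.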

Then dang said he thought this guess was quite plausible:

Wow. It's possible that you have nailed this.

Edit: here's why I like this theory. I don't believe that the two disks had similar levels of wear, because the primary server would get more writes than the standby, and we switched between the two so rarely. The idea that they would have failed within hours of each other because of wear doesn't seem plausible.

But the two servers were set up at the same time, and it's possible that the two SSDs had been manufactured around the same time (same make and model). The idea that they hit the 40,000 hour mark within a few hours of each other seems entirely plausible.

Mike of M5 (mikiem in this thread) told us today that it "smelled like a timing issue" to him, and that is squarely in this territory.

He later also pulled related entries out of HN's own /newest. Judging by the keywords in what he found, it looks like they were using SSDs from HPE:

It's also an example of the dharma of /newest – the rising and falling away of stories that get no attention:

HPE releases urgent fix to stop enterprise SSDs conking out at 40K hours - https://news.ycombinator.com/item?id=22706968 - March 2020 (0 comments)

HPE SSD flaw will brick hardware after 40k hours - https://news.ycombinator.com/item?id=22697758 - March 2020 (0 comments)

Some HP Enterprise SSD will brick after 40000 hours without update - https://news.ycombinator.com/item?id=22697001 - March 2020 (1 comment)

HPE Warns of New Firmware Flaw That Bricks SSDs After 40k Hours of Use - https://news.ycombinator.com/item?id=22692611 - March 2020 (0 comments)

HPE Warns of New Bug That Kills SSD Drives After 40k Hours - https://news.ycombinator.com/item?id=22680420 - March 2020 (0 comments)

(there's also https://news.ycombinator.com/item?id=32035934, but that was submitted today)

This downtime looks very much like a hit from the SSD firmware bug, and for now it looks like they have moved to EC2:

$ host news.ycombinator.com
news.ycombinator.com has address 50.112.136.166
$ host 50.112.136.166      
166.136.112.50.in-addr.arpa domain name pointer ec2-50-112-136-166.us-west-2.compute.amazonaws.com.

Judging from the thread, this should be temporary?

A few key-value stores seen on Hacker News

I saw "Redis vs. KeyDB vs. Dragonfly vs. Skytable" on Hacker News, which covers four key-value stores:
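As a side note on the `host` output above, the PTR name itself carries the address and the region. Here is a small sketch that pulls them out. The regular expression is my assumption based only on the one hostname shown above (other regions or naming schemes may look different), and the function name is made up for illustration.

```python
import ipaddress
import re
from typing import Optional, Tuple

# Pattern inferred from "ec2-50-112-136-166.us-west-2.compute.amazonaws.com."
# (an assumption; not every EC2 hostname necessarily follows it).
EC2_PTR = re.compile(
    r"^ec2-(\d+)-(\d+)-(\d+)-(\d+)\.([a-z0-9-]+)\.compute\.amazonaws\.com\.?$"
)


def parse_ec2_ptr(hostname: str) -> Optional[Tuple[str, str]]:
    """Return (ip, region) if the hostname matches the pattern, else None."""
    match = EC2_PTR.match(hostname)
    if match is None:
        return None
    ip = ".".join(match.group(1, 2, 3, 4))
    ipaddress.ip_address(ip)  # raises ValueError if the octets are not a valid IP
    return ip, match.group(5)


parsed = parse_ec2_ptr("ec2-50-112-136-166.us-west-2.compute.amazonaws.com.")

# The reverse-lookup name that `host 50.112.136.166` queries.
reverse_name = ipaddress.ip_address("50.112.136.166").reverse_pointer
```

For the hostname above this gives `("50.112.136.166", "us-west-2")`, and `reverse_name` is `166.136.112.50.in-addr.arpa`, matching the second `host` command.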

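  • Redis: this one probably needs no introduction...
  • KeyDB: a Redis fork from Snap (Snapchat); its main selling point is multi-threading.
  • Dragonfly: claims to be the fastest on Earth, but the author could not reproduce that. Someone in the discussion below mentioned that Dragonfly performs better when given more CPU resources.
  • Skytable: the fastest in the author's tests.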

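They all look quite interesting and are worth following to see how they develop. If you actually need to use one, though, Redis should still be the first choice, since stability and features are what matter most...

The typeface of Neon Genesis Evangelion

I didn't expect this link to be number one on the Hacker News front page:

It's an article from 2019: "Neon Genesis Evangelion". While looking things up I found a Simplified Chinese translation: "末世感叩击:《新世纪福音战士》的文字世界".

The typeface is Matisse EB, developed by Japan's Fontworks, and "株式会社フォントワークスジャパン" also appears in the end credits:

Mostly I just didn't expect to see this at number one on the Hacker News front page...

SendGrid unexpectedly gets torn apart...

I saw "Ask HN: Great tools for solo SaaS founders?" on Hacker News, a discussion about which services are good to use. Someone brought up SendGrid as an email delivery service, and to my surprise a whole bunch of people underneath tore into it XDDD

Quite a few people recommended Postmark, and someone also mentioned SparkPost.

You can also take a look at "Hacker News Tools of the Trade", which is where I used to dig around when looking for tools...

Hacker News now has hackernews.com

I saw "Hackernews.com (hackernews.com)" on Hacker News: someone noticed that hackernews.com now points to news.ycombinator.com...

At first someone guessed that it was just a third party pointing it there: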

LeoPanthera 1 day ago

With a different registrar to ycombinator.com, this is likely not owned by Y Combinator, and therefore difficult to trust that it won't start being malicious in the future.

But later dang (a Hacker News moderator) came out and confirmed that the domain now belongs to them:

dang 1 day ago

It's owned by YC now. We got it earlier this year. That's why it redirects to HN!

Given the nature of Hacker News this isn't very important. It's more that they picked it up when the chance came along...