用示波器看 UDP 封包...

Hacker News Daily 上看到「From Oscilloscope to Wireshark: A UDP Story」這篇講怎麼用示波器挖出 UDP 封包的方法。不是用邏輯分析儀,而是用示波器...

上面示波器的圖片可以查到作者是用 Tektronix6 Series MSO,看起來停賣了,但類似的型號應該是百萬等級...

所以真的是打算從 L1 層一路解到 L4 層:

The rest of this post will take us from these raw voltage waveforms all the way to decoded UDP packets. Hold on tight, we're going from L1 all the way to L4.

網路設備是 VSC7448 這顆 52-port 10Gbps switch:

不過這邊提到一秒可以打 30K 的 UDP 封包出來,對於這台 switch 應該是沒滿才對,而且速度上應該也不到 10Gbps,加上作者提到的是 QSGMII,有可能是跑 1Gbps 的速度在抓:

The oscilloscope doesn't have a built-in QSGMII analyzer (and we'll want to do fairly sophisticated processing of the data), so I wanted to export waveform data to my computer.

I knew that a device on the network was emitting about 30K UDP packets per second, or one packet every 33 µs. I configured the oscilloscope to collect 100M samples at 1 TSPS (tera-sample per second, 1012), which multiplies out to 100 µs of data; this means we should catch 1-3 UDP packets.

看起來只是抓個意思意思練練手而已,抓 1 到 3 個 UDP 封包。

後面就是一堆數學處理... 看起來前面有一小段程式碼是 Python,但後面的程式碼有人知道是什麼語言嗎?

超快速的 Base64 encoding/decoding 實做

看到「Base64 encoding and decoding at almost the speed of a memory copy」這個,可以超級快速編解碼 Base64 的資料。

實做上是透過 IntelAVX-512 加速,在資料夠大的情況下 (超過 L1 cache 的大小),可以達到接近字串複製的速度 (這邊提到的 memcpy()):

We show how we can encode and decode base64 data at nearly the speed of a memory copy (memcpy) on recent Intel processors, as long as the data does not fit in the first-level (L1) cache. We use the SIMD (Single Instruction Multiple Data) instruction set AVX-512 available on commodity processors. Our implementation generates several times fewer instructions than previous SIMD-accelerated base64 codecs.

不過這樣 AMD 暫時要哭哭...

Working Set Size (WSS) 的想法

NetflixBrendan Gregg (他比較知名的發明是 Flame Graph) 寫了一篇「How To Measure the Working Set Size on Linux」,他想要量測單位時間內會用到的記憶體區塊大小:

The Working Set Size (WSS) is how much memory an application needs to keep working. Your app may have populated 100 Gbytes of main memory, but only uses 50 Mbytes each second to do its job. That's the working set size. It is used for capacity planning and scalability analysis.

這可以拿來分析這些應用程式是否能夠利用 L1/L2/L3 cache 大幅增加執行速度,於是就可以做成圖,像是這樣:

在 Netflix 這樣人數的公司,需要設計一些有用的指標,另外發展出對應的工具,讓其他人更容易迅速掌握狀況,畢竟不是每個人都有上天下海的能力,遇到狀況可以馬上有頭緒進行 trouble shooting...