Etsy 介紹的 Cache Smearing

Etsy 的 engineering blog 上提到了他們怎麼設計 cache 機制:「How Etsy caches: hashing, Ketama, and cache smearing」。

使用 consistent hash 已經是基本款了,文章裡花了一些篇幅介紹為什麼要用 consistent hash。

後半段則是有了 consistent hash 後會遇到的問題,也就是講 hot key 怎麼處理:有些資料非常熱 (常常被存取),就算用 consistent hash 也還是有可能搞爆單一機器。

他們做了幾件事情,第一件事情是設計 cache smearing 機制,把單一資料加上 random key,使得不同的 key 會打散到不同的機器上:

Let’s take an example of a hot key popular_user_data. This key is read often (since the user is popular) and is hashed to pool member 3. Cache smearing appends a random number in a small range (say, [0, 8)) to the key before each read or write. For instance, successive reads might look up popular_user_data3, popular_user_data1, and popular_user_data6. Because the keys are different, they will be hashed to different hosts. One or more may still be on pool member 3, but not all of them will be, sharing the load among the pool.

第二件事情則是監控哪些 key 比較熱門:

We’ve seen this problem many times over the years, using mctop, memkeys, and our distributed tracing tools to track down hot keys.

第三件事情是維護 hot key 的清單 (不是每個 key 都會上 cache smearing):

We manually add cache smearing to only our hottest keys, leaving keys that are less read-heavy efficiently stored on a single host.

是個當規模大到單一 hot key 會讓單台伺服器撐不住時的 workaround...

Leave a Reply

Your email address will not be published. Required fields are marked *