distributed – Page 2 – Gea-Suan Lin's BLOG

Putting HDFS metadata on NewSQL

Moving the data of the metadata server, which is HDFS's performance bottleneck, onto a NewSQL database gives a big performance boost: "HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases".

In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ single node in-memory metadata service, with a distributed metadata service built on a NewSQL database.

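In particular, the tests based on Spotify's workload show a 16x to 37x improvement (this presumably refers to the part that actually touches HDFS, such as analyses that pull data from outside onto HDFS, rather than an overall efficiency gain):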

Metadata capacity has been increased to at least 37 times HDFS’ capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS.

The paper mentions that it uses NDB (in-memory) from MySQL Cluster:

HopsFS stores all metadata normalized in a highly available, in-memory, distributed, relational database called Network Database (NDB), a NewSQL storage engine for MySQL Cluster.
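To make "metadata normalized in a relational database" a bit more concrete, here is a toy sketch using Python's built-in sqlite3. The table name `inodes`, its columns, and the `resolve` helper are my own assumptions for illustration and are not the HopsFS schema. Each directory entry becomes one row keyed by (parent_id, name), so resolving a path is one indexed lookup per path component.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table of inodes (hypothetical schema, not HopsFS's)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE inodes (
               id        INTEGER PRIMARY KEY,
               parent_id INTEGER NOT NULL,
               name      TEXT    NOT NULL,
               is_dir    INTEGER NOT NULL,
               UNIQUE (parent_id, name))"""
    )
    rows = [
        (1, 0, "", 1),          # root directory
        (2, 1, "user", 1),
        (3, 2, "alice", 1),
        (4, 3, "log.txt", 0),
    ]
    db.executemany("INSERT INTO inodes VALUES (?, ?, ?, ?)", rows)
    return db

def resolve(db, path):
    """Walk the path one component at a time; return the inode id or None."""
    inode_id = 1  # start at the root row
    for part in [p for p in path.split("/") if p]:
        row = db.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?",
            (inode_id, part),
        ).fetchone()
        if row is None:
            return None
        inode_id = row[0]
    return inode_id

db = build_demo_db()
print(resolve(db, "/user/alice/log.txt"))
print(resolve(db, "/user/bob"))
```

The point of the sketch is only that the namespace no longer has to live in the heap of a single process: once it is rows in a database, the capacity and throughput limits become those of the database.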

This should give the Hadoop folks a direction for improvement...

The Delayed Queue developed by Netflix

So this is called a Delayed Queue. No wonder searching with other keywords never turned up much... (I won't mention the other keywords XD)

Netflix published the Delayed Queue they built in-house: "Distributed delay queues based on Dynomite".

The original architecture was built on Cassandra + Zookeeper:

Traditionally, we have been using a Cassandra based queue recipe along with Zookeeper for distributed locks, since Cassandra is the de facto storage engine at Netflix.

But quite a few problems come to mind right away, just as Netflix points out:

Using Cassandra for queue like data structure is a known anti-pattern, also using a global lock on queue while polling, limits the amount of concurrency on the consumer side as the lock ensures only one consumer can poll from the queue at a time.

So they moved it onto Dynomite, which Netflix also developed:

Dynomite, inspired by Dynamo whitepaper, is a thin, distributed dynamo layer for different storage engines and protocols. Currently these include Redis and Memcached. Dynomite supports multi-datacenter replication and is designed for high availability.

It is a system backed by Redis and Memcached that can survive an entire datacenter disappearing from the internet.

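By design it guarantees "at least once" execution, which means a message may be delivered more than once, and anyone using the Dyno Queues system has to take that into account: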

4. At-least-once delivery semantics
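To see what at-least-once means for a consumer, here is a minimal in-memory sketch. It is not Netflix's implementation: the `DelayQueue` class, the `unack_timeout` parameter, and the explicit `now` argument are all assumptions made for the example. A popped message stays in the queue and becomes visible again after the timeout unless it is acked, so a slow or crashed consumer leads to a redelivery.

```python
import heapq

class DelayQueue:
    """Toy delay queue with at-least-once delivery (hypothetical, in-memory)."""

    def __init__(self, unack_timeout):
        self.unack_timeout = unack_timeout
        self._heap = []       # entries are (ready_time, seq, msg_id)
        self._payloads = {}   # msg_id -> payload, removed on ack
        self._seq = 0         # tie-breaker so heap entries never compare payloads

    def _schedule(self, msg_id, ready_time):
        heapq.heappush(self._heap, (ready_time, self._seq, msg_id))
        self._seq += 1

    def push(self, msg_id, payload, now, delay=0):
        self._payloads[msg_id] = payload
        self._schedule(msg_id, now + delay)

    def pop(self, now):
        """Return (msg_id, payload) for a ready message, or None."""
        while self._heap and self._heap[0][0] <= now:
            _, _, msg_id = heapq.heappop(self._heap)
            if msg_id not in self._payloads:
                continue  # already acked, drop the stale entry
            # Not acked yet: make it visible again after the timeout.
            self._schedule(msg_id, now + self.unack_timeout)
            return msg_id, self._payloads[msg_id]
        return None

    def ack(self, msg_id):
        self._payloads.pop(msg_id, None)

q = DelayQueue(unack_timeout=30)
q.push("m1", "send email", now=0, delay=10)
first = q.pop(now=10)    # delivered once the delay has passed
second = q.pop(now=40)   # no ack arrived, so it is delivered again
processed = set()
for msg_id, payload in (first, second):
    if msg_id not in processed:   # the consumer has to deduplicate
        processed.add(msg_id)
    q.ack(msg_id)
```

The deduplication set at the end is the part the queue cannot do for you: with these semantics the handler must be idempotent or must track what it has already processed.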

The post reads quite lightly, but in practice it still looks pretty heavyweight... I won't be using it for now :o

A curious RFC: Naming Things with Hashes

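I came across "RFC 6920: Naming Things with Hashes", saw that the date was April 2013, and wondered whether it had been published on April 1st... But the content actually looks quite useful, with a bit of a distributed web flavor? The example given in the document looks like this: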

<html>
 <head>
   <title>ni: relative URI test</title>
   <base href="ni://example.com">
 </head>
 <body>
   <p>Please check <a href="sha-256;f4OxZX...">this document</a>.
     and <a href="sha-256;UyaQV...">this other document</a>.
     and <a href="sha-256-128;...">this third document</a>.
   </p>
 </body>
</html>

It is currently a Proposed Standard, so what will come of it...
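The names in that example can be reproduced with a few lines of Python. This is a sketch of my reading of the format (SHA-256 digest, base64url-encoded with the padding stripped); the function name `ni_uri` and its parameters are made up for the example. The input "Hello World!" is the one whose digest starts with f4OxZX, matching the first link above.

```python
import base64
import hashlib

def ni_uri(data, authority="", truncate_bits=None):
    """Build an ni: URI for data (hypothetical helper, sha-256 family only)."""
    digest = hashlib.sha256(data).digest()
    alg = "sha-256"
    if truncate_bits is not None:
        # Truncated variants such as sha-256-128 keep only the leading bits.
        digest = digest[: truncate_bits // 8]
        alg = f"sha-256-{truncate_bits}"
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni://{authority}/{alg};{value}"

print(ni_uri(b"Hello World!"))
# ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk
print(ni_uri(b"Hello World!", authority="example.com", truncate_bits=128))
# ni://example.com/sha-256-128;f4OxZX_x_FO5LcGBSKHWXQ
```

Because the name is derived from the content, anyone who fetches the bytes from any source can recompute the hash and check that they got the right thing.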

IPFS, a distributed web service, and ipfspics image hosting

IPFS (InterPlanetary File System), also known as The Permanent Web.

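The motivation is that HTTP (the Web) was designed with 1990s thinking, when many threats were not yet obvious. Today, threats from attackers and from government surveillance mean that many workarounds have to be built on top of HTTP (the Web).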

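The best-known workarounds are HTTPS and the CA infrastructure that goes with it. With the former, the HTTPS protocol itself is highly complex, so implementers frequently get it wrong and introduce security vulnerabilities. The latter relies on extensive audits to avoid problems, but it is still a workaround, and "mis-issued" certificates turn up all the time.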

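Also, HTTP as it has evolved is really a "Decentralized" architecture, where a government agency can monitor at scale by controlling just a few of the nodes, whereas IPFS wants to be truly "Distributed".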

A while ago IPFS released a prototype on GitHub for everyone to play with: "ipfs implementation in go.". More recently someone turned the idea into image hosting: "ipfs.pics" (also on GitHub), with a matching web upload interface: "Decentralized picture hosting in ipfs".

I tried uploading an image to ipfs.pics and got the hash QmRpNqK33gDDKdu8y6Wx5DQsuiJbsnwojNzH5nUwCpwoS9, and the image can also be viewed through IPFS.
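The "Qm" at the start of that hash is not a coincidence. Here is a rough sketch of the encoding as I understand it: a two-byte prefix (0x12 for SHA-256, 0x20 for a 32-byte digest) followed by the digest, written in base58. The helper names are my own, and the sketch hashes raw bytes directly, whereas IPFS hashes its own wrapped representation of a file, so this will not reproduce the hash of a real upload.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I, or l).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(raw):
    """Encode bytes as base58, keeping leading zero bytes as '1'."""
    n = int.from_bytes(raw, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + out

def sha256_multihash_b58(data):
    """Prefix a SHA-256 digest with (0x12, 0x20) and base58-encode it."""
    digest = hashlib.sha256(data).digest()
    multihash = bytes([0x12, 0x20]) + digest
    return base58_encode(multihash)

name = sha256_multihash_b58(b"some image bytes")
print(name[:2], len(name))
```

Every name produced this way is 46 characters long and begins with "Qm", the same shape as the hash ipfs.pics returned.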

Let's play with it :o

A distributed forum system

In "Kudos - A Peer-to-Peer Discussion System Based on Social Voting" I came across a distributed forum system that comes with a voting score mechanism and a related-topics mechanism:

Decentralized Reddit using a DHT to store content and a blockchain to rank such content. Whitepaper with more details here: http://lucaa.org/docs/kudos.pdf

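The paper makes it clear that the design concepts were inspired by Bitcoin, and so were the algorithms... In other words, Bitcoin's influence reaches far beyond financial markets, and the theory behind Bitcoin has given other fields plenty of ideas.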

If a system like this turns out to be workable (I haven't studied it closely yet @_@), a truly distributed forum system will appear...

A real Redis Cluster

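Another post that has been sitting in my backlog for a long time. Salvatore Sanfilippo, one of the lead figures behind Redis, wrote up the history of Redis Cluster's development when 3.0.0-rc1, the first release to make the Redis Cluster feature public, came out: "Redis cluster, no longer vaporware.".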

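MySQL InnoDB can guarantee very strong ACID properties, and combined with an HA setup such as DRBD it can guarantee that once the server has replied with success, the data will not be lost.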

memcached, on the other hand, is a shared-nothing architecture. It was designed as a cache from the start, so losing data is no big deal.

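There is still a lot of room between the two, and with its arrival Redis Cluster has a chance to enter the field. I wonder whether it can find a suitable place to stand between InnoDB and memcached.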

Braintree, owned by PayPal, will support Bitcoin

Bitcoin's characteristics are forcing payment processors to support it, or else wait to be pushed out of the market.

After Stripe showed its hand with its test, "Stripe: Bitcoin", industry leader PayPal has now made its move too: "Braintree Brings One Touch Payments to the People and Will Offer Bitcoin to Developers Soon".

For now it is partnering with Coinbase:

In addition we announced that Braintree has partnered with Coinbase to offer a trusted and secure way for people to pay with bitcoin across their favorite apps in the coming months.

Foundational theory of distributed systems

The post "Distributed systems theory for the distributed systems engineer" lists many of the theories behind distributed systems, along with classic applications that were developed later.

I need to improve my proficiency in distributed systems theory. Where do I start? Any recommended books?

— Gwen (Chen) Shapira (@gwenshap) August 7, 2014

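The post was written in response to a full-time employee at Cloudera, so it steers clear of overly academic papers and instead mixes in easier-to-read explanatory articles and Wikipedia material.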

The post is full of terms I have heard of but never looked into deeply. It works well as an entry point...

Leslie Lamport wins the 2013 Turing Award

Leslie Lamport - A.M. Turing Award Winner:

For fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency.

A leading figure in the field of distributed systems, and the inventor of LaTeX...
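For anyone who has not met the "logical clocks" named in the citation, the idea fits in a few lines. This is a minimal sketch of the textbook rule (increment on every local event, and on receipt jump past the timestamp carried by the message); the class and method names are my own.

```python
class LamportClock:
    """Minimal logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event happened."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, move past both our own time and the sender's."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()               # a: 1
stamp = a.send()       # a: 2, message carries 2
b.receive(stamp)       # b: max(0, 2) + 1 = 3
print(a.time, b.time)  # 2 3
```

The receive event always gets a larger timestamp than the send that caused it, which is the causality property the citation refers to, and no synchronized wall clocks are needed.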

Advice on distributed systems...

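"Distributed System" is an old term, but it is often tied to the currently fashionable terms "cloud" and "NoSQL". As a result, the problems that "cloud" and "NoSQL" run into are often the same ones distributed systems have already run into (and discussed)...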

And "Notes on Distributed Systems for Young Bloods" is written with so much blood and tears XDDD. Besides the theoretical side, it also brings up the problems you run into in practice...

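First you need to know the "Fallacies of Distributed Computing": in a distributed system there is very little you can assume and a great deal you have to handle. The "CAP theorem" is also required reading, and after Amazon put out the classic paper "Dynamo: Amazon's Highly Available Key-value Store", many more people learned about it.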

Once you are familiar with those two topics, what follows is a history of blood and tears... XD

garbage collection pauses make masters “disappear”

Ah, GC making the master disappear... (NameNode... XDDD)
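Why a pause looks like a disappearance is easy to show with a toy heartbeat monitor. Everything here is an assumption for illustration (the `HeartbeatMonitor` class, the 3-tick timeout, integer ticks instead of real time): the master is never actually down, it just cannot send heartbeats while it is paused.

```python
class HeartbeatMonitor:
    """Declares a node dead if no heartbeat arrived within the timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def is_alive(self, node, now):
        last = self.last_seen.get(node)
        return last is not None and now - last <= self.timeout

def simulate(pause_start, pause_end, timeout=3, horizon=12):
    """Return the ticks at which the (still running) master looks dead."""
    monitor = HeartbeatMonitor(timeout)
    declared_dead_at = []
    for now in range(horizon):
        paused = pause_start <= now < pause_end   # stop-the-world pause
        if not paused:
            monitor.heartbeat("master", now)
        if not monitor.is_alive("master", now):
            declared_dead_at.append(now)
    return declared_dead_at

print(simulate(pause_start=4, pause_end=6))  # short pause: []
print(simulate(pause_start=4, pause_end=9))  # long pause: [7, 8]
```

From the monitor's side a long pause and a crash are indistinguishable, so a failover may already be under way when the old master wakes up and carries on as if nothing happened.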

Writing robust distributed systems costs more than writing robust single-machine systems.

Robust, open source distributed systems are much less common than robust, single-machine systems.

These two... XD

Oh, and Paxos really is very hard to implement

Paxos... XD

If you can fit your problem in memory, it’s probably trivial.

(spit-take)

“It’s slow” is the hardest problem you’ll ever debug.

You can't even find the problem... XD

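Setting the grumbling aside, even if you are not that familiar with distributed systems, this post also mentions many "directions for solutions" and "keywords" you can use to look things up, which helps a lot in hands-on work.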