Home » Computer » Archive by category "Search Engine" (Page 3)

完全分散式的 BitTorrent 搜尋引擎

BitTorrent 已經有足夠的技術與環境 (ecosystem) 做出完全分散式架構的搜尋引擎了,類似於 eDonkey Network (或是說後來變成主流的 eMule) 上的 search 功能,但一直沒看到類似的東西...

magnetico 算是一個嘗試,完全透過 DHT 搜尋取得結果:

不過這套軟體的 license 是攻擊性超強的 AGPLv3,算是實驗性質吧。要真正普及應該要像 eMule 一樣直接做進 client 內...

Cloudbleed:Cloudflare 這次的安全問題

Cloudflare 把完整的時間軸與影響範圍都列出來了:「Incident report on memory leak caused by Cloudflare parser bug」。

出自於 2/18 時 GoogleTavis Ormandy 直接在 Twitter 上找 Cloudflare 的人:

Google 的 Project Zero 上的資料:「cloudflare: Cloudflare Reverse Proxies are Dumping Uninitialized Memory」。

起因在於 bug 造成有時候會送出不應該送的東西,可能包含了敏感資料:

It turned out that in some unusual circumstances, which I’ll detail below, our edge servers were running past the end of a buffer and returning memory that contained private information such as HTTP cookies, authentication tokens, HTTP POST bodies, and other sensitive data.

不過這邊不包括 SSL 的 key,主要是因為隔離開了:

For the avoidance of doubt, Cloudflare customer SSL private keys were not leaked. Cloudflare has always terminated SSL connections through an isolated instance of NGINX that was not affected by this bug.

不過由於這些敏感資料甚至還被 Google 收進 search engine,算是相當的嚴重,所以不只是 Cloudflare 得修好這個問題,還得跟眾多的 search engine 合作將這些資料移除:

Because of the seriousness of such a bug, a cross-functional team from software engineering, infosec and operations formed in San Francisco and London to fully understand the underlying cause, to understand the effect of the memory leakage, and to work with Google and other search engines to remove any cached HTTP responses.

bug 影響的時間從 2016/09/22 開始:

2016-09-22 Automatic HTTP Rewrites enabled
2017-01-30 Server-Side Excludes migrated to new parser
2017-02-13 Email Obfuscation partially migrated to new parser
2017-02-18 Google reports problem to Cloudflare and leak is stopped

而以 2/13 到 2/18 的流量反推估算,大約是 0.00003% 的 request 會可能產生這樣的問題:

The greatest period of impact was from February 13 and February 18 with around 1 in every 3,300,000 HTTP requests through Cloudflare potentially resulting in memory leakage (that’s about 0.00003% of requests).

不過不得不說 Tavis Ormandy 真的很硬,在沒有 source code 以及 Cloudflare 幫助的情況下直接打出可重製的步驟:

I worked with cloudflare over the weekend to help clean up where I could. I've verified that the original reproduction steps I sent cloudflare no longer work.

事發後完整的時間軸:

2017-02-18 0011 Tweet from Tavis Ormandy asking for Cloudflare contact information
2017-02-18 0032 Cloudflare receives details of bug from Google
2017-02-18 0040 Cross functional team assembles in San Francisco
2017-02-18 0119 Email Obfuscation disabled worldwide
2017-02-18 0122 London team joins
2017-02-18 0424 Automatic HTTPS Rewrites disabled worldwide
2017-02-18 0722 Patch implementing kill switch for cf-html parser deployed worldwide
2017-02-20 2159 SAFE_CHAR fix deployed globally
2017-02-21 1803 Automatic HTTPS Rewrites, Server-Side Excludes and Email Obfuscation re-enabled worldwide

另外在「List of Sites possibly affected by Cloudflare's #Cloudbleed HTTPS Traffic Leak」這邊有人整理出受影響的大站台有哪些 (小站台就沒列上去了)。

Google Search 是否執行 JavaScript...

在討論 Google Search 是不是會執行 javascript。半年前的文章了,不過最近作者發現又有新東西:「Does Google execute JavaScript?」。

作者用了幾段 javascript 程式碼測試 (可以參考原文),發現其實不保證會執行 javascript,所以還是建議大家乖乖跑 server-side render 產生資料,這樣對其他搜尋引擎的 SEO 也比較好:

WiGLE (Wireless Geographic Logging Engine)

WiGLE 是個蒐集無線網路資訊的服務 (i.e. SSID、mac address 以及定位位置),依照維基百科上的資料,WiGLE 計畫從 2001 年開始,到現在快十五年了:

The first recorded hotspot on WiGLE was uploaded in September 2001.

趁著最近 Pokémon Go 會跑來跑去,就順便幫忙蒐集資料了...

目前我蒐集資料的方式是透過 WiGLE 的 Android 應用程式 (Wigle Wifi Wardriving) 開著讓他背景跑,然後回到家以後上傳,過一陣子讓系統更新後就可以看到了。

看到 zmx 貼了之前的連結,更確信 Uber 的問題不是技術問題了...

Twitter 上看到 zmx 提了一個連結,講 Uber 年初時貼的「How We Built Uber Engineering’s Highest Query per Second Service Using Go」這篇文章的問題:

對照最近的事情還蠻有趣的,尤其是這篇文章後面提到的,酸~爆~了~XDDD:

It is clear to me that the team at Uber under-engineered this problem. Thoughtfully designing this service could trim down the number of nodes by an order of magnitude and save hundreds of thousands of dollars each year. That may sound like pittance to a company valued at more than the GDP of Delaware, but in my eyes that’s the salaries of a few engineers and a few good engineers can go a long way. Maybe even further than the few extra Mercedes-Benz S-Classes they could add to their fleet from the money they could be saving...

先不提政治問題,上面提到的 Quadtree 算是簡單易懂的結構,好久沒看到這個資料結構了:

Google 的 www.google.com 要上 HSTS 了

Google 宣佈 www.google.com 要上 HSTS 了:「Bringing HSTS to www.google.com」。

雖然都已經使用 EFFHTTPS Everywhere 在跑確保 HTTPS,但這個進展還是很重要,可以讓一般使用者受到保護...

Google 的切換計畫是會逐步增加 HSTS 的 max-age,確保中間出問題時造成的衝擊。一開始只會設一天,然後會逐步增加,最後增加到一年:

In the immediate term, we’re focused on increasing the duration that the header is active (‘max-age’). We've initially set the header’s max-age to one day; the short duration helps mitigate the risk of any potential problems with this roll-out. By increasing the max-age, however, we reduce the likelihood that an initial request to www.google.com happens over HTTP. Over the next few months, we will ramp up the max-age of the header to at least one year.

Google 推出 iOS 上的搜尋輸入法 Gboard

Google 推出了 iOS 上的搜尋輸入法 Gboard:「Meet Gboard: Search, GIFs, emojis & more. Right from your keyboard.」,可以在 App Store 上下載「Gboard — Search. GIFs. Emojis & more. Right from your keyboard.」,目前只有英文版可以用,其他語言還要等:

Get it now in the App Store in English in the U.S., with more languages to come.

可以參考 Google 做的 GIF 動畫,把搜尋的功能結合進去了:

John Gruber 寫了一篇「Gboard」關於隱私問題的討論,看起來還頗正面的。除了搜尋結果當然會需要傳到 Google 的系統裡以外,只有 crash log 會匿名傳回去。

WordPress.com 的 Elasticsearch

在「State of WordPress.com Elasticsearch Systems 2016」這邊描述他們 Elasticsearch 的架構。

有五個 cluster 打散,有跑 1.3.x 也有 1.7.x。把一般使用者與 VIP 分開,而全站的資料又是一組。另外在 2.3.x 的測試機上跑 en.support.wordpress.com 的資料 (看起來是短時間炸掉沒關係?XD)。

由於是自己生機器出來,所以機器的選擇上用大量的記憶體與 SSD 硬碟來換各種效能:

Typical data server config:
* 96GB RAM with 31GB for ES heap. Remaining gets used for file system caching
* 1-3 TB of SSD per server. In our testing SSDs are very worthwhile.

另外上面還是有疊 cache:

memcache timeouts vary from 30 seconds to 36 hours depending on use case

Archives