Mozilla implements Deep Speech, the speech-to-text engine published by Baidu

Saw on Hacker News that Mozilla has a project on GitHub, mozilla/DeepSpeech, which uses TensorFlow to implement Baidu's paper "Deep Speech: Scaling up end-to-end speech recognition":

A TensorFlow implementation of Baidu's DeepSpeech architecture

So Mozilla has started a project that actually implements a speech-to-text solution...

You need Git Large File Storage installed to download the repository in full, including the training data portion:

Manually install Git Large File Storage, then clone the repository normally:
git clone https://github.com/mozilla/DeepSpeech
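If the repository is cloned without Git LFS installed, the large files are checked out as small text "pointer" files instead of the real data. The following is a minimal sketch, not part of the DeepSpeech project, that scans a checkout for such leftovers. It assumes the standard Git LFS pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`; the function names are made up for illustration.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(repo_root):
    """List files under `repo_root` that are still LFS pointers.

    Anything inside a `.git` directory is skipped.
    """
    root = Path(repo_root)
    pointers = []
    for path in sorted(root.rglob("*")):
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        if is_lfs_pointer(path):
            pointers.append(path)
    return pointers
```

Calling `find_lfs_pointers("DeepSpeech")` on a fresh clone and getting a non-empty list would suggest that the LFS content was not fetched.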

The data available so far comes from another Mozilla project, "Common Voice":

The Common Voice project is Mozilla's initiative to help teach machines how real people speak.

Common Voice currently supports only English, and you can take part in the validation process right on its website...

Finding hostnames through search engines

Came across the project "Fast subdomains enumeration tool for penetration testers", which finds hostnames through several search engines for use in penetration testing.

It supports five major search engines, plus Netcraft and DNSdumpster:

Sublist3r currently supports the following search engines: Google, Yahoo, Bing, Baidu, and Ask. More search engines may be added in the future. Sublist3r also gathers subdomains using Netcraft and DNSdumpster.

Yandex isn't included, though...
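The core idea of search-engine-based enumeration can be sketched in a few lines: take the result URLs that the engines return for a domain, pull out the hostnames, and keep the unique ones that fall under the target. This is only an illustration of the technique, not Sublist3r's actual code; `extract_subdomains` and the sample URLs are hypothetical, and fetching the search results is left out entirely.

```python
from urllib.parse import urlparse


def extract_subdomains(result_urls, domain):
    """Collect unique hostnames under `domain` from a list of result URLs.

    The bare domain itself is not counted as a subdomain.
    """
    domain = domain.lower().rstrip(".")
    suffix = "." + domain
    found = set()
    for url in result_urls:
        # urlparse lowercases the hostname; it is None when no host is present.
        host = urlparse(url).hostname
        if host and host.endswith(suffix):
            found.add(host)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical result URLs merged from several search engines.
    urls = [
        "https://mail.example.com/login",
        "http://WWW.example.com/",
        "https://mail.example.com/inbox",
        "https://notexample.com/",
    ]
    print(extract_subdomains(urls, "example.com"))
```

Matching on the `"." + domain` suffix rather than on the bare domain string keeps look-alike hosts such as `notexample.com` out of the result.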

Baidu gets caught collecting personal data, and still intends to keep collecting...

In "Thousands of apps running Baidu code collect, leak personal data - research", the Canadian research group Citizen Lab found that Baidu's Android SDK transmits this personal data unencrypted:

The unencrypted information that has been collected includes a user's location, search terms and website visits, Jeffrey Knockel, chief researcher at Citizen Lab, told Reuters ahead of publication of the research on Wednesday.

Baidu says it will fix the encryption problem, but will keep collecting the data:

[,] and Baidu told Reuters it would be fixing the encryption holes in its kits, but would still collect data for commercial use, some of which it said it shares with third parties.

Bold! That's Baidu for you! Still collecting even after getting caught XDDD

CloudFlare partners with Baidu to enter the Chinese market

Yesterday's big news: CloudFlare announced that it is partnering with Baidu to enter the Chinese market: "How We Extended CloudFlare's Performance and Security Into Mainland China".

The "China network" page lists the various restrictions. First, you need a license (ICP) to use it:

CloudFlare customers that wish to serve traffic for their domains across the China network must possess a valid Internet Content Provider (ICP) license. An ICP license is a Chinese government issued license required to host or cache Internet content within mainland China. Learn more about how you can obtain an ICP license here.

Also, HTTPS is not supported:

For the moment the China network does not support HTTPS traffic (HTTP only). Support for SSL/TLS will be made available in the coming months.

For now it is only open to Enterprise customers:

Initially, the China network will be limited to Enterprise customers. Over time, as we are better able to operationalize the onboarding of customers, we hope to extend the benefits to all plan levels.

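Because of the ICP requirement, this doesn't help sites based outside China much. It is also unclear whether the China network still uses Anycast; if it does, there is reason to worry that traffic for some sites could end up being routed into China.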

Google's analysis of the earlier GFW DDoS attack on GitHub

Google Online Security analyzed the behavior of the recent DDoS attack on GitHub: "A Javascript-based DDoS Attack as seen by Safe Browsing".

Using Google Safe Browsing, they analyzed the injections observed on the baidu.com domain:

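You can see that the attack went through several distinct phases. AWS CloudFront took a fair amount of pressure along the way, but as a commercial-grade CDN it doesn't go down that easily. Later the attack turned to GitHub, where it caused real impact and made the news.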

In the end, the takeaway is still to keep pushing TLS, which prevents content from being injected in transit:

Had the entire web already moved to encrypted traffic via TLS, such an injection attack would not have been possible.
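To make the point concrete, here is a rough sketch that lists the `<script>` tags a page loads over plain HTTP, since those are the resources an on-path attacker can rewrite. It is an illustration only, not anything from Google's analysis; the class and function names and the sample URLs are hypothetical, and it uses nothing beyond Python's standard `html.parser` and `urllib.parse`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScriptCollector(HTMLParser):
    """Collect absolute URLs of external scripts referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative and protocol-relative URLs against the page URL.
                self.sources.append(urljoin(self.base_url, src))


def plain_http_scripts(page_url, html):
    """Return the script URLs on the page that would be fetched without TLS."""
    collector = ScriptCollector(page_url)
    collector.feed(html)
    return [s for s in collector.sources if urlparse(s).scheme == "http"]


if __name__ == "__main__":
    sample = (
        '<script src="http://stats.example.net/h.js"></script>'
        '<script src="https://cdn.example.org/a.js"></script>'
    )
    print(plain_http_scripts("https://site.example/", sample))
```

Note that when the page itself is served over plain HTTP, the whole document can be altered in transit, so checking script tags alone understates the exposure.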