Facebook 修正錯字的新演算法

先前 Facebook 已經先發表過 fastText 了,在這個月的月初又發表了另外一個演算法 Misspelling Oblivious Embeddings (MOE),是搭著本來的 fastText 而得到的改善:「A new model for word embeddings that are resilient to misspellings」。

Facebook 的說明提到在 user-generated text 的內容上,MOE 的效果比 fastText 好:

We checked the effectiveness of this approach considering different intrinsic and extrinsic tasks, and found that MOE outperforms fastText for user-generated text.

論文發表在 arXiv 上:「Misspelling Oblivious Word Embeddings」。

依照介紹,fastText 的重點在於 semantic loss,而 MOE 則多了 spell correction loss:

The loss function of fastText aims to more closely embed words that occur in the same context. We call this semantic loss. In addition to the semantic loss, MOE also considers an additional supervisedloss that we call spell correction loss. The spell correction loss aims to embed misspellings close to their correct versions by minimizing the weighted sum of semantic loss and spell correction loss.

不過目前 GitHub 上的 facebookresearch/moe 只有放 dataset,沒有 open source 出來讓人直接用,可能得自己刻...

Cloudflare 提供 Certificate Transparency 的通知服務

Cloudflare 推出了 Certificate Transparency 的通知服務,當有任何新的 SSL certificate 發出時就會通知:「Introducing Certificate Transparency Monitoring」。

基本上就是一組 crawler 去各家記錄挖資料回來:

Business 與 Enterprise 版本的可以設定要通知誰,Free 與 Pro 版本則是固定會通知網域的擁有人 (這邊指的是 Cloudflare 上的):

If you’re on a Business or Enterprise plan, you can tell us who to notify. Instead of emailing the zone owner (which we do for Free and Pro customers), we accept up to 10 email addresses as alert recipients. We do this to avoid overwhelming large teams.

剛剛把掛在上面的 domain 都開起來了,這樣在 Let's Encrypt 自動 renew 的時候應該會收到通知...

另外一個同名的工具則是 2016 年底時 Facebook 推出的 (需要登入才能使用),可以搜尋有哪些 domain:「Certificate Transparency Monitoring」。

Facebook 推出了 Hermes,為了 React Native 而生的 JS Engine

Facebook 提供了一個對 React Native 最佳化的 JS engine:「Hermes: An open source JavaScript engine optimized for mobile apps, starting with React Native」。

裡面有提到兩個比較重要的的部份是 No JIT 與 Garbage collector strategy,針對行動裝置的特性而設計:避免 JIT 產生的 overhead,以及降低記憶體使用量。

官方給的改善主要也都是偏這兩塊:

不過沒有提到 CPU usage 會上升多少,只是帶過去:

Notably, our primary metrics are relatively insensitive to the engine’s CPU usage when executing JavaScript code.

對於 Facebook 也許是可以接受的數量,但對於其他人就沒概念了... 要入坑的人自己衡量這部份的風險 XD

幫你的 iPhone 電話簿找到對應的頭像

前幾天看到的:「Announcing Vignette」,透過 social network 的資料,把本來電話簿裡面的 icon 更新:

透過 app store 的搜尋找不太到,我一開始用了「Vignette」搜不到,但用「Vignette Update」就可以。或者你可以透過他提供的連結直接開 app store:「Vignette – Update Contact Pics」。

這是一個 IAP 類的付費服務,搜尋是免費的,但如果要把資料更新回通訊錄,需要付 USD$4.99 (一次性),台灣帳號是付 TWD$170,應該是因為最近的稅務調整:

Vignette allows you to scan your contacts and see what it can find for free. If you wish to actually save these updates to your contact list, you must pay for a one-time in-app purchase. That purchase costs $4.99, is not a subscription, and is the only in-app purchase.

搜尋的範圍包括了 GravatarTwitterFacebookInstagram

Email is used for Gravatar
Twitter
Facebook
A custom network called Instagram

另外作者有提到這個 app 不傳資料到伺服器上,都是在自己的裝置上連到上面提到的 social network 尋找:

Privacy is paramount
All the processing is done on-device; this isn’t the sort of app where your contacts are uploaded en masse to some server, and out of your control.

所以速度不會太快,但對隱私比較好...

Facebook 的招募狀況

Facebook 的各種醜聞知道招募變難 (而且醜聞沒停過),但變得多困難則是第一次看到報導:「Facebook has struggled to hire talent since the Cambridge Analytica scandal, according to recruiters who worked there」。

新鮮人的接受從 85% 掉到 35%~55%,依據不同的學校而有差異:

Among top schools, Facebook’s acceptance rate for full-time positions offered to new graduates has fallen from an average of 85% for the 2017-2018 school year to between 35% and 55% as of December.

Among top schools, such as Stanford, Carnegie Mellon and Ivy League universities, Facebook’s acceptance rate for full-time positions offered to new graduates has fallen from an average of 85% for the 2017-2018 school year to between 35% and 55% as of December, according to former Facebook recruiters. The biggest decline came from Carnegie Mellon University, where the acceptance rate for new recruits dropped to 35%.

工程師則是從 90% 掉到 50%:

The company has seen a decline in its job offer acceptance rates to software engineer candidates from nearly 90% in late 2016 to almost 50% in early 2019.

Facebook 發言人 Anthony Harrison 否認,不過沒有給出說明:

After the publication of this story, Harrison contacted CNBC to say "these numbers are totally wrong."

Facebook disputed the accuracy of the recruiters' accounts, but declined to point out any specific points that were wrong.

Hacker News 上的討論也蠻有趣的:「https://news.ycombinator.com/item?id=19931977」,有不少其他的觀察。

回來用 uBlock Origin 擋 Facebook 廣告...

基本上現在是哪個有用就用哪個... @_@

先前提到的「擋 Facebook 廣告的 Userscript」這個又不會動啦... 所以又到處找方法,目前看起來在 uBlock Origin 的「Facebook · Issue #3367 · uBlockOrigin/uAssets」這邊有一直在討論新的擋法,之後如果又看到廣告就過來這邊看一下...

目前用這組:

www.facebook.com##div[id^=hyperfeed_story_id_]:has(span[data-ft="{\"tn\":\"j\"}"])
www.facebook.com##.pagelet-group .pagelet:has(a:has-text(Sponsored))
www.facebook.com##.pagelet-group .pagelet:has(a:has-text(Create ad))

不知道可以活多久...

Facebook 員工爆料內部密碼存了明碼

Krebs on Security 這邊看到的:「Facebook Stored Hundreds of Millions of User Passwords in Plain Text for Years」,Facebook 官方的回應在「Keeping Passwords Secure」這邊。

幾個重點,第一個是範圍,目前已經有看到 2012 的資料都有在內:

The Facebook source said the investigation so far indicates between 200 million and 600 million Facebook users may have had their account passwords stored in plain text and searchable by more than 20,000 Facebook employees. The source said Facebook is still trying to determine how many passwords were exposed and for how long, but so far the inquiry has uncovered archives with plain text user passwords dating back to 2012.

另外的重點是這些資料已經被內部拿來大量搜尋 (喔喔):

My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.

另外是 Legal 與 PR 都已經啟動處理了,對外新聞稿會美化數字,降低傷害:

“The longer we go into this analysis the more comfortable the legal people [at Facebook] are going with the lower bounds” of affected users, the source said. “Right now they’re working on an effort to reduce that number even more by only counting things we have currently in our data warehouse.”

另外也會淡化後續的程序:

Renfro said the company planned to alert affected Facebook users, but that no password resets would be required.

去年的另外一則新聞可以交叉看:「Facebook’s security chief is leaving, and no one’s going to replace him」:

Instead of building out a dedicated security team, Facebook has dissolved it and is instead embedding security engineers within its other divisions. “We are not naming a new CSO, since earlier this year we embedded our security engineers, analysts, investigators, and other specialists in our product and engineering teams to better address the emerging security threats we face,” a Facebook spokesman said in an email. Facebook will “continue to evaluate what kind of structure works best” to protect users’ security, he said.

看起來又要再換一次密碼了... (還好已經習慣用 Password Manager,所以每個站都有不同密碼?)

喔對,另外補充一個概念,當他們說「我們沒有證據有人存取了...」的時候,比較正確的表達應該是「我們沒有稽核這塊... 所以沒有證據」。

移除 Facebook 上的行動電話

Facebook 上的行動電話號碼除了被拿來當作 2FA 的備案外,也被強制分享。更糟的是還被拿來當作廣告的條件:

目前同時有 TOTP (Facebook 的 app 提供的) 與 U2F,完全不會用到簡訊認證,就拔掉吧...

擋 Facebook 廣告的 Userscript

Facebook 為了反制各種「擋廣告軟體」,用了各種奇怪的 DOM 在擋:

目前看起來 ublock origin 這類擋廣告軟體支援的格式已經擋不住了,得靠其他工具來擋... 用到現在一直有在更新的「Facebook unsponsored」算是還行... 看 source code 可以看到他是直接抓有顯示的字串來分析,所以不會受到 DOM 的干擾,不過最近看起來又開始被搞了... XD