各家 glob 的效能...

在「Glob Matching Can Be Simple And Fast Too」這邊看到在分析 (a.*)nb 這樣的 pattern 的效能 (像是 a.*a.*a.*b 這樣的東西),第一波先測 shell,結果發現有趣的現象:

那個 csh 是怎麼了 XDDD

Looking at the source code, it doesn’t attempt to perform glob expansion itself. Instead it calls the C library implementation glob(3), which runs in linear time, at least on this Linux system. So maybe we should look at programming language implementations too.

再來是看各程式語言:

各家實做方式不一樣 XD

然後文章裡有提到這個方式是之前蠻常見的 DoS 技巧,用很少的資源就可以吃光你的 CPU resource... 另外也提到了可能的解法,就是限制星號的數量:

At the very least, most FTP servers should probably reject glob patterns with more than, say, 3 stars.

後面演算法的部份也是很有趣的主題...

StackOverflow 對於多 DNS 商的同步方式...

他們的解法是設計出一套 DSL (Domain Specific Language),然後從 DSL 轉出各 DNS 商的格式:「Introducing DnsControl – “DNS as Code” has Arrived」。

stackoverflow.com 來說,可以看到有同時使用 AWSRoute 53GoogleCloud DNS

;; ANSWER SECTION:
stackoverflow.com.      36458   IN      NS      ns-cloud-e2.googledomains.com.
stackoverflow.com.      36458   IN      NS      ns-358.awsdns-44.com.
stackoverflow.com.      36458   IN      NS      ns-1033.awsdns-01.org.
stackoverflow.com.      36458   IN      NS      ns-cloud-e1.googledomains.com.

於是他們就用 DSL 管理:

D(“stackoverflow.com”, REG_NAMEDOTCOM, DnsProvider(R53), DnsProvider(GCLOUD),
    A(“@”, “198.252.206.16”),
    A(“blog”, “198.252.206.20”),
    CNAME(“chat”, “chat.stackexchange.com.”),
    CNAME(“www”, “@”, TTL(3600)),
    A(“meta”, “198.252.206.16”)
)

這套程式碼在「StackExchange/dnscontrol」這邊,但這樣搞有種微妙的感覺... 不考慮直接用兩家有支援 AXFR 架構的 DNS 商來架設嗎?這樣就只要用 BIND 這類已經很熟悉的軟體設定就好?

超越線性成長的資料庫架構

標題取自 Percona 的「Better Than Linear Scaling」。

其實是因為機器數量增加,而且有妥善規劃,使得 cache 的 hit rate 上升而讓整體效率變好 (也就是 1 + 1 > 2)。

不只在 database 上會發生,在其他系統上其實會有類似的情況,剛好看到覺得很懷念 XD

Percona XtraDB Cluster 解釋最近的版本為什麼可以再度提昇效能

Percona XtraDB Cluster 最近的一個版本宣稱效能大幅提昇,這篇試著解釋原因:「How We Made Percona XtraDB Cluster Scale」。

裡面提了三個情境,解釋了這些效能問題以及解法,進而帶出這次的效能提昇:

With these three main optimizations, and some small tweaks, we have tuned Percona XtraDB Cluster to scale better and made it fast enough for the growing demands of your applications. All of this is available with the recently released Percona XtraDB Cluster 5.7.17-29.20. Give it a try and watch your application scale in a multi-master environment, making Percona XtraDB Cluster the best fit for your HA workloads.

查看 Percona XtraDB Cluster 在 IST 時的進度

Percona 的「Tracking IST Progress in Percona XtraDB Cluster」這篇介紹了 PXC 在 IST 時的進度。

是個新功能,很簡單的可以看到目前進度,也可以拿來當作 health check 的一部分,避免開機剛起來時開始提供服務:

mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| wsrep_ist_receive_status | 3% complete, received seqno 1421771 of 1415410-1589676 |
+--------------------------+--------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 52% complete, received seqno 1506799 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 97% complete, received seqno 1585923 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| wsrep_ist_receive_status | |
+--------------------------+-------+
1 row in set (0.00 sec)

SQL 的各種 JOIN

SQL 的各種 JOIN 其實是個很好玩的主題,不是很難,但沒有仔細練習過一次通常只能答出常用的那幾種... 而且過一陣子又會忘記 XD

這幾天不知道哪邊看到的舊文章「Say NO to Venn Diagrams When Explaining JOINs」,這篇作者大聲疾呼不要用 Venn diagram 解釋 SQL 的 JOIN,把比較特別的 JOIN 拿出來舉例...

因為 Venn diagram 是講集合交聯集這類的操作,但 JOIN 不僅僅是如此... 舉例來說,CROSS JOIN 對應到 cartesian product 就沒辦法用 Venn diagram 簡單的表示出來,而必須畫的更「具體」:

Bose 販賣用戶隱私被告

Unroll 在旁邊燒的時候 (參考 Uber 戰火蔓延到 Unroll),Bose 也不甘寂寞決定跟上科技業的潮流:「Bose headphones spy on listeners: lawsuit」。

Bose 直接將他們 app 收集到的資訊拿出來賣:

Bose Corp spies on its wireless headphone customers by using an app that tracks the music, podcasts and other audio they listen to, and violates their privacy rights by selling the information without permission, a lawsuit charged.

這次打算控告的產品包括這些 (這邊提到的 Zak 是原告):

Zak is seeking millions of dollars of damages for buyers of headphones and speakers, including QuietComfort 35, QuietControl 30, SoundLink Around-Ear Wireless Headphones II, SoundLink Color II, SoundSport Wireless and SoundSport Pulse Wireless.

編號可以記一下,之後可以拿來追蹤:

The case is Zak v Bose Corp, U.S. District Court, Northern District of Illinois, No. 17-02928.

Uber 戰火蔓延到 Unroll

最近 Uber 的 CEO 被 Tim Cook 叫去喝咖啡的事情被報導出來:「Uber’s C.E.O. Plays With Fire」,裡面提到了 Uber 試著要「辨別」使用者的 iPhone,而這違反蘋果的政策:

To halt the activity, Uber engineers assigned a persistent identity to iPhones with a small piece of code, a practice called “fingerprinting.” Uber could then identify an iPhone and prevent itself from being fooled even after the device was erased of its contents.

There was one problem: Fingerprinting iPhones broke Apple’s rules. Mr. Cook believed that wiping an iPhone should ensure that no trace of the owner’s identity remained on the device.

而 Uber 的搞法是針對蘋果總部所在地點屏蔽這個功能:

So Mr. Kalanick told his engineers to “geofence” Apple’s headquarters in Cupertino, Calif., a way to digitally identify people reviewing Uber’s software in a specific location. Uber would then obfuscate its code for people within that geofenced area, essentially drawing a digital lasso around those it wanted to keep in the dark. Apple employees at its headquarters were unable to see Uber’s fingerprinting.

然後被蘋果工程師抓到,於是 Tim Cook 把人叫來喝咖啡:

The ruse did not last. Apple engineers outside of Cupertino caught on to Uber’s methods, prompting Mr. Cook to call Mr. Kalanick to his office.

另外提到了 Uber 從 Unroll.me 買來 Lyft 的帳單資料當作分析:

Using an email digest service it owns named Unroll.me, Slice collected its customers’ emailed Lyft receipts from their inboxes and sold the anonymized data to Uber. Uber used the data as a proxy for the health of Lyft’s business. (Lyft, too, operates a competitive intelligence team.)

而更精彩的在 Hacker News 上的這串爆了不少料,提到 Unroll 會把所有信件掃下來,丟到 S3 上面:

I worked for a company that nearly acquired unroll.me. At the time, which was over three years ago, they had kept a copy of every single email of yours that you sent or received while a part of their service. Those emails were kept in a series of poorly secured S3 buckets. A large part of Slice buying unroll.me was for access to those email archives. Specifically, they wanted to look for keyword trends and for receipts from online purchases.

The founders of unroll.me were pretty dishonest, which is a large part of why the company I worked for declined to purchase the company. As an example, one of the problems was how the founders had valued and then diluted equity shares that employees held. To make a long story short, there weren't any circumstances in which employees who held options or an equity stake would see any money.

I hope you weren't emailed any legal documents or passwords written in the clear.

而在 FAQ 的「If I delete my Unroll.Me account, what will happen to all of my previously rolled up emails?」裡則是說我們沒有存你的信件:

這爆米花要多買一些了...

在 F5 設備上使用 Let's Encrypt 憑證

找資料的時候翻到 F5 官方有給一篇導引,介紹如何自動化 Let's Encrypt 的憑證:「Lightboard Lessons: Automating SSL on BIG-IP with Let's Encrypt!」。

在 F5 的 GitHub 上也有一包「f5devcentral/lets-encrypt-python」可以看看。

還沒仔細看 & 測試,但大概有些輪廓了。看起來要考慮到裡面用的 letsencrypt.sh 已經改名成 dehydrated,另外就是實際測試了...

其實憑證貴的不是費用,是跑採購流程的成本... 單 domain 的如果可以用 Let's Encrypt 解決的話會可以省下不少功夫。

利用 Unicode Domain 釣魚,以及 Chrome 與 Firefox 的解法

一個多禮拜前引起蠻多討論的一篇文章,利用 Unicode Domain 釣魚的方法:「Phishing with Unicode Domains」。

由於這是幾乎完美的攻擊,所以被提出來後 (Security: Whole-script confusable domain label spoofing) 有不少討論:

This bug was reported to Chrome and Firefox on January 20, 2017 and was fixed in the Chrome trunk on March 24. The fix is included in Chrome 58 which is currently rolling out to users.

comment 8 提到:

We do have a whitelist. Essentially you're suggesting that we remove Cyrillic and Greek characters from the list. I'm not sure we want to go down that path.

在新版的 Chrome 58 已經「修正」了這個問題:

Firefox 的討論在「IDN Phishing using whole-script confusables on Windows and Linux」這邊,一開始就直接把票給關了 XDDD:

Indeed. Our IDN threat model specifically excludes whole-script homographs, because they can't be detected programmatically and our "TLD whitelist" approach didn't scale in the face of a large number of new TLDs. If you are buying a domain in a registry which does not have proper anti-spoofing protections (like .com), it is sadly the responsibility of domain owners to check for whole-script homographs and register them.

We can't go blacklisting standard Cyrillic letters.

If you think there is a problem here, complain to the .com registry who let you register https://www.xn--80ak6aa92e.com/ .

Gerv

Status: NEW → RESOLVED
Last Resolved: 3 months ago
Flags: needinfo?(gerv)
Resolution: --- → WONTFIX

然後一個月前被提出來看看 Chrome 怎麼做:

Gerv/Valentin, is this something we can/should align with Chromium on?

目前唯一的解法是改 flag,把所有的 Unicode Domain 直接當作一般的 domain 來處理,列出像是 www.xn--80ak6aa92e.com 的網址。