公平會對創業家兄弟與松果公司的 SEO 誘導轉向開罰

好像很少提到國內的新聞,但這則應該是這兩天蠻熱門的一個新聞,創業家兄弟與松果公司 (也是創業家兄弟公司) 被公平會開罰:「操作SEO搜尋關鍵字誤導消費者 創業家兄弟、松果公司挨罰」,相關的備份先留起來:Internet Archivearchive.today

公平會官方的新聞稿則可以在「利用程式設計引誘消費者「逛錯街」,公平會開罰」這邊看到,對應的網頁備份:Internet Archivearchive.today

用的是公平交易法第 25 條:

公平會於4月12日第1594次委員會議通過,創業家兄弟股份有限公司及松果購物股份有限公司利用「搜尋引擎優化 (Search Engine Optimization,簡稱SEO)」技術,並在搜尋引擎的顯示結果上不當顯示特定品牌名稱,使消費者誤認該賣場有販售特定品牌產品,藉以增進自身網站到訪率,違反公平交易法第25條規定,處創業家兄弟公司200萬元、松果公司80萬元罰鍰。

這條的條文可以從「公平交易法§25-全國法規資料庫」這邊看到:

除本法另有規定者外,事業亦不得為其他足以影響交易秩序之欺罔或顯失公平之行為。

主要的原因是點進去後卻沒有該項商品:

公平會發現,消費者在Google搜尋引擎打上特定品牌名稱,例如「悅夢床墊」時,搜尋結果會出現「悅夢床墊的熱銷搜尋結果│生活市集」、「人氣熱銷悅夢床墊口碑推薦品牌整理─松果購物」等搜尋結果,消費者被前述搜尋結果吸引點選進入「生活市集」、「松果購物」網站後,卻發現該賣場並無「悅夢床墊」之產品,此係生活市集及松果購物之經營者創業家兄弟公司及松果公司分別利用SEO技術所產生的現象。

而且會透過使用者在往站上搜尋的關鍵字產生對應的頁面:

公平會進一步調查後發現,創業家兄弟公司及松果公司對其所經營之「生活市集」及「松果購物」網頁進行設計,只要網路使用者在該2網站搜尋過「悅夢床墊」,縱然該2網站賣場並沒有賣「悅夢床墊」,其網站程式也會主動生成行銷文案網頁,以供搜尋引擎攫取。若有消費者之後在Google搜尋引擎查詢「悅夢床墊」時,搜尋結果便會帶出「悅夢床墊的熱銷搜尋結果│生活市集」、「人氣熱銷悅夢床墊口碑推薦品牌整理─松果購物」等搜尋結果項目,經消費者點選後即會導向「生活市集」、「松果購物」之網站。

然後判罰的部份:

公平會過往即曾就事業使用競爭對手事業名稱作為關鍵字廣告,並在關鍵字廣告併列競爭對手事業名稱之行為,認定違反公平交易法第25條規定。本案雖非創業家兄弟公司及松果公司直接使用「悅夢床墊」等他人商品品牌作為關鍵字廣告,但最終呈現之結果,本質上都是「誘導/轉向」(bait-and-switch)的欺罔行為,除了打斷消費者正常的商品搜尋與購買過程,也對其他販售該等品牌商品之經營者形成不公平競爭的效果。若任由發生而不予規範,未來將可能導致其他競爭者之競相仿效,消費者將更難以分辨搜尋結果呈現資訊之真偽,進而威脅電商市場之競爭秩序及消費者利益。故公平會認為違反公平交易法第25條「足以影響交易秩序之欺罔及顯失公平行為」,並分別處創業家兄弟公司200萬元、松果公司80萬元罰鍰。

所以這算是對 Dark pattern SEO 的部份開罰...

最近 Linux 核心安全性問題的 Dirty Pipe 故事很有趣...

Hacker News 上看到「The Dirty Pipe Vulnerability」這個 Linux kernel 的安全性問題,Hacker News 上相關的討論在「The Dirty Pipe Vulnerability (cm4all.com)」這邊可以看到。

這次出包的是 splice() 的問題,先講他寫出可重製 bug 的程式碼,首先是第一個程式用 user1 放著跑:

#include <unistd.h>
int main(int argc, char **argv) {
  for (;;) write(1, "AAAAA", 5);
}
// ./writer >foo

然後第二個程式也放著跑 (可以是不同的 user2,完全無法碰到 user1 的權限):

#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
int main(int argc, char **argv) {
  for (;;) {
    splice(0, 0, 1, 0, 2, 0);
    write(1, "BBBBB", 5);
  }
}
// ./splicer <foo |cat >/dev/null

理論上不會在 foo 裡面看到任何 BBBBB 的字串,但卻打穿了... 透過 git bisect 的檢查,他也確認了是在「pipe: merge anon_pipe_buf*_ops」這個 commit 時出的問題。

不過找到問題的過程拉的頗長,一開始是有 web hosting 服務的 support ticket 說 access log 下載下來發現爛掉了,無法解壓縮:

It all started a year ago with a support ticket about corrupt files. A customer complained that the access logs they downloaded could not be decompressed. And indeed, there was a corrupt log file on one of the log servers; it could be decompressed, but gzip reported a CRC error.

然後他先手動處理就把票關起來了:

I fixed the file’s CRC manually, closed the ticket, and soon forgot about the problem.

接下來過幾個月後又發生,經過幾次的 support ticket 後他手上就有一些「資料」可以看:

Months later, this happened again and yet again. Every time, the file’s contents looked correct, only the CRC at the end of the file was wrong. Now, with several corrupt files, I was able to dig deeper and found a surprising kind of corruption. A pattern emerged.

然後因為發生的頻率也不是很高,加上邏輯上卡到死胡同,所以他也沒有辦法花太多時間在上面:

None of this made sense, but new support tickets kept coming in (at a very slow rate). There was some systematic problem, but I just couldn’t get a grip on it. That gave me a lot of frustration, but I was busy with other tasks, and I kept pushing this file corruption problem to the back of my queue.

後來真的花時間下去找,利用先前的 pattern 掃了一次系統 log,發現有規律在:

External pressure brought this problem back into my consciousness. I scanned the whole hard disk for corrupt files (which took two days), hoping for more patterns to emerge. And indeed, there was a pattern:

  • there were 37 corrupt files within the past 3 months
  • they occurred on 22 unique days
  • 18 of those days have 1 corruption
  • 1 day has 2 corruptions (2021-11-21)
  • 1 day has 7 corruptions (2021-11-30)
  • 1 day has 6 corruptions (2021-12-31)
  • 1 day has 4 corruptions (2022-01-31)

The last day of each month is clearly the one which most corruptions occur.

然後就試著寫各種 reproducible code,最後成功的版本就是開頭提到的,然後他發現這個漏洞可以是 security vulnerability,就回報出去了,可以看到前後從第一次的 support ticket 到最後解決花了快一年的時間,不過 Linux kernel 端修正的速度蠻快的:

  • 2021-04-29: first support ticket about file corruption
  • 2022-02-19: file corruption problem identified as Linux kernel bug, which turned out to be an exploitable vulnerability
  • 2022-02-20: bug report, exploit and patch sent to the Linux kernel security team
  • 2022-02-21: bug reproduced on Google Pixel 6; bug report sent to the Android Security Team
  • 2022-02-21: patch sent to LKML (without vulnerability details) as suggested by Linus Torvalds, Willy Tarreau and Al Viro
  • 2022-02-23: Linux stable releases with my bug fix (5.16.11, 5.15.25, 5.10.102)
  • 2022-02-24: Google merges my bug fix into the Android kernel
  • 2022-02-28: notified the linux-distros mailing list
  • 2022-03-07: public disclosure

整個故事還蠻精彩的 XD

修正 Curl 的 TLS handshake,避開 bot 偵測機制

利用 TLS handshake 的 pattern 可以當作是某種 fingerprint,就可以知道你是用 Curl,這個方式在蠻多 CDN 都會用在 anti-bot 機制 (像是 Cloudflare),而剛剛看到有人投稿自己的 patch,試著將 Curl 修改成 Firefox 的 pattern:「curl-impersonate」,Hacker News 上的討論在這邊可以看到:「Show HN: Curl modified to impersonate Firefox and mimic its TLS handshake (github.com/lwthiker)」。

作者有提到這次的 patch 偏 hack,不太可能整進上游,但希望未來改的乾淨一點,然後整進上游:

I hope to do so in the future, for now the implementation is extremely hacky so I doubt it can get accepted into curl.

另外有人提出來說應該要用 Firefox ESR 版本的 pattern 而非 stable channel,也有人提出來說用 Google Chrome 的更好,不過我覺得有人開始做就已經很棒了 XD

透過 Interview 的管道推銷產品

Hacker News 上看到的 dark pattern:「NDA Expired - let's spill the beans on a weird startup」,對應的討論可以在「NDA expired, let’s spill the beans on a weird startup (shkspr.mobi)」這邊翻到。

整個行銷手法大概是這樣,首先是客戶會想要推廣某個產品 (文章內的 AWS 想要推廣 InfiniDash 這個產品),所以找上這個行銷公司。而行銷公司會找一群人去各公司 interview,然後在 interview 過程中問到各種問題的時候都回答用 InfiniDash 這個產品解決:

"Let's say in the interview you're asked for a time when you handled a difficult situation. We want you to talk about how using InfiniDash made life easier for your team."

這個行銷公司會跟某些獵頭合作,把人塞進去 (而這些獵頭也會拿到錢):

"Oh, we work with loads of recruitment consultants. They get paid for every decent candidate who gets interviewed, so they give us a cut of their commission. And we get paid by our tech clients. It's win-win!"

後來作者沒有加入這家公司,而這家公司後來也消失了:

I did not take the job. I had serious concerns about the ethics and legality of their business model. I also didn't think that it would work. Evidently, neither did their investors. Fronk bumbled along for a few months - and I'd occasionally meet them at conferences - but then they seemed to disappear.

但作者最後提到了他之後在 interview 時會聽到面試者講一些特定的產品,讓他懷疑這家公司是不是還存在 XDDD

Every once in a while, I'll be interviewing a candidate who starts waxing lyrical about how rewriting everything in today's flavour of JavaScript really helped their last company. Or how their bosses were impressed with what this cool new bit of tech can do. Or why they could never work anywhere which didn't use this specific code editor.

Reddit 的前 EM 提到 Reddit 用的 Dark Pattern

Hacker News Daily 上看到「Reddit's disrespectful design」這個,在講 Reddit 用到的 dark pattern, 不過我覺得更棒的是在 Hacker News 上的討論:「Reddit’s disrespectful design (ognjen.io)」。

討論裡面出現了之前在 Reddit 裡面負責這塊的 EM (engineer manager),wting

I was the EM for Reddit's Growth team around this time. I am responsible for / contributed to a few features like the current signup flow, AMP pages, push notifications, email digests, app download interstitials, etc.

然後他提到他當初拒絕實做其中一些 dark pattern,然後在經過 [0] 的事情後,他決定換團隊:

There was a new product lead who joined with many good ideas, but some of them were dark patterns that I heavily protested. After a few months of this, it was obvious that I was going to be reigned in or let go[0]; I immediately transferred to a different org.

其中 [0] 的部份是,公司把他 reporting line 改成到 product lead XDDD

0: They changed it so I would report to the product lead, which is odd for an EM to report into a product chain and the only instance within the company ever.

另外他也講了一個事情,就是強姦使用者是有用的,而且超級有用:

Now let me explain the other side of the story. 4 years later, Reddit's DAU, MAU, and revenue have all grown at ridiculous rates[1]. Yes, power users complain—and still continue using the site—but the casual user does not. These dark patterns have been normalized on other websites.

These practices are done because it works.

1: Many friends are startup founders and I've been at a few startups myself—a byproduct of being in the Bay Area—and Reddit's growth numbers are impressive. As a former employee, I am quite happy about my equity growth.

這幾乎是共識了,dark pattern 看起來很邪惡沒錯,但他就是有用...

GitHub 的 API Token 換格式

GitHub 前幾天宣佈更換 API token 的格式:「Authentication token format updates are generally available」,在今年三月初的時候有先公告要換:「Authentication token format updates」。

另外昨天也解釋了換成這樣的優點:「Behind GitHub’s new authentication token formats」。

首先是 token 的字元集合變大了:

The character set changed from [a-f0-9] to [A-Za-z0-9_]

另外是增加了 prefix 直接指出是什麼種類的 token:

The format now includes a prefix for each token type:

  • ghp_ for Personal Access Tokens
  • gho_ for OAuth Access tokens
  • ghu_ for GitHub App user-to-server tokens
  • ghs_ for GitHub App server-to-server tokens
  • ghr_ for GitHub App refresh tokens

另外官方目前先不會改變 token 長度 (透過字元變多增加 entropy),但未來有打算要增加:

The length of our tokens is remaining the same for now. However, GitHub tokens will likely increase in length in future updates, so integrators should plan to support tokens up to 255 characters after June 1, 2021.

看起來當初當作 hex string 而轉成 binary 會有問題,不過就算這樣做應該也是轉的回來的。

回到好處的部份,這個作法跟 SlackStripe 類似,讓開發者或是管理者更容易辨識 token 的類型:

As we see across the industry from companies like Slack and Stripe, token prefixes are a clear way to make tokens identifiable. We are including specific 3 letter prefixes to represent each token, starting with a company signifier, gh, and the first letter of the token type.

另外這也讓 secret scanning 的準確度更高,本來是 40 bytes 的 hex string,有機會撞到程式碼內的 SHA-1 string:

Many of our old authentication token formats are hex-encoded 40 character strings that are indistinguishable from other encoded data like SHA hashes. These have several limitations, such as inefficient or even inaccurate detection of compromised tokens for our secret scanning feature.

另外官方也建議現有的 token 換成新的格式,這樣如果真的發生洩漏,可以透過 secret scanning 偵測並通知:

We strongly encourage you to reset any personal access tokens and OAuth tokens you have. These improvements help secret scanning detection and will help you mitigate any risk to compromised tokens.

最近很熱鬧的 New York Times 退訂截圖

最近很熱鬧的 New York Times 的退訂過程截圖在這邊,可以看到滿滿的 Dark pattern 想辦法讓使用者難以退訂:「Before buying a NYT subscription, here's what it will take you to cancel it.」,這點在 Hacker News 上的討論也可以看一下:「Before buying a NYT subscription, here's what it'll take to cancel it (imgur.com)」。

我在看的時候想到美國好像有通過法律,要求租用與退訂流程的對等性,查了一下資料發現理解不正確,之前看到的新聞應該是加州州政府通過的法令:「SB-313 Advertising: automatic renewal and continuous service offers.(2017-2018)」。

2018 年法令生效當時也有報導,裡面講的比較白話:「Companies must let customers cancel subscriptions online, California law says」,結果看到這則報導裡面給的範例時馬上笑出來,因為又是 New York Times,看起來是就是慣犯 XDDD

One person tweeted about trying to cancel a New York Times subscription on the phone and being put on hold for 15 minutes -- twice.

在 Hacker News 的討論裡有提到,美國的使用者可以考慮用 Privacy 這個虛擬信用卡服務,對於這種很搞事的 subscription 直接關閉對應的信用卡帳號就好。

台灣之前有遠東銀行提供 Mastercard InControl 方案,但看起來到去年年底也不提供了:「MasterCard inControl 網路交易虛擬卡號申請服務110年1月1日起終止公告」。

NordVPN 綁架使用者的方式...

Hacker News Daily 上看到「NordVpn disables features when you turn off auto-renew」這個,這也太厲害了:

NordVPN 設計成只要關掉 auto-renewal 就直接拔掉一些功能,一臉 WTF...

Hacker News 的「NordVPN disables features when you turn off auto-renew (reddit.com)」看到這段提出來的論點蠻有趣的,當作一個參考觀點:

By now these VPN providers are like toothpaste, diapers or soft drinks: completely undifferentiated between competitors, and so only able to maintain their market share by spending loads on marketing. Of course the company with most egregious dark patterns and aggressive churn dampening wins.

Thankfully a tube of toothpaste doesn't allow implementing dark patterns like this... yet.

SQL Antipatterns: Avoiding the Pitfalls of Database Programming

標題裡是說這本書:「SQL Antipatterns: Avoiding the Pitfalls of Database Programming」,在 2010 年出版的書。

我是在 Hacker News Daily 上看到「Ask HN: What are some examples of good database schema designs?」這篇,裡面提到了這本書,看了一下章節看到只有 USD$25 就馬上先買起來丟到 Kindle 裡面...

這不是給資料庫初學者看的書,主要的讀者是對於「標準」夠熟 (學校教的那些理論基礎,像是各種 index 的底層結構,正規化的方法,以及正規化的原因),然後也有一些實務經驗後的開發者。

因為裡面把常遇到的問題,與可能的解決方式 (通常都違反當初在學校學的理論基礎) 整理成這本書,在底子還沒打穩前跑來看這本書並不是個好主意...

另外裡面雖然不一定有提到,但有蠻多技巧其實是用在「為了效能而不則手段」的情境下,所以這些內容對於想要調校比較大的網站應該也是很有幫助。

讓手機上瀏覽器自動帶出數字鍵盤的方式

在「HTML attributes to improve your users' two factor authentication experience」這邊看到關於讓使用者輸入 2FA (通常是數字) 比較流暢的設定。

原始是上面這張這樣,目標是希望下面這張這樣,當透過 SMS 2FA 時可以提供選項直接貼上,而且也自動帶出數字鍵盤:

給出的幾個重點在於 inputmodepattern 以及 autocomplete

<input
  type="text"
  name="token"
  id="token"
  inputmode="numeric"
  pattern="[0-9]*"
  autocomplete="one-time-code"
/>

查了一下 caniuse.com 上面的支援度,pattern 基本上都支援了,autocomplete 在這邊用到的 one-time-code 基本上也沒問題,只有 inputmode 這邊支援度比較差,IE11 (基本上不會更新了)、FirefoxSafari 沒支援。