PostgreSQL 對 fsync() 的修正

上次寫了「PostgreSQL 對 fsync() 的行為傷腦筋...」提到 fsync() 有些地方是與開發者預期不同的問題,但後面忘記跟進度...

剛剛看到 Percona 的人寫了「PostgreSQL fsync Failure Fixed – Minor Versions Released Feb 14, 2019」這篇才發現在 2/14 就出了對應的更新,從 release notes 也可以看到:

By default, panic instead of retrying after fsync() failure, to avoid possible data corruption (Craig Ringer, Thomas Munro)

Some popular operating systems discard kernel data buffers when unable to write them out, reporting this as fsync() failure. If we reissue the fsync() request it will succeed, but in fact the data has been lost, so continuing risks database corruption. By raising a panic condition instead, we can replay from WAL, which may contain the only remaining copy of the data in such a situation. While this is surely ugly and inefficient, there are few alternatives, and fortunately the case happens very rarely.

A new server parameter data_sync_retry has been added to control this; if you are certain that your kernel does not discard dirty data buffers in such scenarios, you can set data_sync_retry to on to restore the old behavior.

現在的 workaround 是遇到 fsync() 失敗時為了避免 data corruption,會直接 panic 讓整個 PostgreSQL 從 WAL replay 記錄,也代表 HA 機制 (如果有設計的話) 有機會因為這個原因被觸發...

不過也另外設計了 data_sync_retry,讓 PostgreSQL 的管理者可以硬把這個 panic 行為關掉,改讓 PostgreSQL 重新試著 fsync(),這應該是在之後 kernel 有修改時會用到...

各家 Session Replay 服務對個資的處理

Session Replay 指的是重播將使用者的行為錄下來重播,市面上有很多這樣的服務,像是 User Replay 或是 SessionCam

這篇文章就是在討論這些服務在處理個資時的方式,像是信用卡卡號的內容,或是密碼的內容,這些不應該被記錄下來的資料是怎麼被處理的:「No boundaries: Exfiltration of personal data by session-replay scripts」,主要的重點在這張圖:

後面有提到目前防禦的情況,看起來目前用 adblock 類的軟體可以擋掉一些服務,但不是全部的都在列表裡。而 DNT 則是裝飾品沒人鳥過:

Two commonly used ad-blocking lists EasyList and EasyPrivacy do not block FullStory, Smartlook, or UserReplay scripts. EasyPrivacy has filter rules that block Yandex, Hotjar, ClickTale and SessionCam.

At least one of the five companies we studied (UserReplay) allows publishers to disable data collection from users who have Do Not Track (DNT) set in their browsers. We scanned the configuration settings of the Alexa top 1 million publishers using UserReplay on their homepages, and found that none of them chose to honor the DNT signal.

Improving user experience is a critical task for publishers. However it shouldn’t come at the expense of user privacy.

Facebook 備份 MySQL 資料並且確認正確性的方法

Facebook 再多花了一些篇幅數對於 MySQL 資料備份以及確認正確性的方法:「Continuous MySQL backup validation: Restoring backups」。

首先是 Continuous Restore Tier (CRT) 這塊,可以看到他們在這塊很仰賴 HDFS 當作備份的第一層基地,包括了 Full logical backups (用 mysqldump)、Differential (diff) backups 以及 Binary log (binlog) backups (stream 進 HDFS)。

另外上了 GTID,對於後續的處理會比較方便:

All of our database servers also use global transaction IDs (GTIDs), which gives us another layer of control when replaying transactions from binlog backups.

在 CRT 這塊可以看到其實是拿現成的工具堆起來,不同單位會因為規模而有不同的作法。真正的重點反而在 ORC Restore Coordinator (ORC) 這塊,可以看到 Facebook 開發了大量的程式將回復這件事情自動化處理:

在收到回復的需求後,可以看到 Peon 會從 HDFS 拉資料出來,並且用 binlog replay 回去:

Peons contain all relevant logic for retrieving backups from HDFS, loading them into their local MySQL instance, and rolling them forward to a certain point in time by replaying binlogs. Each restore job a peon works on goes through these five stages[.]

也是因為 Facebook 對 MySQL 的用量大到需要自動化這些事情,才有這些東西...