Amazon EFS 也要推出 IA 版本了 (Infrequent Access)

Amazon EFS 也要推出 IA (Infrequent Access) 版本了,Infrequent Access 指的是不常存取的資料:「Coming Soon – Amazon EFS Infrequent Access Storage Class」。

這剛好配合上很多人拿 Amazon EFS 來堆 log 的行為... AWS 是說有機會到省到 85%,不過應該是非常大的量才有機會有這個價錢?

EFS IA reduces storage costs for files not accessed every day, with savings up to 85% compared to the EFS Standard storage class.

其實用過 Amazon EFS 的人都對效能抱怨頗嚴重 (透過 NFS 有太多操作沒辦法 cache,於是 network latency issue 就出現了),堆 log 或是當作跨機器的空間大概是目前的主流用法...

Twitter 密碼中槍...

Twitter 發了公告請大家改密碼:「Keeping your account secure」。不只是 Twitter 自家的密碼,如果你有重複使用同一組密碼,也建議一起修改:

Out of an abundance of caution, we ask that you consider changing your password on all services where you’ve used this password.

雖然使用 bcrypt,但因為透過 log 記錄下了未加密的密碼,所以就中槍了:

We mask passwords through a process called hashing using a function known as bcrypt, which replaces the actual password with a random set of numbers and letters that are stored in Twitter’s system. This allows our systems to validate your account credentials without revealing your password. This is an industry standard.

Due to a bug, passwords were written to an internal log before completing the hashing process. We found this error ourselves, removed the passwords, and are implementing plans to prevent this bug from happening again.

這時候就要再推 Password manager 這種東西了,在每個站台都使用完全不同的密碼,可以降低這類問題帶來的衝擊...

把 git log 用得很開心...?

看到「git log – the Good Parts」這篇文章,裡面研究了 Gitgit log 的各種好用的功能,然後整理出來... (所以是 good parts XD)

作者用的參數是一個一個加上去,所以可以一個階段一個階段了解用途。除了可以用作者推薦的 repository 測試外,我建議大家拿個自己比較熟悉的 open source 專案來測 (有用到比較複雜的架構):

git log
git log --oneline
git log --oneline --decorate
git log --oneline --decorate --all
git log --oneline --decorate --all --graph

看到喜歡的部份可以在 ~/.gitconfig 裡設 alias 使用,像是用 git l 之類的?保留 git log 本身可以避免一些 script 用到這個指令時因為輸出格式跟預期不一樣而爛掉 XD

EnterpriseDB 打算推出的 zheap,想要解 VACUUM 問題...

前天被問到「DO or UNDO - there is no VACUUM」這篇,回家後仔細看一看再翻了一些資料,看起來是要往 InnoDB 的解法靠...

PostgreSQL 與 InnoDB 都是透過 MVCC 的概念實做 transaction 之間的互動,但兩者實際的作法不太一樣。其中帶來一個明顯的差異就是 PostgreSQL 需要 VACUUM。這點在同一篇作者八年前 (2011) 的文章就有提過兩者的差異以及優缺點:「MySQL vs. PostgreSQL, Part 2: VACUUM vs. Purge」。

UPDATE 時,InnoDB 會把新資料寫到表格內,然後把可能會被 rollback 的舊資料放到表格外:

In InnoDB, only the most recent version of an updated row is retained in the table itself. Old versions of updated rows are moved to the rollback segment, while deleted row versions are left in place and marked for future cleanup. Thus, purge must get rid of any deleted rows from the table itself, and clear out any old versions of updated rows from the rollback segment.

而被 DELETE 清除的資料則是由 purge thread 處理:

All the information necessary to find the deleted records that might need to be purged is also written to the rollback segment, so it's quite easy to find the rows that need to be cleaned out; and the old versions of the updated records are all in the rollback segment itself, so those are easy to find, too.

所以可以在 InnoDB 看到 purge thread 相關的設定:「MySQL :: MySQL 5.7 Reference Manual :: 14.6.11 Configuring InnoDB Purge Scheduling」,負責處理這些東西。

而在 PostgreSQL 的作法則是反過來,舊的資料放在原來地方,新資料另外存:

PostgreSQL takes a completely different approach. There is no rollback tablespace, or anything similar. When a row is updated, the old version is left in place; the new version is simply written into the table along with it.

新舊資料的位置其實還好,主要是因為沒有類似的地方可以記錄哪些要清:

Lacking a centralized record of what must be purged, PostgreSQL's VACUUM has historically needed to scan the entire table to look for records that might require cleanup.

這也使得 PostgreSQL 裡需要 autovacuum 之類的程序去掃,或是手動跑 vacuum。而在去年 (2017) 的文章裡也有提到目前還是類似的情況:「MVCC and VACUUM」。

而在今年 (2018) 的文章裡,EnterpriseDB 就提出了 zheap 的想法,在 UPDATE 時寫到 table 裡,把可能被 rollback 的資料放到 undo log 裡。其實就是把 InnoDB 那套方法拿過來用,只是整篇都沒提到而已 XD:

That brings me to the design which EnterpriseDB is proposing. We are working to build a new table storage format for PostgreSQL, which we’re calling zheap. In a zheap, whenever possible, we handle an UPDATE by moving the old row version to an undo log, and putting the new row version in the place previously occupied by the old one. If the transaction aborts, we retrieve the old row version from undo and put it back in the original location; if a concurrent transaction needs to see the old row version, it can find it in undo. Of course, this doesn’t work when the block is full and the row is getting wider, and there are some other problem cases as well, but it covers many useful cases. In the typical case, therefore, even bulk updates do not force a zheap to grow. Instead, the undo grows. When a transaction commits, all row versions that will become dead are in the undo, not the zheap.

不過馬上就會想到問題,如果要改善問題,不是個找地方記錄哪些位置要回收就好了嗎?順便改變方法是為了避免 fragment 嗎?

等著看之後變成什麼樣子吧...

ExpressVPN 在土耳其的 VPN server 被抄...

ExpressVPN 在土耳其的 VPN server 被抄,為了調查大使的刺殺案件:「VPN Server Seized to Investigate Russian Ambassador’s Assassination」。

A VPN server operated by ExpressVPN was seized by Turkish authorities to investigate the assassination of Andrei Karlov, the Russian Ambassador to Turkey. Authorities hoped to find more information on people who removed digital traces of the assassin, but the server in question held no logs.

ExpressVPN 官方的回覆在「ExpressVPN statement on Andrey Karlov investigation」,主要的部份是:

As we stated to Turkish authorities in January 2017, ExpressVPN does not and has never possessed any customer connection logs that would enable us to know which customer was using the specific IPs cited by the investigators. Furthermore, we were unable to see which customers accessed Gmail or Facebook during the time in question, as we do not keep activity logs. We believe that the investigators’ seizure and inspection of the VPN server in question confirmed these points.

至於是不是真的,就需要時間確認了...

Amazon Elasticsearch 支援 I3 instance (i.e. 1.5 PB Disk) 了

Amazon Elasticsearch 支援 I3 instance 了:「Run Petabyte-Scale Clusters on Amazon Elasticsearch Service Using I3 instances」。

Amazon Elasticsearch Service now supports I3 instances, allowing you to store up to 1.5 petabytes of data in a single Elasticsearch cluster for large log analytics workloads.

i3.16xlarge 單台是 15.2 TB 的硬碟空間,100 台就會是 1.5 PB,不知道跑起來會多慢 XDDD

Amazon Elasticsearch Service – Amazon Web Services (AWS) | FAQs 這邊還沒修正 XD:

You can request a service limit increase up to 100 instances per domain by creating a case with the AWS Support Center. With 100 instances, you can allocate about 150 TB of EBS storage to a single domain.

AWS 推出 Amazon GuardDuty 進行內部網路監控

AWS 推出 Amazon GuardDuty 監控內部網路:「Amazon GuardDuty – Continuous Security Monitoring & Threat Detection」。

從示意圖可以看到結合了許多 log 資料,然後綜合判斷:

In combination with information gleaned from your VPC Flow Logs, AWS CloudTrail Event Logs, and DNS logs, this allows GuardDuty to detect many different types of dangerous and mischievous behavior including probes for known vulnerabilities, port scans and probes, and access from unusual locations.

所以連 Bitcoin 相關網站也當作條件之一 XD

開了相當多區 (相較於之前 AWS Elemental MediaOOXX 系列...):

Amazon GuardDuty is available in production form in the US East (Northern Virginia), US East (Ohio), US West (Oregon), US West (Northern California), EU (Ireland), EU (Frankfurt), EU (London), South America (São Paulo), Canada (Central), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Mumbai) Regions and you can start using it today!

CloudFlare 也要提供 Certificate Transparency 的 Log 伺服器了...

看到 CloudFlare 請求加入 Chromium (Google Chrome) 的伺服器列表:「Certificate Transparency - Cloudflare "nimbus2017" Log Server Inclusion Request」。

對照之前的「Chromium 內提案移除 HPKP (HTTP Public Key Pinning)」以及「Let's Encrypt 的 Embed SCT 支援」,這樣看起來是瀏覽器內會有一份白名單,只有在這白名單上的 Embed SCT 才會被信任...

但弄到這樣的話,log server 是不是也要有稽核機制?

好像哪邊搞錯了方向啊...

Route 53 的 Query 記錄

Amazon Route 53 可以收 query log 了,會丟到 CloudWatch Logs:「Amazon Route 53 Announces Support For DNS Query Logging」。

If you are using Amazon Route 53 as your public, authoritative DNS, you will now have the capability to easily log DNS queries received by Amazon Route 53 through integration with CloudWatch Logs.

這樣可以拿來分析了...

nginx 記錄 TLS 連線資訊

想要在 nginx 的 access log 裡面記錄使用者在 HTTPS 連線使用的 TLS protocol 與 cipher。

在「How can I let nginx log the used SSL/TLS protocol and ciphersuite?」這邊有提到方向是 $ssl_protocol$ssl_cipher (出自「Module ngx_http_ssl_module」內的 Embedded Variables 章節)。

他的方式是在前面就插入 protocol,但我希望前面的欄位保持不變,把 protocol & cipher 放到後面,所以我就加了一個 /etc/nginx/conf.d/combined_ssl.conf (這邊我用 ondrej 的 PPA,在設定檔裡會撈 /etc/nginx/conf.d/ 下的設定,不確定其他的情況如何):

#
log_format combined_ssl '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $ssl_protocol/$ssl_cipher';

然後本來用 combined 的 HTTPS 設定就改成 combined_ssl

來放一陣子再來分析,然後想看看要怎麼調整 cipher...