操作 S3 Command Line 的工具

在朋友的 Facebook 上看的東西:「S5cmd for High Performance Object Storage」。會想要寫這篇是因為看到 s4cmds5cmd 這兩個工具的命名而笑出來:

不過這篇也可以看到差異,s3cmd 是自己用 Python 刻所有東西,s4cmd 還是用 Python,但是因為 boto3 而快了不少,而 s5cmd 則是改用 Golang 寫,並且採用多個 TCP connection 操作而讓效能大幅提昇。

AWS Cloud 的用法

Hacker News Daily 上看到這則,分享了 AWS (他的前東家,超過八年) 的使用經驗:

除了可以在 Twitter 上看以外,也可以用 Thread reader 直接讀整條 thread,應該也還算清楚:「This is how I use the good parts of @awscloud, while filtering out all the distracting hype.」。

這邊的經驗談主要是在 web 與 app 相關的服務這塊:

有講到 AWS 的業務其實圍繞在 scalability 上發展,但這對 startup 可能反而是扣分,因為暴力法解反而可以大幅簡化架構換得 agile (而讓 startup 存活下來)。

另外從團隊的開發成本來看,這些 scale 的技術增加了開發成本,產生了很多開發上的限制,這些觀點也有點帶到「Premature optimization is the root of all evil」在講的事情:

最後的結論可以看到一些列表:

除了 DynamoDB 的意見不同外 (這邊提到的 DDB),其他的我都可以接受...

Amazon S3 淘汰 Path-style 存取方式的新計畫

先前在「Amazon S3 要拿掉 Path-style 存取方式」提到 Amazon S3 淘汰 Path-style 存取方式的計畫,經過幾天後有改變了。

Jeff Barr 發表了一篇「Amazon S3 Path Deprecation Plan – The Rest of the Story」,裡面提到本來的計畫是 Path-style model 只支援到 2020/09/30,被大幅修改為只有在 2020/09/30 後建立的 bucket 才會禁止使用 Path-style:

In response to feedback on the original deprecation plan that we announced last week, we are making an important change. Here’s the executive summary:

Original Plan – Support for the path-style model ends on September 30, 2020.

Revised Plan – Support for the path-style model continues for buckets created on or before September 30, 2020. Buckets created after that date must be referenced using the virtual-hosted model.

這樣大幅降低本來會預期的衝擊,但 S3 團隊希望償還的技術債又得繼續下去了... 也許再過個幾年後才會再被提出來?

Amazon S3 要拿掉 Path-style 存取方式

Hacker News 上翻的時候翻到的公告:「Announcement: Amazon S3 will no longer support path-style API requests starting September 30th, 2020」。

現有的兩種方法,一種是把 bucket name 放在 path (V1),另外一種是把 bucket name 放在 hostname (V2):

Amazon S3 currently supports two request URI styles in all regions: path-style (also known as V1) that includes bucket name in the path of the URI (example: //s3.amazonaws.com/<bucketname>/key), and virtual-hosted style (also known as V2) which uses the bucket name as part of the domain name (example: //<bucketname>.s3.amazonaws.com/key).

這次要淘汰的是 V1 的方式,預定在 2020 年十月停止服務 (服務到九月底):

Customers should update their applications to use the virtual-hosted style request format when making S3 API requests before September 30th, 2020 to avoid any service disruptions. Customers using the AWS SDK can upgrade to the most recent version of the SDK to ensure their applications are using the virtual-hosted style request format.

Virtual-hosted style requests are supported for all S3 endpoints in all AWS regions. S3 will stop accepting requests made using the path-style request format in all regions starting September 30th, 2020. Any requests using the path-style request format made after this time will fail.

AWS 推出更便宜的儲存方案 Glacier Deep Archive

AWS 推出的這個方案價錢又更低了:「New Amazon S3 Storage Class – Glacier Deep Archive」。

在這之前在 us-east-1S3 最低的方案是 Glacier Storage,單價是 USD$0.004/GB (也就是 $4/TB)。

而這次推出的 Glacier Deep Archive Storage 在同一區則是直接到 USD$0.00099/GB ($0.99/TB),大約是 1/4 的價錢。

Glacier Deep Archive 在取得時 first byte 的保證時間是 12 小時,另外最低消費是 180 天:

Retrieval time within 12 hours

先前就有的 Glacier Storage 則是可以在取用時設定取得的 pattern (會影響 first byte 的時間),而最低消費是 90 天:

Configurable retrieval times, from minutes to hours

Pricing for each of these metrics is determined by the speed at which data is requested based on three options. "Expedited" queries <250 MB are typically returned in 1-5 minutes. "Standard" queries are typically returned in 3-5 hours. "Bulk" queries are typically returned in 5-12 hours.

多一個更便宜的選擇可以用。

AWS 推出了 Live 時全自動上字幕的功能

AWS 推出了在直播時就自動上字幕的功能:「Introducing Live Streaming with Automated Multi-Language Subtitling」,其實就是把現有的服務兜出來:「Live Streaming with Automated Multi-Language Subtitling」。

The solution deploys Live Streaming on AWS which includes AWS Elemental MediaLive, MediaPackage, Amazon CloudFront. The solution also deploys AWS Lambda, Amazon Simple Storage Service, Amazon Transcribe, and Amazon Translate.

對於比較沒那麼要求翻譯品質的情況也許可以玩看看...?

Cloudflare 發表 Logpush,把 log 傳出來

Cloudflare 推出了 Logpush,可以把 log 傳到 Amazon S3 或是 Google Cloud Storage 上 (目前就支援這兩個):「Logpush: the Easy Way to Get Your Logs to Your Cloud Storage」,目前是 enterprise 方案可以用:

Today, we’re excited to announce a new way to get your logs: Logpush, a tool for uploading your logs to your cloud storage provider, such as Amazon S3 or Google Cloud Storage. It’s now available in Early Access for Enterprise domains.

Logpush currently works with Amazon S3 and Google Cloud Storage, two of the most popular cloud storage providers.

以往是自己拉,需要考慮 HA 問題 (兩台小機器就可以了,機器成本是還好,但維護成本頗高),現在則是可以讓 Cloudflare 主動推上來...

我大多數的建議是 log 儘量留 (包括各種系統的 log 與 http access log)。主要是因為儲存成本一直下降 (S3 的 1TB 才 USD$23/month,如果使用 Glacier 會低到 USD$4/month),而且 log 在壓縮後會變小很多 (經常會有重複的字)。

最傳統的 AWStats 就可以看到不少資訊,如果是透過 IP address 與 User-agent,交叉配合其他 event data 就可以讓很多 machine learning 演算法取得的資訊更豐富。

AWS 要再推出更低價的 S3 Glacier Deep Archive

AWS 打算要推出 S3 的新產品,單價更低但反應時間更長的儲存方案:「Coming Soon – S3 Glacier Deep Archive for Long-Term Data Retention」。

設計的客群是對法規有要求的情況:

It is designed for customers — particularly those in highly-regulated industries, such as the Financial Services, Healthcare, and Public Sectors — that retain data sets for 7-10 years or longer to meet regulatory compliance requirements.

看起來是那種 log 丟著就不會去動的情境... (Write once never read?XD)

AWS 推出了 S3 Object Lock,保護資料被刪除的可能性

AWS 推出了 S3 Object Lock,可以設定條件鎖住 S3 上的 object,以保護資料不被刪除:「AWS Announces Amazon S3 Object Lock in all AWS Regions」。

這個功能跟會計做帳的概念很像,也就是寫進去後就不能改,也不能刪除,保留一定時間後才移除掉:

You can migrate workloads from existing write-once-read-many (WORM) systems into Amazon S3, and configure S3 Object Lock at the object- and bucket-levels to prevent object version deletions prior to pre-defined Retain Until Dates or Legal Hold Dates.

AWS 提供有兩種模式,一個是 Governance mode,這個模式下可以設定某些 IAM 權限可以移除 S3 Object Lock。另外一個是 Compliance mode,這個模式下連 root account 都不能刪除:

S3 Object Lock can be configured in one of two modes. When deployed in Governance mode, AWS accounts with specific IAM permissions are able to remove object locks from objects. If you require stronger immutability to comply with regulations, you can use Compliance Mode. In Compliance Mode, the protection cannot be removed by any user, including the root account.

Amazon S3 推出了一個自動分析後分類的 Storage Class

Amazon S3 推出了新的 Storage Class,後面直接用演算法分析 access pattern (所以要跑一陣子才會生效),然後決定要放到 Standard 或是 Standard IA 裡:「Announcing S3 Intelligent-Tiering — a New Amazon S3 Storage Class」。

混了 Standard 與 Standard IA:

S3 Intelligent-Tiering stores objects in two access tiers: one tier that is optimized for frequent access and another lower-cost tier that is optimized for infrequent access.

然後連續 30 天沒有被存取的就會被丟到 Standard IA,如果有被存取的話就會被搬回來,而搬移的部份不用收費:

For a small monthly monitoring and automation fee per object, S3 Intelligent-Tiering monitors access patterns and moves objects that have not been accessed for 30 consecutive days to the infrequent access tier. There are no retrieval fees in S3 Intelligent-Tiering. If an object in the infrequent access tier is accessed later, it is automatically moved back to the frequent access tier. No additional tiering fees apply when objects are moved between access tiers within the S3 Intelligent-Tiering storage class.

從費用上可以看到演算法本身是有費用的,換算一下 1M objects 是 USD$2.5/month,好像還可以...

Monitoring and Automation, All storage / Month$0.0025 per 1,000 objects

不過有蠻多要注意的 pattern。像是這邊有提到 128KB 以下的檔案不會搬到 IA 上,但不知道算不算 Monitoring 的費用?

S3 Intelligent-Tiering has a minimum eligible object size of 128KB for auto-tiering. Smaller objects may be stored but will always be charged at the Frequent Access tier rates.

另外這邊講 S3 Intelligent-Tiering 的三十天也不知道是不是 Standard + Standard IA,或是分開算:

S3 Intelligent-Tiering, S3 Standard-IA, and S3 One Zone-IA storage are charged for a minimum storage duration of 30 days.

可以先觀望一下...