
Amazon S3's traffic numbers, plus the new Select feature for both S3 and Glacier

A photo from the venue seen on Twitter shows a single Amazon S3 region handling 37 Tb/sec of traffic.

At that volume, a DDoS barely registers XDDD

Also, both Amazon S3 and Amazon Glacier now offer a Select feature: 「S3 Select and Glacier Select – Retrieving Subsets of Objects」.

The demo code makes the use case clear. The snippet in the middle of the original post has a syntax error, which I've fixed here:

# Note: this is the preview-era API shape from the original post;
# PrintingResponseHandler comes from the preview SDK sample there.
import boto3

handler = PrintingResponseHandler()
s3 = boto3.client('s3')
response = s3.select_object_content(
    Bucket="super-secret-reinvent-stuff",
    Key="stuff.csv",
    SelectRequest={
        'ExpressionType': 'SQL',
        # Pull only the first CSV column instead of the whole object
        'Expression': 'SELECT s._1 FROM S3Object AS s',
        'InputSerialization': {
            'CompressionType': 'NONE',
            'CSV': {
                'FileHeaderInfo': 'IGNORE',
                'RecordDelimiter': '\n',
                'FieldDelimiter': ',',
            }
        },
        'OutputSerialization': {
            'CSV': {
                'RecordDelimiter': '\n',
                'FieldDelimiter': ',',
            }
        }
    }
)

This can cut the amount of data read dramatically, reducing I/O and cost.

Glacier Select works on the same idea: instead of pulling the whole archive out and filtering afterward, you specify the conditions up front.
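Out of curiosity, here is a minimal sketch of what that might look like through boto3's glacier initiate_job; the vault, archive ID, and output bucket below are hypothetical, and the exact parameter shapes are worth double-checking against the current boto3 docs:

import boto3

glacier = boto3.client('glacier')

# Start a select job against one archive; matching rows are written to S3.
response = glacier.initiate_job(
    accountId='-',                          # '-' means the current account
    vaultName='my-vault',                   # hypothetical vault
    jobParameters={
        'Type': 'select',
        'ArchiveId': 'EXAMPLE-ARCHIVE-ID',  # hypothetical archive
        'Tier': 'Standard',                 # the usual retrieval tiers still apply
        'SelectParameters': {
            'ExpressionType': 'SQL',
            'Expression': 'SELECT s._1 FROM archive s',
            'InputSerialization': {'csv': {'FileHeaderInfo': 'IGNORE'}},
            'OutputSerialization': {'csv': {}},
        },
        'OutputLocation': {
            'S3': {
                'BucketName': 'my-output-bucket',   # hypothetical bucket
                'Prefix': 'glacier-select/',
            },
        },
    },
)
print(response['jobId'])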

AWS Media Services: a truckload of video-related services...

AWS has rolled out a whole series of AWS Elemental Media-something video services: 「AWS Media Services – Process, Store, and Monetize Cloud-Based Video」.

Not all of the services launched in the same regions, though... each one has its own separate announcement.

Here are the descriptions from Jeff Barr's post; you can see the lineup covers everything from transcoding at the very start of the pipeline, through DRM and live formats, down to storage and post-production (such as ad insertion):

AWS Elemental MediaConvert – File-based transcoding for OTT, broadcast, or archiving, with support for a long list of formats and codecs. Features include multi-channel audio, graphic overlays, closed captioning, and several DRM options.

AWS Elemental MediaLive – Live encoding to deliver video streams in real time to both televisions and multiscreen devices. Allows you to deploy highly reliable live channels in minutes, with full control over encoding parameters. It supports ad insertion, multi-channel audio, graphic overlays, and closed captioning.

AWS Elemental MediaPackage – Video origination and just-in-time packaging. Starting from a single input, produces output for multiple devices representing a long list of current and legacy formats. Supports multiple monetization models, time-shifted live streaming, ad insertion, DRM, and blackout management.

AWS Elemental MediaStore – Media-optimized storage that enables high performance and low latency applications such as live streaming, while taking advantage of the scale and durability of Amazon Simple Storage Service (S3).

AWS Elemental MediaTailor – Monetization service that supports ad serving and server-side ad insertion, a broad range of devices, transcoding, and accurate reporting of server-side and client-side ad insertion.

Quoting a former coworker's tweet: leaving aside the Amazon SWF situation (SWF can still find other uses), Amazon Elastic Transcoder is pretty clearly slated for retirement.

Bundle launches this big only come with AWS re:Invent-level energy; you rarely see them the rest of the year...

Options for static sites...

An article from the Hacker News front page walks through wiring Jekyll all the way up to Amazon S3 and Amazon CloudFront: 「Jekyll CBCD Pipeline to the Cloud」.

It looked like an awful lot of work to me @_@

Then I went back to the comments on Hacker News (「Jekyll Static Web Hosting – Deployment Pipeline on AWS | Hacker News」) and saw this:

What a nightmare. I'm sure there are use cases for a setup like this, but this is not the system I'd like to maintain. I use Jekyll because of it's simplicity. I edit my site in my favorite text editor and rsync to shared hosting.

Plenty of people felt the same way XDDD

Someone else brought up Netlify:

After I discovered Netlify, I'm kind of thinking "why bother". It's free, I just push to my repo and they take care of all the building/publishing/hosting/CDNs, and they're very responsive for support and have high availability. I'm a very happy customer (or rather leech, as I don't pay anything).

The reviews down-thread look pretty decent, and there's a free tier, so it might be worth taking for a spin sometime...

MySQL backups made with Percona XtraBackup can be restored straight into Amazon RDS

AWS announced that Amazon RDS for MySQL now supports restoring from backup files produced by Percona XtraBackup: 「Easily restore an Amazon RDS MySQL database from your MySQL backup」.

Starting today you can easily restore a new Amazon RDS for MySQL database instance from a backup of your existing MySQL database, including MySQL databases running on Amazon EC2 or outside of AWS. This is done by creating a backup using the Percona XtraBackup tool and uploading the resulting files to an Amazon S3 bucket. You then create a new Amazon RDS DB Instance from the backup files in Amazon S3, directly through the RDS Console or AWS Command Line Interface.

Make the backup, upload the files to Amazon S3, and RDS can ingest them from there...
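As a rough sketch of the ingestion call via boto3 (the instance identifier, bucket, role ARN, and credentials below are all hypothetical; see the RDS docs for the full parameter list):

import boto3

rds = boto3.client('rds')

# Create a new RDS for MySQL instance from XtraBackup files sitting in S3.
response = rds.restore_db_instance_from_s3(
    DBInstanceIdentifier='restored-mysql',      # hypothetical name
    DBInstanceClass='db.m4.large',
    Engine='mysql',
    MasterUsername='admin',
    MasterUserPassword='change-me-please',
    SourceEngine='mysql',
    SourceEngineVersion='5.6.35',               # must be a 5.6 backup
    S3BucketName='my-xtrabackup-bucket',
    S3IngestionRoleArn='arn:aws:iam::123456789012:role/rds-s3-ingest',
)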

This feature is available in all AWS Commercial regions for databases using MySQL version 5.6.

It's available in all commercial regions now, which implies GovCloud (US) isn't covered yet (not that most of us have access to it anyway).

The announcement only says 5.6, though; does that mean 5.7 isn't supported yet? Worst case, you restore into 5.6 and then upgrade to 5.7?

The Amazon Aurora paper

AWS's boss wrote about the paper on their own Amazon Aurora: 「Weekend Reading: Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases.」. The paper itself is available at 「Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases」.

Amazon Aurora reaches its high-reliability goals with a rather unusual architecture, leaning on a pile of already-proven building blocks, such as using Amazon S3 to move a lot of data around.

That said, the comparisons in the paper don't really reflect current practice: distributed MySQL setups today are built quite differently (i.e. Galera Cluster), and many of the problems the paper compares against simply don't arise with those approaches. Take it as a product pitch and read accordingly...

A new AWS Storage Gateway feature: File Gateway

AWS Storage Gateway has a new File Gateway capability: 「File Interface to AWS Storage Gateway」.

It lets users speak NFS on the front end while the back end is actually Amazon S3:

After you set it up in your data center or in the cloud, your configured buckets will be available as NFS mount points. Your application simply reads and writes files and directories over NFS; behind the scenes, the gateway turns these operations into object-level requests on your S3 buckets, where they are accessible natively (one S3 object per file).
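To make that mapping concrete, here is a small hypothetical sketch of what using it looks like once the gateway's NFS export is mounted (the mount point and paths are made up); each file written becomes one S3 object:

from pathlib import Path

# Hypothetical: the gateway's NFS export for bucket "my-bucket"
# is already mounted at /mnt/s3gw.
mount = Path('/mnt/s3gw')
(mount / 'logs').mkdir(parents=True, exist_ok=True)

# An ordinary file write over NFS...
(mount / 'logs' / 'app.log').write_text('hello gateway\n')

# ...ends up as the S3 object s3://my-bucket/logs/app.log
# (one S3 object per file, readable natively from S3).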

What we used to hack together ourselves with FUSE is now an officially supported feature? XDDD

Amazon Redshift can now work with KMS-encrypted data in S3

Noticed while cleaning up some data that this is now supported: 「Amazon Redshift now supports encrypting unloaded data using Amazon S3 server-side encryption with AWS KMS keys」:

The Amazon Redshift UNLOAD command now supports Amazon S3 server-side encryption using an AWS KMS key.

So data landing on Amazon S3 can be stored encrypted under AWS KMS, and Amazon Redshift can read it back out through KMS directly, which makes the workflow quite a bit easier...
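A minimal sketch of what the UNLOAD statement looks like with KMS encryption, run here through psycopg2 (the cluster endpoint, table, bucket, role, and key ID are all hypothetical):

import psycopg2

# Connect to the cluster (endpoint and credentials here are hypothetical).
conn = psycopg2.connect(
    host='my-cluster.abc123.us-east-1.redshift.amazonaws.com',
    port=5439,
    dbname='dev',
    user='awsuser',
    password='change-me',
)
conn.autocommit = True

# UNLOAD query results to S3, server-side encrypted with a KMS key.
unload_sql = """
UNLOAD ('SELECT * FROM events')
TO 's3://my-bucket/unload/events_'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload'
KMS_KEY_ID '1234abcd-12ab-34cd-56ef-1234567890ab'
ENCRYPTED;
"""
with conn.cursor() as cur:
    cur.execute(unload_sql)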

Amazon Athena, though, apparently still can't read it?

Permission setup for CodeDeploy...

I've been working with AWS CodeDeploy lately and spent half a day tightening the permissions down to something reasonable (AWS's defaults are still a bit broad). Writing it down so it's easier to find later...

Permission-wise, AWS CodeDeploy needs three things configured:

  • IAM user: used by CI after the build to upload to Amazon S3 and trigger an AWS CodeDeploy deployment, e.g. from Travis CI or a similar service. If you only trigger deployments manually from the web console, you can skip this part.
  • Service Role: used by the AWS CodeDeploy service itself.
  • EC2 Role: used by the AWS CodeDeploy Agent running on the instances.

As mentioned above, the IAM user needs two grants: one so the CI service can upload files to Amazon S3, and one so it can call AWS CodeDeploy to deploy the new revision.

The latter is easy: just attach AWSCodeDeployDeployerAccess. The former takes a bit more work; build the matching policy with the Policy Generator:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1486156178000",
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::codedeploy-testbucket/*"
            ]
        }
    ]
}
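For context, the CI side then boils down to two calls: upload the bundle and create a deployment. A rough boto3 sketch (the application and deployment group names are hypothetical; the bucket matches the policy above):

import boto3

s3 = boto3.client('s3')
codedeploy = boto3.client('codedeploy')

# 1. Upload the build artifact to the bucket the policy above allows.
s3.upload_file('build/app.zip', 'codedeploy-testbucket', 'app.zip')

# 2. Ask CodeDeploy to roll out that revision.
codedeploy.create_deployment(
    applicationName='my-app',             # hypothetical
    deploymentGroupName='production',     # hypothetical
    revision={
        'revisionType': 'S3',
        's3Location': {
            'bucket': 'codedeploy-testbucket',
            'key': 'app.zip',
            'bundleType': 'zip',
        },
    },
)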

The second one, the Service Role, is the simplest: just attach AWSCodeDeployRole.

The third one, the EC2 Role, only needs access to Amazon S3. AWS's suggested AmazonEC2RoleforAWSCodeDeploy does work, but it grants too much: the agent can read every S3 bucket the account can reach:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListObjects"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

It's better to generate your own with the Policy Generator and scope it to codedeploy-testbucket (note that s3:ListBucket applies to the bucket itself, so the bucket ARN is listed alongside the objects):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1486165995000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::codedeploy-testbucket",
                "arn:aws:s3:::codedeploy-testbucket/*"
            ]
        }
    ]
}

And with that, the permissions are pretty much sorted...

Amazon Kinesis Firehose can now transform data before it lands in S3...

A recent Amazon Kinesis Firehose feature: records can be transformed before being written to Amazon S3 (or other destinations): 「Amazon Kinesis Firehose can now prepare and transform streaming data before loading it to data stores」.

The documentation is 「Amazon Kinesis Firehose Data Transformation」, which explains up front that this is done through Lambda:

When you enable Firehose data transformation, Firehose buffers incoming data up to 3 MB or the buffering size you specified for the delivery stream, whichever is smaller. Firehose then invokes the specified Lambda function with each buffered batch asynchronously. The transformed data is sent from Lambda to Firehose for buffering. Transformed data is delivered to the destination when the specified buffering size or buffering interval is reached, whichever happens first.

Further down the page there are ready-made Lambda blueprints for fairly common cases, like processing Apache logs or syslog:

Lambda Blueprints

Firehose provides the following Lambda blueprints that you can use to create a Lambda function for data transformation.

  • General Firehose Processing — Contains the data transformation and status model described in the previous section. Use this blueprint for any custom transformation logic.

  • Apache Log to JSON — Parses and converts Apache log lines to JSON objects, using predefined JSON field names.
  • Apache Log to CSV — Parses and converts Apache log lines to CSV format.
  • Syslog to JSON — Parses and converts Syslog lines to JSON objects, using predefined JSON field names.
  • Syslog to CSV — Parses and converts Syslog lines to CSV format.
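To give a sense of the contract those blueprints implement, here is a minimal hand-rolled transformation Lambda following the records-in/records-out status model from the documentation (the uppercase transform itself is just illustrative):

import base64

def lambda_handler(event, context):
    """Firehose invokes this with a batch of base64-encoded records."""
    output = []
    for record in event['records']:
        payload = base64.b64decode(record['data']).decode('utf-8')

        # Illustrative transform: uppercase the payload.
        transformed = payload.upper()

        output.append({
            'recordId': record['recordId'],  # must echo the id back
            'result': 'Ok',                  # Ok | Dropped | ProcessingFailed
            'data': base64.b64encode(transformed.encode('utf-8')).decode('utf-8'),
        })
    return {'records': output}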

Pair this with Amazon Athena and you've got a fully serverless pipeline...
