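A while back Amazon Kinesis Firehose shipped a new feature that lets you transform data before it gets written to Amazon S3 (or another destination): "Amazon Kinesis Firehose can now prepare and transform streaming data before loading it to data stores".
The documentation is "Amazon Kinesis Firehose Data Transformation", and it explains right at the start that this is done through Lambda: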
When you enable Firehose data transformation, Firehose buffers incoming data up to 3 MB or the buffering size you specified for the delivery stream, whichever is smaller. Firehose then invokes the specified Lambda function with each buffered batch asynchronously. The transformed data is sent from Lambda to Firehose for buffering. Transformed data is delivered to the destination when the specified buffering size or buffering interval is reached, whichever happens first.
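To make the flow above a bit more concrete, here is a minimal sketch of what such a Lambda function could look like in Python. It assumes the record model from the Firehose documentation, where each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result` (`Ok`, `Dropped`, or `ProcessingFailed`), and base64-encoded `data`. The `transform` helper is hypothetical and only upper-cases the payload as a stand-in for real logic.

```python
import base64


def transform(payload: bytes) -> bytes:
    # Hypothetical placeholder: replace with your own transformation logic.
    return payload.upper()


def lambda_handler(event, context):
    # Firehose hands the function one buffered batch under event["records"].
    output = []
    for record in event["records"]:
        try:
            raw = base64.b64decode(record["data"])
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(transform(raw)).decode("ascii"),
            })
        except Exception:
            # Return the original data and flag the record as failed,
            # so that one bad record does not fail the whole batch.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    # Every incoming recordId must appear in the response.
    return {"records": output}
```

The handler can be exercised locally by passing it a dict shaped like a Firehose event, with `None` as the context.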
Further down, the documentation lists some ready-made Lambda functions you can use, and they cover fairly common cases, such as processing Apache logs or syslog:
Lambda Blueprints
Firehose provides the following Lambda blueprints that you can use to create a Lambda function for data transformation.
- General Firehose Processing — Contains the data transformation and status model described in the previous section. Use this blueprint for any custom transformation logic.
- Apache Log to JSON — Parses and converts Apache log lines to JSON objects, using predefined JSON field names.
- Apache Log to CSV — Parses and converts Apache log lines to CSV format.
- Syslog to JSON — Parses and converts Syslog lines to JSON objects, using predefined JSON field names.
- Syslog to CSV — Parses and converts Syslog lines to CSV format.
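For a rough idea of what an "Apache Log to JSON" step involves, the sketch below parses one line in Apache's Common Log Format into a JSON string. This is not the blueprint's actual code: the regular expression, the function name `apache_line_to_json`, and the JSON field names are all assumptions picked for illustration, and the blueprint's predefined field names may differ.

```python
import json
import re

# Common Log Format: host ident user [time] "request" status size
# (pattern and field names are illustrative assumptions)
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def apache_line_to_json(line: str):
    """Return a JSON string for a Common Log Format line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache logs "-" when no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return json.dumps(fields)
```

Inside a transformation Lambda, a `None` result would be a natural place to mark the record as `ProcessingFailed` instead of `Ok`.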
Pair this with Amazon Athena and you get a fully serverless setup...