Amazon Redshift 壓縮率的改善

Amazon Redshift 對壓縮率的改善:「Data Compression Improvements in Amazon Redshift Bring Compression Ratios Up to 4x」。

首先是引入了 Zstandard

First, we added support for the Zstandard compression algorithm, which offers a good balance between a high compression ratio and speed in build 1.0.1172. When applied to raw data in the standard TPC-DS, 3 TB benchmark, Zstandard achieves 65% reduction in disk space. Zstandard is broadly applicable.

然後是自動選擇壓縮,對於之前沒有設定壓縮參數的人,會直接有改善:

Second, we’ve improved the automation of compression on tables created by the CREATE TABLE AS, CREATE TABLE or ALTER TABLE ADD COLUMN commands. Starting with Build 1.0.1161, Amazon Redshift automatically chooses a default compression for the columns created by those commands. Automated compression happens when we estimate that we can reduce disk space without degrading query performance. Our customers have seen up to 40% reduction in disk space.

再來是改善資料結構:

Third, we’ve been optimizing our internal on-disk data structures. Our preview customers averaged a 7% reduction in disk space usage with this improvement. This feature is delivered starting with Build 1.0.1271.

最後是提供更好的分析判斷:

Finally, we have enhanced the ANALYZE COMPRESSION command to estimate disk space reduction.

不過其他幾個產品線的使用方式更成熟 (像是 Amazon Athena 這類產品),不知道會不會讓 Amazon Redshift 慢慢退出第一線...

innodb_file_per_table 對於 CREATE TABLE 與 DROP TABLE 的速度

雖然平常應該不會常常用到 CREATE TABLEDROP TABLE,不過還是很有趣的 benchmark:「Is MySQL’s innodb_file_per_table slowing you down?」。

重點在這段:

  • With innodb_file_per_table=ON
    • Schema and table creation = 1m54.852s
    • Schema drops = 1m21.682s
  • With innodb_file_per_table=OFF
  • Schema and table creation = 0m59.968s
  • Schema drops = 0m54.870s

不過作者測試時沒有用 ENGINE=COMPRESSED (必須在 innodb_file_per_table 打開時才支援,而且這也是選擇打開 innodb_file_per_table 的重要因素),不知道壓縮開起來以後會差多少...

不過就算再怎麼慢,相較於 CREATE TABLEDROP TABLE 的效能,還是比較計較壓縮換來的 I/O 效能。(尤其是資料量超過記憶體大小時)