Home » Posts tagged "size"

MySQL 8.0 的 innodb_dedicated_server

Percona 介紹了 MySQL 8.0 將會推出的 innodb_dedicated_server 參數:「New MySQL 8.0 innodb_dedicated_server Variable Optimizes InnoDB from the Get-Go」,Oracle 官方的文件在「15.6.13 Enabling Automatic Configuration for a Dedicated MySQL Server」這邊可以翻到。

這是針對整台機器完全給 MySQL 用的情況所設計的參數。在這種情況下,可以透過 RAM 的大小以及一些簡單的公式,得到還算堪用的系統參數...

依照說明,可以看到系統會依照記憶體的大小自動計算出 innodb_buffer_pool_sizeinnodb_log_file_size 這兩個參數,並且把 innodb_flush_method 設為 O_DIRECT_NO_FSYNC (如果所在平台有支援這個值)。

不過看了一下公式,依照經驗可以設的更積極一點... 像是 Percona 文章裡提到的,當記憶體夠大時,其實可以考慮從 80% 開始調整大小 (innodb_buffer_pool_size):

For InnoDB buffer pool size (based on this article), consider allocating 80% of physical RAM for starters. You can increase it to as large as needed and possible, as long as the system doesn’t swap on the production workload.

innodb_log_file_size 則應該要分析寫入的 pattern 而不是直接看 RAM 大小。有些機器雖然很大台但幾乎沒有寫入的量,照著公式的值就偏大很多:

For InnoDB log file size, it should be able to handle one hour of writes to allow InnoDB to optimize writing the redo log to disk. You can calculate an estimate by following the steps here, which samples one minute worth of writes to the redo log. You could also get a better estimate from hourly log file usage with Percona Monitoring and Management (PMM) graphs.

不過基本上 tune 出來的值還算堪用,對於剛入手的人頗有幫助。

Ubuntu 18.04 LTS Minimal Image 的大小

看到「RFC: Ubuntu 18.04 LTS Minimal Images」這篇,在蒐集將來要出的 Ubuntu 18.04 LTS Minimal Image 的意見...

The Ubuntu Minimal Image is the smallest base upon which a user can apt install any package in the Ubuntu archive.

雖然應該還會有改變,不過以目前的版本來看,可以看出壓縮前後兩種版本都比 16.04 小了不少:

對需要這些 image 的人來說 (像是當作 Docker 的 base image),小一點操作起來也比較開心...

Working Set Size (WSS) 的想法

NetflixBrendan Gregg (他比較知名的發明是 Flame Graph) 寫了一篇「How To Measure the Working Set Size on Linux」,他想要量測單位時間內會用到的記憶體區塊大小:

The Working Set Size (WSS) is how much memory an application needs to keep working. Your app may have populated 100 Gbytes of main memory, but only uses 50 Mbytes each second to do its job. That's the working set size. It is used for capacity planning and scalability analysis.

這可以拿來分析這些應用程式是否能夠利用 L1/L2/L3 cache 大幅增加執行速度,於是就可以做成圖,像是這樣:

在 Netflix 這樣人數的公司,需要設計一些有用的指標,另外發展出對應的工具,讓其他人更容易迅速掌握狀況,畢竟不是每個人都有上天下海的能力,遇到狀況可以馬上有頭緒進行 trouble shooting...

基於 RNN 的無損壓縮

Hacker News 上看到「DeepZip: Lossless Compression using Recurrent Networks」這篇論文,利用 RNN 幫助壓縮技術壓的更小,而程式碼在 GitHubkedartatwawadi/NN_compression 上有公開讓大家可以測試。

裡面有個比較特別的是,Lagged Fibonacci PRNG 產生出來的資料居然有很好的壓縮率,這在傳統的壓縮方式應該都是幾乎沒有壓縮率...

整體的壓縮率都還不錯,不過比較的對象只有 gzip,沒有拿比較先進的壓縮軟體進行比較) 像是 xz 之類的),看數字猜測在一般的情況下應該不會贏太多,不過光是 PRNG 那部份,這篇論文等於是給了一個不同的方向讓大家玩...

Amazon RDS 支援更大的硬碟空間與更多的 IOPS

Amazon RDS 的升級:「Amazon RDS Now Supports Database Storage Size up to 16TB and Faster Scaling for MySQL, MariaDB, Oracle, and PostgreSQL Engines」。

空間上限從 6TB 變成 16TB,而且可以無痛升。另外 IOPS 上限從 30K 變成 40K:

Starting today, you can create Amazon RDS database instances for MySQL, MariaDB, Oracle, and PostgreSQL database engines with up to 16TB of storage. Existing database instances can also be scaled up to 16TB storage without any downtime.

The new storage limit is an increase from 6TB and is supported for Provisioned IOPS and General Purpose SSD storage types. You can also provision up to 40,000 IOPS for Provisioned IOPS storage volumes, an increase from 30,000 IOPS.

不過隔壁的 Amazon Aurora 還是大很多啊 (64TB),而且實際上不用管劃多大,他會自己長大:

Q: What are the minimum and maximum storage limits of an Amazon Aurora database?

The minimum storage is 10GB. Based on your database usage, your Amazon Aurora storage will automatically grow, up to 64 TB, in 10GB increments with no impact to database performance. There is no need to provision storage in advance.

Microsoft 與 GitHub 合作,將會把 GVFS 移植到 Linux 與 Mac 上

MicrosoftGitHub 合作將本來只有在 Windows 上可以用的 GVFS 移植到 LinuxMac 上:「Microsoft and GitHub team up to take Git virtual file system to macOS, Linux」。

GVFS 是解決微軟內部自己在用 Git 的痛處,因為微軟的 repository 都... 有... 點... 肥... (畢竟有不少產品發展了很久)。

目前 Git 的操作是卡在 I/O 與 memory cache 的限制上:

Also, Git wasn't designed for a codebase that was so large, either in terms of the number of files and version history for each file, or in terms of sheer size, coming in at more than 300GB. When using standard Git, working with the source repository was unacceptably slow. Common operations (such as checking which files have been modified) would take multiple minutes.

GVFS 的想法是有用到的部份再真的去拉,藉此大幅減少 I/O 需求...

Twitter Moments 很大?

看到「How big is Twitter Moments?」這篇,在談 Twitter Moments

依照推算,Twitter Moments 的使用量應該比全世界任何一個媒體都大,但你會發現實際上沒有音量。沒有人談論他,沒有人引用他... 但估算起來他應該是超級大的產品?

有種「到底怎麼樣才算是一個成功的產品」的反思。

PS4 下載速度很慢的原因

在「Why PS4 downloads are so slow」這篇作者花了不少力氣找出原因,發現 PS4 下載速度很慢是故意的... 另外討論了在什麼情況下會變慢,以及要怎麼避免的方式。

懶得看的人可以直接看 Conculsions 那段,主要的原因是 PS4 會因為背景程式而調整 TCP window size (就算背景程式在 idle 也會影響到下載的 TCP window size),進而影響速度:

If any applications are running, the PS4 appears to change the settings for PSN store downloads, artificially restricting their speed. Closing the other applications will remove the limit.

用 TCP window size 來調整速度也算是頗有「創意」的方法...

Anyway,遇到時的解決方法就是把所有在跑的程式都完整關掉,再下載就會正常多了...

InnoDB redo log 大小對效能的影響

在「Benchmark(et)ing with InnoDB redo log size」這邊看到在討論 InnoDB redo log 的大小對效能的影響 (也就是 innodb_log_file_sizeinnodb_log_files_in_group)。

開頭就有先提到重點,在新版 MySQL 裡,幾乎所有的情況比較大的 redo log 有比較好的效能 (平均值):

tl;dr - conclusions specific to my test

  1. A larger redo log improves throughput
  2. A larger redo log helps more with slower storage than with faster storage because page writeback is more of a bottleneck with slower storage and a larger redo log reduces writeback.
  3. A larger redo log can help more when the working set is cached because there are no stalls from storage reads and storage writes are more likely to be a bottleneck.
  4. InnoDB in MySQL 5.7.17 is much faster than 5.6.35 in all cases except IO-bound + fast SSD

可以看出來平均效能的提昇很顯著,不管是增加 redo log 大小還是升級到 5.7:

但作者也遇到了奇怪的效能問題。雖然平均效能提昇得很顯著,但隨著加入資料的增加,效能的 degradation 其實很嚴重,在原來的網頁上可以看到這些資訊。

The results above show average throughput and that hides a lot of interesting behavior. We expect throughput over time to not suffer from variance -- for both InnoDB and for MyRocks. For many of the results below there is a lot of variance (jitter).

所以也許現階段先加大就好 (至少寫入的效能會提昇),不需要把這個特性當作升級 MySQL 的理由。

Archives