繼續清 tab,在「PostgreSQL and UUID as Primary Key (maciejwalkowiak.com)」這邊看到的,原文是討論 PostgreSQL 要怎麼處理 PK 是 UUID 的情況:「PostgreSQL and UUID as primary key」。
文章開頭作者就說了,這篇不是要戰 PK 要不要用 UUID,而是已經決定要用了 (i.e. 通常不是你決定的),在接手以後要怎麼用比較好:
Considering the size of UUID it is questionable if it is a right choice, but often it is not up to us to decide.
This article does not focus on "if UUID is the right format for a key", but how to use UUID as a primary key with PostgreSQL efficiently.
首先是 PostgreSQL 從 8.3 版 (2008 年) 就支援 UUID
的資料型態了,這點從 release note 可以看到:「PostgreSQL 8.3.0」,所以要當 PK 的話沒什麼道理不考慮他。
用 UUID
的空間上就是 128-bit data (16 bytes),相比於 TEXT
會省蠻多的,尤其 PK 常常會被其他表格 reference 到 (像是 foreign key) 會在其他表格也省下來:
Table that uses text is 54% larger and the index size 85% larger.
第二點則是考慮到 UUID 本身的特性,以前的 UUID 因為是亂數生成的 (通常會用 UUIDv4),對寫入 B-tree 類的資料結構不是很有效率,改用 UUIDv7 (差不多是這兩年陸陸續續發展出來的規格) 會得益於與 timestamp 有關,對 B-tree 寫入的效率會好很多:
Random UUIDs are not a good fit for a B-tree indexes - and B-tree index is the only available index type for a primary key.
B-tree indexes work the best with ordered values - like auto-incremented or time sorted columns.
UUID - even though always looks similar - comes in multiple variants. Java's UUID.randomUUID() - returns UUID v4 - which is a pseudo-random value. For us the more interesting one is UUID v7 - which produces time-sorted values. It means that each time new UUID v7 is generated, a greater value it has. And that makes it a good fit for B-Tree index.
作者的測試可以看到寫入速度與 UUIDv4 相比比快不少:
BUT we can clearly see, that inserting UUID v7 is ~2x faster and inserting regular UUID v4.
總結來說,當 PK 已經決定是 UUID 後,主要就是這兩個重點可以注意的,當然 Hacker News 上更熱鬧的是兩派人馬在吵要用 integer 類的 SERIAL
/BIGSERIAL
還是用 UUID
,那又是另外一個話題了...
如果早個二十年前對 memory size 斤斤計較的情況下,答案鐵定是 integer 類的,但年代不同了...?