Home » Posts tagged "duplicate"

JSON 的 Object 裡 Key 重複的問題

tl;dr:不要亂來啦... 這是 UB (Undefined behavior) 的一種。

因為看到這則 tweet,所以去查一下 JSON 的資料:

首先是找標準是什麼。在維基百科的 JSON 條目裡提到了有兩份標準,一份是 RFC 7159,一份是 ECMA-404

Douglas Crockford originally specified the JSON format in the early 2000s; two competing standards, RFC 7159 and ECMA-404, defined it in 2013. The ECMA standard describes only the allowed syntax, whereas the RFC covers some security and interoperability considerations.

ECMA-404 裡面就真的只講語法沒講其他東西,而在 RFC 7159 內的 Object 則是有提到 (重點我就用粗體標起來了):

An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique.

   object = begin-object [ member *( value-separator member ) ]
            end-object

   member = string name-separator value

An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.

JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.

粗體有描述唯一性,但尷尬的地方在於他用 SHOULD 而非 MUST,所以 library 理論上都要能接受。但後面提到如果不唯一時,行為無法預測 (會到 rm -rf / 嗎?XDDD 最像的應該還是 crash?),所以還是不要亂來啦...

不過如果真的會 crash 的話,應該也會因為 DoS issue 而被發 CVE,所以實務上應該是不會 crash 啦...

UPSERT

維基百科對 UPSERT 的說明:(取自「Merge (SQL)」條目)

A relational database management system uses SQL MERGE (also called upsert) statements to INSERT new records or UPDATE existing records depending on whether or not a condition matches.

MySQL 裡的兩種語法其實就是在實做這個需求:

  • REPLACE INTO ...
  • INSERT INTO ... ON DUPLICATE KEY UPDATE ...

而前者其實是後者的一個特例 (當 INSERT 發現有 dupe key 時把現有的 record 改成與 INSERT 時相同的條件)。

而計數器是後者常見的 case 之一:當 record 不存在的時候塞一筆進去,並且將 counter 設為 1;當 record 存在的時候對 counter 加一更新。像是這樣的 SQL query:

INSERT INTO my_table SET id = ?, num = 1 ON DUPLICATE KEY UPDATE num = num + 1;

由於這是常見的需求,使得這個語法是目前少數 MySQL 比 PostgreSQL 好用的地方。

在「A Case for Upserts」這篇就看到抱怨 PostgreSQL 不實做這個功能...

不過我覺得作者寫得有點誇張,INSERT INTO ... ON DUPLICATE KEY UPDATE ... 應該是可以模擬出來的功能:當 INSERT 失敗後再跑 UPDATE。而 REPLACE INTO ... 是特例,也就當然可以模擬出來。

Archives