Home » Posts tagged "duplicate"

Trac 的 DuplicateTicketSearchPlugin

DuplicateTicketSearchPluginTrac 的 plugin,在開新票時會搜尋系統內可能重複開過的票給出建議。

之前在寫 wiki 上的「Trac」條目時沒寫到,大概是最早的時候都沒裝,後來有人找出來要我裝的東西,所以印象沒那麼深刻。剛剛是在找 Trac + Elasticsearch 有沒有現成的方案可以搭,結果先看到這個...

產生的效果是這樣,在改變 summary 後會出現 (focus 從 summary 移開時):

當然就算裝了還是難免會重複開 (尤其組織夠大的時候),但算是有幫助的東西...

JSON 的 Object 裡 Key 重複的問題

tl;dr:不要亂來啦... 這是 UB (Undefined behavior) 的一種。

因為看到這則 tweet,所以去查一下 JSON 的資料:

首先是找標準是什麼。在維基百科的 JSON 條目裡提到了有兩份標準,一份是 RFC 7159,一份是 ECMA-404

Douglas Crockford originally specified the JSON format in the early 2000s; two competing standards, RFC 7159 and ECMA-404, defined it in 2013. The ECMA standard describes only the allowed syntax, whereas the RFC covers some security and interoperability considerations.

ECMA-404 裡面就真的只講語法沒講其他東西,而在 RFC 7159 內的 Object 則是有提到 (重點我就用粗體標起來了):

An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique.

   object = begin-object [ member *( value-separator member ) ]
            end-object

   member = string name-separator value

An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.

JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.

粗體有描述唯一性,但尷尬的地方在於他用 SHOULD 而非 MUST,所以 library 理論上都要能接受。但後面提到如果不唯一時,行為無法預測 (會到 rm -rf / 嗎?XDDD 最像的應該還是 crash?),所以還是不要亂來啦...

不過如果真的會 crash 的話,應該也會因為 DoS issue 而被發 CVE,所以實務上應該是不會 crash 啦...

UPSERT

維基百科對 UPSERT 的說明:(取自「Merge (SQL)」條目)

A relational database management system uses SQL MERGE (also called upsert) statements to INSERT new records or UPDATE existing records depending on whether or not a condition matches.

MySQL 裡的兩種語法其實就是在實做這個需求:

  • REPLACE INTO ...
  • INSERT INTO ... ON DUPLICATE KEY UPDATE ...

而前者其實是後者的一個特例 (當 INSERT 發現有 dupe key 時把現有的 record 改成與 INSERT 時相同的條件)。

而計數器是後者常見的 case 之一:當 record 不存在的時候塞一筆進去,並且將 counter 設為 1;當 record 存在的時候對 counter 加一更新。像是這樣的 SQL query:

INSERT INTO my_table SET id = ?, num = 1 ON DUPLICATE KEY UPDATE num = num + 1;

由於這是常見的需求,使得這個語法是目前少數 MySQL 比 PostgreSQL 好用的地方。

在「A Case for Upserts」這篇就看到抱怨 PostgreSQL 不實做這個功能...

不過我覺得作者寫得有點誇張,INSERT INTO ... ON DUPLICATE KEY UPDATE ... 應該是可以模擬出來的功能:當 INSERT 失敗後再跑 UPDATE。而 REPLACE INTO ... 是特例,也就當然可以模擬出來。

Archives