## Golomb coding

On Hacker News I saw Golomb coding mentioned in id=40008133 and found it interesting, so I spent some time working out the algorithm and the scenarios it is used in...

Golomb coding is used when the input is a number (or fixed-size data treated as a number) and most of the values tend to be small, as explained in "I don't get Golomb / Rice coding: It does make more bits of the input, or does it?":

they reduce the average length per encoded value compared to fixed-width encoding, if the encoded values are from a large range, but the most common values are generally small (and hence are using only a small fraction of that range most of the time).

The Golomb coding algorithm builds on two special representations: Unary coding and Truncated binary encoding.

Unary coding is very simple: n is represented as n `1` bits plus a trailing `0`, so 3 becomes `1110` and 7 becomes `11111110`.

The main reason unary coding is used here is that it represents the n = 0 case with just a single bit, `0`.
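A minimal sketch of unary coding (illustrative Python, not from the original article):

```python
def unary_encode(n: int) -> str:
    """Represent n as n one-bits followed by a terminating zero."""
    return "1" * n + "0"

print(unary_encode(0))  # -> "0"
print(unary_encode(3))  # -> "1110"
print(unary_encode(7))  # -> "11111110"
```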

Truncated binary encoding (the Chinese Wikipedia happens to have an entry for it:「截斷二進制編碼」) is an interesting encoding scheme. The traditional approach to representing n possible values $(0, 1, ..., n-1)$ needs $\lceil\log_2 {n}\rceil$ bits; for example, $n = 5$ needs 3 bits: `000`, `001`, `010`, `011`, `100`.

Truncated binary encoding instead finds a way to encode them as `00` (for 0), `01` (1), `10` (2), `110` (3), and `111` (4): the values 0 through 2 take only 2 bits each, saving 1 bit.

The point of truncated binary encoding is saving storage space when the number of possibilities is not $2^n$ for some n.
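A sketch of the encoder (illustrative Python; `n` is the number of possible values):

```python
def truncated_binary_encode(x: int, n: int) -> str:
    """Encode x in [0, n) with truncated binary encoding."""
    k = n.bit_length() - 1        # floor(log2(n))
    u = (1 << (k + 1)) - n        # how many values get the short k-bit form
    if x < u:
        return format(x, "b").zfill(k)          # k bits
    return format(x + u, "b").zfill(k + 1)      # k+1 bits

# n = 5: 0 -> "00", 1 -> "01", 2 -> "10", 3 -> "110", 4 -> "111"
print([truncated_binary_encode(x, 5) for x in range(5)])
```

For $n = 5$ this reproduces the table above; when $n$ is a power of two, $u = n$ and every value gets the plain $\log_2 n$-bit form.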

Golomb coding combines the two. Pick a parameter $M$ and split each input $x$ into a quotient $q$ and a remainder $r$:

$x = M * q + r$

where $q$ is stored with unary coding and $r$ with truncated binary encoding over $M$ possible values.
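A minimal sketch of the combined encoder (illustrative Python): the quotient $q$ goes to unary coding and the remainder $r$ to truncated binary encoding; when $M$ is a power of two this reduces to Rice coding.

```python
def golomb_encode(x: int, m: int) -> str:
    """Encode x with parameter m: quotient in unary coding,
    remainder in truncated binary encoding over m values."""
    q, r = divmod(x, m)                 # x = m * q + r
    unary = "1" * q + "0"
    k = m.bit_length() - 1              # floor(log2(m))
    u = (1 << (k + 1)) - m              # count of short k-bit remainder codes
    if r < u:
        remainder = format(r, "b").zfill(k)
    else:
        remainder = format(r + u, "b").zfill(k + 1)
    return unary + remainder

# Small values get short codes (m = 5): 0 -> "000", 9 -> "10111"
print(golomb_encode(0, 5), golomb_encode(9, 5))
```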


## Redis's space-saving HyperLogLog implementation

HyperLogLog (HLL) is a data structure and algorithm that tackles the count-distinct problem statistically: instead of an exact count, it gives an approximation.

The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory; 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.

It uses 16384 6-bit registers for a great level of accuracy, using a total of 12k bytes per key.

In the example the sparse representation used just 7 bytes instead of 12k in order to represent the HLL registers. In general for low cardinality there is a big win in terms of space efficiency, traded with CPU time since the sparse representation is slower to access.
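The arithmetic behind the 12k figure: 16384 registers × 6 bits = 98304 bits = 12288 bytes. A toy dense-representation HLL in Python can show the idea (illustrative only; this is not Redis's implementation, the hash choice is my own, and it omits the small-range and bias corrections a real one needs):

```python
import hashlib

def hll_estimate(items, p=14):
    """Toy HyperLogLog: 2**p registers (p=14 gives 16384, as in Redis)."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # 64-bit hash; the first p bits pick a register
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - p)
        rest = h & ((1 << (64 - p)) - 1)
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)        # bias-correction constant
    return alpha * m * m / sum(2.0 ** -r for r in registers)

# Duplicates don't change the estimate; the answer is approximate,
# with a standard error around 1.04 / sqrt(16384) = 0.81%.
est = hll_estimate(list(range(100000)) * 2)
print(round(est))
```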

## Handling CRDT problems with Automerge

```
(() => {
  const Automerge = require('@automerge/automerge');

  let doc1 = Automerge.init();

  doc1 = Automerge.change(doc1, 'Init', doc => {
    doc.text = new Automerge.Text();
    doc.text.insertAt(0, 'h', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '.');
  });

  let doc2 = Automerge.clone(doc1);
  let doc3 = Automerge.clone(doc1);

  doc1 = Automerge.change(doc1, 'Capitalize', doc => {
    doc.text.deleteAt(0);
    doc.text.insertAt(0, 'H');
  });
  doc2 = Automerge.change(doc2, 'world => test', doc => {
    doc.text.deleteAt(7, 5);
    // spread the string so each character becomes its own element
    doc.text.insertAt(7, ...'test');
  });
  doc3 = Automerge.change(doc3, 'world => example', doc => {
    doc.text.deleteAt(7, 5);
    doc.text.insertAt(7, ...'example');
  });

  let finalDoc = Automerge.merge(doc1, doc2);
  finalDoc = Automerge.merge(finalDoc, doc3);
  console.log(finalDoc);
})();
```

```
{
  text: Text {
    elems: [
      'H', 'e', 'l', 'l', 'o',
      ',', ' ', 't', 'e', 's',
      't', 'e', 'x', 'a', 'm',
      'p', 'l', 'e', '.'
    ]
  }
}
```

## Ribbon filter

On the RocksDB blog I saw "Ribbon Filter": starting with 6.15, RocksDB supports Ribbon filters as a replacement for the original Bloom filter mechanism.

The RocksDB wiki describes it as trading CPU usage for memory usage:

A new Bloom filter alternative is available as a drop-in replacement (since version 6.15.0), saving about 30% of Bloom filter space (most importantly, memory) but using about 3-4x as much CPU on filters. Most of the additional CPU time is in the background jobs constructing the filters, and this is usually a good trade because it is common for SST filters to use ~10% of system RAM and well under 1% of CPU.

## Comparing Memcached and Redis

The Memcached part first covers the page/slab/chunk architecture and the performance limits and waste it causes, then the 2020 refactor that improved multi-CPU support (I hadn't followed Memcached news in a while and missed that last year).

Redis instead relies on jemalloc to handle this, plus a background-thread mechanism to reduce fragmentation.

## Using colors to distinguish brackets in code

On Hacker News Daily I saw an article about VSCode making bracket pair colorization more efficient, which reminded me that my Vim doesn't seem to have this feature installed: "Bracket pair colorization 10,000x faster".

VSCode mainly introduced new data structures to cut down the amount of computation; take a look if you're interested:

Efficient bracket pair colorization was a fun challenge. With the new data structures, we can also solve other problems related to bracket pairs more efficiently, such as general bracket matching or showing colored line scopes.
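The VSCode post is about making this incremental with balanced trees; as a point of reference, the naive stack-based pairing that editors traditionally recompute from scratch can be sketched like this (illustrative Python, not VSCode's code):

```python
def bracket_depths(text: str):
    """Pair brackets with a stack; the depth is what would pick the color."""
    closing = {')': '(', ']': '[', '}': '{'}
    stack, result = [], []
    for i, ch in enumerate(text):
        if ch in '([{':
            stack.append(ch)
            result.append((i, len(stack)))       # depth at the open bracket
        elif ch in closing:
            if stack and stack[-1] == closing[ch]:
                result.append((i, len(stack)))   # same depth as its partner
                stack.pop()
            else:
                result.append((i, 0))            # unmatched bracket
    return result

print(bracket_depths("f(a[0], {b})"))
```

This is O(n) per pass but has to rescan the whole document on every edit, which is exactly the cost the new data structures avoid.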

## Akamai also launches EdgeKV, a key-value service

EdgeWorkers lets developers just code — integrating into existing CI/CD workflows and enabling multiple teams to work in parallel using JavaScript. EdgeWorkers eliminates the hassle of managing compute resources and building for scale.

EdgeKV uses what is known in distributed computing as an eventual consistency model to perform writes and updates. This model achieves high availability with low read latency by propagating data writes globally. The period of time it takes the system to distribute data globally is called the “inconsistency window”.

KV achieves this performance by being eventually-consistent. Changes are immediately visible in the edge location at which they're made, but may take up to 60 seconds to propagate to all other edge locations.

## GTA's startup loading performance problem

```
struct {
  uint64_t *hash;
  item_t   *item;
} entry;
```

But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless. You have unique hashes why not use a hash map.

- hook strlen
- wait for a long string
- “cache” the start and length of it
- if it’s called again within the string’s range, return cached value

And as for the hash-array problem, it’s more straightforward - just skip the duplicate checks entirely and insert the items directly since we know the values are unique.
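The fix can be sketched in a few lines (illustrative Python, not the game's actual code): replacing the per-insert linear scan with a hash-set membership test turns roughly 2 billion comparisons into ~63k O(1) lookups.

```python
def insert_all_slow(hashes):
    """What the game does: scan the whole array before every insert."""
    entries = []
    for h in hashes:
        if h not in entries:     # O(n) list scan per item -> O(n^2) total
            entries.append(h)
    return entries

def insert_all_fast(hashes):
    """The fix: a hash set makes each duplicate check O(1)."""
    seen, entries = set(), []
    for h in hashes:
        if h not in seen:        # O(1) membership test
            seen.add(h)
            entries.append(h)
    return entries

assert insert_all_slow([1, 2, 2, 3, 1]) == insert_all_fast([1, 2, 2, 3, 1])
```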

JdeBP commented on Hacker News:

I found this while making a collection of what C implementation does what at https://news.ycombinator.com/item?id=26298300.

There are two basic implementation strategies. The BSD (FreeBSD and OpenBSD and more than likely NetBSD too), Microsoft, GNU, and MUSL C libraries use one, and suffer from this; whereas the OpenWatcom, P.J. Plauger, Tru64 Unix, and my standard C libraries use another, and do not.

The 2002 report in the comp.lang.c Usenet newsgroup (listed in that discussion) is the earliest that I've found so far.

## Python 3.7+ guarantees dict ordering

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.
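A quick demonstration (Python 3.7+):

```python
d = {}
for key in ["banana", "apple", "cherry"]:
    d[key] = len(key)

# iteration follows insertion order, not alphabetical or hash order
print(list(d))  # -> ['banana', 'apple', 'cherry']
```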