5.1 Using a 64 Bit Hash Function

To fulfill the requirement of being able to estimate multisets of cardinalities beyond 1 billion, we use a 64 bit hash function.

It would be possible to introduce a similar correction if the cardinality
approaches 264, but it seems unlikely that such cardinalities are encountered in practice.

5.2 Estimating Small Cardinalities

5.3 Sparse Representation

Example: Actual answer is within 5 ± 1 with prob ≥ 0.9

## Redis 對 HyperLogLog 省空間的實作

HyperLogLog (HLL) 是用統計方式解決 Count-distinct problem 的資料結構以及演算法，不要求完全正確，而是大概的數量。

The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory; 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.

The use of 16384 6-bit registers for a great level of accuracy, using a total of 12k per key.

In the example the sparse representation used just 7 bytes instead of 12k in order to represent the HLL registers. In general for low cardinality there is a big win in terms of space efficiency, traded with CPU time since the sparse representation is slower to access.

## tf-idf 與 BM25

tf–idfBM25 是兩個在資訊檢索 (IR) 裡面的經典演算法，也常被用在搜尋引擎技術上。

btw，這邊 tf-idf 與 BM25 的公式就請大家自己去維基百科上翻了...

tf-idf 概念上很簡單，而也沒有什麼 magic number 在公式裡面。

## FreeBSD 14.0 釋出

FreeBSD 14.0-RELEASE 的公告也出來了：「FreeBSD 14.0-RELEASE Announcement」，比較完整的 release notes 在「FreeBSD 14.0-RELEASE Release Notes」。

FreeBSD supports up to 1024 cores on the amd64 and arm64 platforms.

ARM 的話好像也可以堆，但不熟...

The default congestion control mechanism for TCP is now CUBIC.

FreeBSD 15.0 is not expected to include support for 32-bit platforms other than armv7. The armv6, i386, and powerpc platforms are deprecated and will be removed. 64-bit systems will still be able to run older 32-bit binaries.

The chroot facility supports unprivileged operation, and the chroot(8) program has a -n option to enable its use. a40cf4175c90 (Sponsored by EPSRC)

The use of FIDO/U2F hardware authenticators has been enabled in ssh, using the new public key types ecdsa-sk and ed25519-sk, along with corresponding certificate types. FIDO/U2F support is described in https://www.openssh.com/txt/release-8.2. e9a994639b2a (Sponsored by The FreeBSD Foundation)

ASLR 預設開啟：

Address Space Layout Randomization (ASLR) is enabled for 64-bit executables by default. It can be disabled as needed if applications fail unexpectedly, for example with segmentation faults. To disable for a single invocation, use the proccontrol(1) command: proccontrol -m aslr -s disable command. To disable ASLR for all invocations of a binary, use the elfctl(1) command: elfctl -e +noaslr file. Problems should be reported via the problem reporting system, https://bugs.freebsd.org, or posting to the freebsd-stable@FreeBSD.org mailing list. b014e0f15bc7 (Sponsored by Stormshield)

The kernel wg(4) WireGuard driver has been reintegrated; it provides Virtual Private Network (VPN) interfaces using the WireGuard protocol. 744bfb213144 (Sponsored by Rubicon Communications, LLC ("Netgate") and The FreeBSD Foundation)

The default shell for the root user is now sh(1), which has many new features for interactive use. d410b585b6f0

The default mail transport agent (MTA) is now the Dragonfly Mail Agent (dma(8)) rather than sendmail(8). Configuration of the MTA is done via mailer.conf(5). sendmail(8) and its configuration remain available. a67b925ff3e5

The portsnap(8) utility has been removed. Users are encouraged to fetch the ports tree by using pkg install git and then git clone https://git.FreeBSD.org/ports.git /usr/ports. df53ae0fdd98

mergemaster(8) has been deprecated. Its replacement is etcupdate(8). 398b12691b4f (Sponsored by The FreeBSD Foundation)

The tarfs(5) file system has been added, which is backed by POSIX tar archives optionally compressed with zstd(1). 69d94f4c7608 (Sponsored by Juniper Networks, Inc.) (Sponsored by Klara, Inc.)

## NIST P-curve 的 Seed Bounty Program

Filippo Valsorda 發起了 seed bounty program，針對 NIST P-curve 裡 seed 的部分尋找 SHA-1 的 pre-image：「Announcing the \$12k NIST Elliptic Curves Seeds Bounty」。

```3045AE6FC8422F64ED579528D38120EAE12196D5
BD71344799D5C7FCDC45B59FA3B9AB8F6A948BC5
C49D360886E704936A6678E1139D26B7819F7E90
A335926AA319A27A1D00896A6773A4827ACDAC73
D09E8800291CB85396CC6717393284AAA0DA64BA```

$y^2 = x^3 + ax + b (Weierstrass form)$ $y^2 = x^3 + ax^2 + bx (Montgomery form)$

$y^2 = x^3 + 486662x^2 + x$

To protect against various attacks discussed in Section 3, I rejected choices of A whose curve and twist orders were not {4 · prime, 8 · prime}; here 4, 8 are minimal since p ∈ 1+4Z. The smallest positive choices for A are 358990, 464586, and 486662. I rejected A = 358990 because one of its primes is slightly smaller than 2^252, raising the question of how standards and implementations should handle the theoretical possibility of a user’s secret key matching the prime; discussing this question is more difficult than switching to another A. I rejected 464586 for the same reason. So I ended up with A = 486662.

```3045AE6FC8422F64ED579528D38120EAE12196D5 # NIST P-192, ANSI prime192v1
BD71344799D5C7FCDC45B59FA3B9AB8F6A948BC5 # NIST P-224
C49D360886E704936A6678E1139D26B7819F7E90 # NIST P-256, ANSI prime256v1
A335926AA319A27A1D00896A6773A4827ACDAC73 # NIST P-384
D09E8800291CB85396CC6717393284AAA0DA64BA # NIST P-521```

Apparently, they were provided by the NSA, and generated by Jerry Solinas in 1997. He allegedly generated them by hashing, presumably with SHA-1, some English sentences that he later forgot.

[Jerry] told me that he used a seed that was something like:
SEED = SHA1("Jerry deserves a raise.")
After he did the work, his machine was replaced or upgraded, and the actual phrase that he used was lost. When the controversy first came up, Jerry tried every phrase that he could think of that was similar to this, but none matched.

## Tor 的 Onion 導入防禦機制，在遭受 DoS 的時候要求用戶端執行 PoW 任務

Tor's PoW defense is a dynamic and reactive mechanism, remaining dormant under normal use conditions to ensure a seamless user experience, but when an onion service is under stress, the mechanism will prompt incoming client connections to perform a number of successively more complex operations. The onion service will then prioritize these connections based on the effort level demonstrated by the client.

The inherent design of onion services, which prioritizes user privacy by obfuscating IP addresses, has made it vulnerable to DoS attacks and traditional IP-based rate limits have been imperfect protections in these scenarios. In need of alternative solutions, we devised a proof-of-work mechanism involving a client puzzle to thwart DoS attacks without compromising user privacy.

For our proof-of-work function we will use the Equi-X scheme by tevador [REF_EQUIX].

## SHA-256 的 Length extension attack

Hacker News 上看到「Breaking SHA256: length extension attacks in practice (kerkour.com)」，在講不當使用 SHA-256 會導致 Length extension attack 類的安全漏洞，主要是因為 MD5SHA-1 以及 SHA-2 類的 hash function 最後生出 hash 值時會暴露出 hash function 的內部狀態而導致的問題。

```// Process the message in successive 512-bit chunks:
for each 512-bit chunk of padded message do
// ...

// Add this chunk's hash to result so far:
a0 := a0 + A
b0 := b0 + B
c0 := c0 + C
d0 := d0 + D
end for

var char digest[16] := a0 append b0 append c0 append d0 // (Output is in little-endian)```

```Original Data: count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo
Original Signature: 6d5f807e23db210bc254a28be2d6759a0f5f5d99```

`Desired New Data: count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo&waffle=liege`

```New Data: count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x28&waffle=liege
New Signature: 0e41270260895979317fff3898ab85668953aaa2```

## GPT 的進程 (或是 LLM 的進程)

GPTGPT-2GPT-3 這三個 OpenAI 的作品開始講，然後提到 GPT-3 帶出來的新紀元。

## 分散式系統的 clock

Author here. Pleasantly surprised to see the article here.

Some context behind the article. I studied CRDTs for a few months, and noticed that different CRDT designs use logical clocks in different and clever ways. And I haven't seen anyone narrate all those ways of use in one article. My attempt with this article was to dredge up those flavors of logical clocks into one article and give them names for future reference.

(To respond to a couple of other comments, I ignored atomic (and gps-based) clocks in this discussion, as indicated in my footnote 3).