Mountpoint for Amazon S3 is optimized for read-heavy workloads that need high throughput. It intentionally does not implement the full POSIX specification for file systems.
The calloc() function shall allocate unused space for an array of nelem elements each of whose size in bytes is elsize. The space shall be initialized to all bits 0.
Instead, it fakes it, using virtual memory: it takes a single 4 KiB page of memory that is already full of zeros (which it keeps around for just this purpose), and maps 1 GiB / 4 KiB = 262144 copy-on-write copies of it into our process's address space. So the first time we actually write to each of those 262144 pages, then at that point the kernel has to go and find a real page of RAM, write zeros to it, and then quickly swap it in place of the "virtual" page that was there before. But this happens lazily, on a page-by-page basis.
Trigraph sequences were introduced in C89 as alternate spellings of some characters to allow the implementation of C in character sets which do not provide a sufficient number of non-alphabetic graphics
而且是強制要求實做:
Implementations are required to support these alternate spellings, even if the character set in use is ASCII, in order to allow transportation of code from systems which must use the trigraphs. AMD1 also added digraphs (see §6.4.6 and §MSE.4).
其中遇到的問題就是當年得決定 C 可以用的 charset,得考慮到很多不同機器 charset 相容性的問題:
The C89 Committee faced a serious problem in trying to define a character set for C. Not all of the character sets in general use have the right number of characters, nor do they support the graphical symbols that C users expect to see. For instance, many character sets for languages other than English resemble ASCII except that codes used for graphic characters in ASCII are instead used for alphabetic characters or diacritical marks. C relies upon a richer set of graphic characters than most other programming languages, so the representation of programs in character sets other than ASCII is a greater problem than for most other programming languages.
The solution is an internationally agreed-upon repertoire in terms of which an international representation of C can be defined. ISO has defined such a standard, ISO/IEC 646, which describes an invariant subset of ASCII.
The characters in the ASCII repertoire used by C and absent from the ISO/IEC 646 invariant repertoire are:
By the end of 2024, all mobile phones, tablets and cameras sold in the EU will have to be equipped with a USB Type-C charging port. From spring 2026, the obligation will extend to laptops.
So that's my best guess: the programmers were working with constrained resources and could optimize `asctime` tricky pointer arithmetic on the month and day-of-week, so made them 0-indexed. Day-of-month is just for displaying to the user, so is 1-indexed.
首先是在 -O3 的情況下 (也就是作者使用的參數),可以看到類似的結果:(我桌機的 CPU 是定速,沒有跑動態調整)
$ repeat 10 ./a
[-] Took: 248830 ns.
[-] Took: 249150 ns.
[-] Took: 248760 ns.
[-] Took: 248730 ns.
[-] Took: 248770 ns.
[-] Took: 248861 ns.
[-] Took: 248760 ns.
[-] Took: 253050 ns.
[-] Took: 248640 ns.
[-] Took: 249211 ns.
$ repeat 10 ./b
[-] Took: 686660 ns.
[-] Took: 696090 ns.
[-] Took: 696310 ns.
[-] Took: 694431 ns.
[-] Took: 691971 ns.
[-] Took: 697690 ns.
[-] Took: 693241 ns.
[-] Took: 692900 ns.
[-] Took: 654751 ns.
[-] Took: 679101 ns.
從版本 A 的 objdump -d -S -M intel a 可以看到作者 screenshot 內也有看的 unroll 與 SSE2 指令集:
$ repeat 10 ./a
[-] Took: 571140 ns.
[-] Took: 570280 ns.
[-] Took: 571271 ns.
[-] Took: 573971 ns.
[-] Took: 571981 ns.
[-] Took: 569650 ns.
[-] Took: 566361 ns.
[-] Took: 571600 ns.
[-] Took: 571330 ns.
[-] Took: 571030 ns.
$ repeat 10 ./b
[-] Took: 697521 ns.
[-] Took: 696961 ns.
[-] Took: 696201 ns.
[-] Took: 694921 ns.
[-] Took: 696930 ns.
[-] Took: 695001 ns.
[-] Took: 701661 ns.
[-] Took: 698100 ns.
[-] Took: 702430 ns.
[-] Took: 702641 ns.
從 objdump 可以看到版本 A 的變化,退化成一次只處理一個,但把所有的數字都用 xmmN 存放計算:
$ repeat 10 ./a
[-] Took: 1097091 ns.
[-] Took: 1092941 ns.
[-] Took: 1092501 ns.
[-] Took: 1091991 ns.
[-] Took: 1092441 ns.
[-] Took: 1093970 ns.
[-] Took: 1091341 ns.
[-] Took: 1093931 ns.
[-] Took: 1094111 ns.
[-] Took: 1092231 ns.
$ repeat 10 ./b
[-] Took: 2703282 ns.
[-] Took: 2705933 ns.
[-] Took: 2703582 ns.
[-] Took: 2702622 ns.
[-] Took: 2703043 ns.
[-] Took: 2702262 ns.
[-] Took: 2703352 ns.
[-] Took: 2703532 ns.
[-] Took: 2703112 ns.
[-] Took: 2702533 ns.
It would be nice if only one or two of the sorting methods would dominate all of the others, regardless of application or the computer being used. But in fact, each method has its own peculiar virtues. […] Thus we find that nearly all of the algorithms deserve to be remembered, since there are some applications in which they turn out to be best.
Pattern-defeating quicksort (pdqsort) is a novel sorting algorithm that combines the fast average case of randomized quicksort with the fast worst case of heapsort, while achieving linear time on inputs with certain patterns.
A common software technique to implement nearly quadruple precision using pairs of double-precision values is sometimes called double-double arithmetic.
Using pairs of IEEE double-precision values with 53-bit significands, double-double arithmetic provides operations on numbers with significands of at least[4] 2 × 53 = 106 bits (...), only slightly less precise than the 113-bit significand of IEEE binary128 quadruple precision.
To clarify: Both of these are joke posts. C doesn’t actually do either of these things in the general case - I’m (mis)using other features of the language with carefully chosen numbers to make it seem like it works.