With GCC 6.3.0 on an i7-6700, my decoder is about 20% faster than the DFA decoder in the benchmark. With Clang 3.8.1 it’s just 1% faster.
A later update improved things a lot, though: on Clang, the DFA version is faster than the branchless one:
Update: Björn pointed out that his site includes a faster variant of his DFA decoder. It is only 10% slower than the branchless decoder with GCC, and it’s 20% faster than the branchless decoder with Clang. So, in a sense, it’s still faster on average, even on a benchmark that favors a branchless decoder.
So in the end the author also says this was just an experiment XD:
It’s just a different approach. In practice I’d prefer Björn’s DFA decoder.