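I came across "When setting an environment variable gives you a 40x speedup", a post about how fast ls runs.
The post comes from Sherlock at Stanford, which apparently has nothing to do with the TV series; the site's tagline, "The HPC cluster for all your computing needs", shows that it is an HPC group.
In an HPC environment you can expect a single directory to hold a lot of files, so a user complaining about ls being slow is not much of a surprise. This time, though, the user mentioned that ls was actually fast on his own laptop: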
It all started from a support question, from a user reporting a usability problem with ls taking several minutes to list the contents of a 15,000+ entries directory on $SCRATCH.
Having thousands of files in a single directory is usually not very file system-friendly, and definitely not recommended. The user knew this already and admitted that wasn’t great, but when he mentioned his laptop was 1,000x faster than Sherlock to list this directory’s contents, of course, it stung. So we looked deeper.
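Skipping straight to the conclusion... the cause is that ls, in order to color entries differently, has to call lstat() to look up extra file attributes (executable, setuid, and setgid), and that is what slows it down: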
From 13s with the default settings, to 0.3s with a small LS_COLORS tweak, that’s a 40x speedup right there, for the cheap price of not having setuid/setgid or executable files colorized differently.
Of course, this is now setup on Sherlock, for every user’s benefit.
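To make the cost concrete, here is a minimal Python sketch (not the actual ls implementation) contrasting the two ways of classifying entries. The function names classify_with_lstat and classify_with_dirent are hypothetical, made up for this illustration. The first issues one lstat() per entry, which is what it takes to see the executable, setuid, and setgid bits; the second relies only on the file type that the directory listing itself usually provides, so it cannot tell an executable from a plain file.

```python
import os
import stat


def classify_with_lstat(path):
    """Classify entries using one lstat() call per entry.

    The mode bits are the only place where executable/setuid/setgid
    information lives, so this is the expensive path.
    """
    result = {}
    for name in os.listdir(path):
        mode = os.lstat(os.path.join(path, name)).st_mode
        if stat.S_ISDIR(mode):
            kind = "dir"
        elif stat.S_ISLNK(mode):
            kind = "link"
        elif mode & stat.S_ISUID:
            kind = "setuid"
        elif mode & stat.S_ISGID:
            kind = "setgid"
        elif mode & 0o111:
            kind = "exec"
        else:
            kind = "file"
        result[name] = kind
    return result


def classify_with_dirent(path):
    """Classify entries from the directory listing alone.

    os.scandir() normally gets the file type from the directory entry
    without a per-file stat; some file systems do not report the type,
    in which case Python falls back to a stat call.
    """
    result = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                kind = "link"
            elif entry.is_dir(follow_symlinks=False):
                kind = "dir"
            else:
                kind = "file"  # cannot distinguish exec/setuid/setgid here
            result[entry.name] = kind
    return result
```

With 15,000+ entries on a network file system, the first function pays for 15,000+ extra metadata lookups, while the second gives up exactly the distinctions that the LS_COLORS tweak turns off.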
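Setting LS_COLORS='ex=00:su=00:sg=00:ca=00:' makes the lstat() calls go away, so it has been made the default on Sherlock... People in environments that don't run into this problem (for example, ones with a well-designed directory structure), or who simply want to keep the original look, can unset the variable to get the color distinctions back :o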