Saw "Neural networks in the 1990s (twitter.com/id_aa_carmack)" on Hacker News; the original tweet is here:
It is interesting how many old papers used neural networks with only a dozen or so units. Computers weren’t THAT slow in the 90s — BLAS (basic linear algebra subprograms) was already a thing that vendors hyper-optimized for. Not much overlap between HPC and NN people?…
— John Carmack (@ID_AA_Carmack) June 18, 2023
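As a rough illustration of the BLAS point (my own sketch, not from the thread, assuming NumPy is installed and backed by some BLAS implementation): the forward pass of a fully-connected layer with a dozen or so units is essentially a single matrix multiply plus a bias, i.e. exactly the kind of kernel vendors were already hyper-optimizing back then.

```python
# A minimal sketch, assuming NumPy (whose matrix products are dispatched to
# the underlying BLAS). Sizes are illustrative: ~a dozen hidden units, as in
# many 1990s papers.
import numpy as np

rng = np.random.default_rng(0)

batch, n_in, n_hidden = 32, 16, 12
X = rng.standard_normal((batch, n_in))       # input batch
W = rng.standard_normal((n_in, n_hidden))    # layer weights
b = rng.standard_normal(n_hidden)            # layer bias

# One layer's forward pass: a single GEMM (handled by BLAS) plus a nonlinearity.
hidden = np.tanh(X @ W + b)
print(hidden.shape)  # (32, 12)
```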
rm999 on Hacker News described the results from that era, which helps explain why neural networks didn't take off in the 1990s:
A lot of the problems that did benefit from neural networks in the 90s/early 2000s just needed a non-linear model, but did not need huge neural networks to do well. You can very roughly consider the first layer of a 2-layer neural network to be a series of classifiers, each tackling a different aspect of the problem (e.g. the first neuron of a spam model may activate if you have never received an email from the sender, the second if the sender is tagged as spam a lot, etc). These kinds of problems didn't need deep, large networks, and 10-50 neuron 2-layer networks were often more than enough to fully capture the complexity of the problem. Nowadays many practitioners would throw a GBM at problems like that and can get away with O(100) shallow trees, which isn't very different from what the small neural networks were doing back then.
The problems of the 1990s were still fairly simple ones, mostly classification into categories (a common application being spam filtering), and on those problems the gap between traditional methods and neural networks was small.
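As a rough sketch of what that comment describes (this is my own example, using scikit-learn on a synthetic classification problem; the dataset, sizes, and hyperparameters are made up for illustration), a 2-layer network with a few dozen hidden units and a GBM with O(100) shallow trees can both be fit on the same kind of task:

```python
# A minimal sketch, assuming scikit-learn is available.
# The dataset is synthetic and all numbers are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A small non-linear classification problem, roughly the scale of a
# 1990s-style task (e.g. spam vs. not spam over a handful of features).
X, y = make_classification(n_samples=5000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A 2-layer network with a few dozen hidden units.
mlp = MLPClassifier(hidden_layer_sizes=(30,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)

# A GBM with O(100) shallow trees, the more common choice today.
gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
gbm.fit(X_train, y_train)

print("small 2-layer MLP accuracy:", mlp.score(X_test, y_test))
print("GBM accuracy:              ", gbm.score(X_test, y_test))
```

On a problem of this size the two models typically end up in a similar accuracy range, which is the comment's point: it was the problems, not the hardware, that capped how big the networks needed to be.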
It was only after GPU computing matured, and the cloud arrived around 2010, that an ordinary organization no longer had to spend a fortune building its own supercomputer: a bit of OPEX was enough to spin up a small supercomputer for a short time. That gave many organizations enough computing power to train large models (large relative to what came before), and only then did the power of large models on more complex problems become visible.
And AlphaGo in 2016 was the success story where neural networks made an impact on the general public (i.e. broke out of the niche), which also made investors much more willing to fund AI.