Ranking arXiv CS papers with PageRank

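I came across this in "Ask HN: AI/ML papers to catch up with current state of AI?". The thread was originally just a discussion of which AI/ML papers are worth reading, but at id=38654200 someone pointed to this site, whose data is updated once a day: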

https://trendingpapers.com/

This tool can help you find what's new & relevant to read. It's updated every day (based on ArXiv).

You can filter by category (Computer Vision, Machine Learning, NLP, etc), by release date, but most importantly, you can rank by PageRank (proxy of influence/readership), PageRank growth (to see the fastest growing papers in terms of influence), total # of citations, etc...

According to its "Frequently Asked Questions", it runs PageRank over the papers on arXiv, mainly the CS ones.

It's rare to see PageRank show up at all, let alone applied to paper citations...
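To make the idea concrete, here is a minimal power-iteration PageRank sketch over a toy citation graph, where "paper X cites paper Y" plays the role of "page X links to page Y". The paper IDs, the damping factor d = 0.85 (the value commonly used since the original PageRank paper), and the choice to spread the rank of papers that cite nothing evenly over all papers are assumptions for illustration; the trendingpapers.com FAQ is not quoted here on any of these details, so this is not the site's actual implementation.

```python
def pagerank(cites: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Power-iteration PageRank over a citation graph (illustrative sketch).

    cites maps each paper ID to the distinct paper IDs it cites.
    Returned scores sum to 1.
    """
    # Collect every paper, including ones that only appear as cited targets.
    nodes = set(cites)
    for targets in cites.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}

    for _ in range(max_iter):
        # Papers with no outgoing citations ("dangling" nodes) have their
        # rank spread evenly over all papers, one common convention.
        dangling = sum(rank[p] for p in nodes if not cites.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in nodes}
        # Each citing paper splits its current rank evenly among its references.
        for src, targets in cites.items():
            for dst in targets:
                new[dst] += d * rank[src] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical toy graph: A and B cite C, C cites D, D cites nothing.
    toy = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
    for paper, score in sorted(pagerank(toy).items(), key=lambda kv: -kv[1]):
        print(paper, round(score, 3))
```

On this toy graph D ends up ranked above C even though D has one citation and C has two, because D's single citation comes from C, which is itself well cited. That is the sense in which PageRank is a proxy for influence rather than a raw citation count.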

The problem of citing your own papers...

Nature has a piece pointing out the problem of self-citation in journal papers (here "self-citation" also counts citations from co-authors): "Hundreds of extreme self-citing scientists revealed in new database".

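It opens with an extreme example: Vaidyanathan's self-citation rate is as high as 94%, while the median across academia is 12.7%. Could some kind of institutional incentive be driving this behavior?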

Vaidyanathan, a computer scientist at the Vel Tech R&D Institute of Technology, a privately run institute, is an extreme example: he has received 94% of his citations from himself or his co-authors up to 2017, according to a study in PLoS Biology this month. He is not alone. The data set, which lists around 100,000 researchers, shows that at least 250 scientists have amassed more than 50% of their citations from themselves or their co-authors, while the median self-citation rate is 12.7%.
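As a rough illustration of how such a ratio could be computed, here is a sketch that follows only the wording above: a citation counts as a self-citation when the citing paper shares at least one author with the set {the author} ∪ {anyone who has co-authored a paper with them}. The data layout, the function name `self_citation_rate`, and the sample data are hypothetical; this is not the PLoS Biology study's actual methodology, which may handle time windows and author disambiguation differently.

```python
def self_citation_rate(author: str,
                       papers: dict[str, set[str]],
                       citations: list[tuple[str, str]]) -> float:
    """Share of an author's received citations that come from the author
    or their co-authors (illustrative definition, see lead-in).

    papers maps paper ID -> set of author names;
    citations is a list of (citing_id, cited_id) pairs.
    """
    own = {pid for pid, authors in papers.items() if author in authors}
    # The author plus everyone who has co-authored any of their papers.
    circle = {author}
    for pid in own:
        circle |= papers[pid]
    received = [(src, dst) for src, dst in citations if dst in own]
    if not received:
        return 0.0
    # A citation is "self" if the citing paper has any author in the circle.
    self_cites = sum(1 for src, _ in received if papers[src] & circle)
    return self_cites / len(received)


if __name__ == "__main__":
    # Hypothetical data: V wrote p1 (with K) and p2; K also wrote p3; M wrote p4.
    papers = {"p1": {"V", "K"}, "p2": {"V"}, "p3": {"K", "L"}, "p4": {"M"}}
    citations = [("p3", "p1"), ("p2", "p1"), ("p4", "p1"), ("p4", "p2")]
    print(self_citation_rate("V", papers, citations))
```

In the sample data, two of V's four received citations come from V's own paper p2 and from co-author K's paper p3, so the rate is 2/4. By the same definition, Vaidyanathan's 94% means only about 6 of every 100 citations he received came from outside that circle.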

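I bring this up because it reminds me of Google's classic PageRank algorithm, which was dealing with exactly this kind of problem back then... just with webpages in place of papers.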