edgan: It could be better, but it is far better than Azure and GCP. Azure's old interface was better than their new beta interface, last I saw it. GCP has some interesting ideas, but the sidebar-centric design doesn't function well. It also tries to do too much and is too JavaScript-happy.
After analyzing millions of posts containing mainstream and alternative news shared on Twitter, Reddit and 4chan, Jeremy Blackburn, Ph.D., and collaborators found that alt-right communities within 4chan, an image-based discussion forum where users are anonymous, and Reddit, a social news aggregator where users vote up or down on posts, have a surprisingly large influence on Twitter.
"Based on our findings, these smaller, fringe communities on Reddit and 4chan serve as an incubation chamber for a lot of information," said Blackburn, assistant professor of computer science in the UAB College of Arts and Sciences. "The content and talking points are refined until they finally break free and make it to larger, more mainstream communities."
```perl
#!/usr/bin/perl
# Early deploy script: build the static files, push them out, then
# update and restart each app server in sequence.
use strict;
use warnings;

my @hostlist = qw(app01 app02 app03);   # hard-coded list of app servers (placeholder names)

# build the static files and put them on the static server
`make -C /home/reddit/reddit static`;
`rsync /home/reddit/reddit/static public:/var/www/`;

# iterate through the app servers and update their copy
# of the code, restarting once done.
foreach my $h (@hostlist) {
    `git push $h:/home/reddit/reddit master`;
    `ssh $h make -C /home/reddit/reddit`;
    `ssh $h /bin/restart-reddit.sh`;
}
```
The process for actually doing the deploy looked the same, but now the system did the work for you and told everyone what you were doing.
Another thing worth mentioning: because they did not run their own IRC server but used an outside third-party one, they decided IRC would carry one-way announcements only:
There was a lot of talk of systems that managed deploys from chat around this time, but since we used third party IRC servers we weren’t able to fully trust the chat room with production control and so it remained a one-way flow of information.
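Since the flow is one-way, the announcer can be tiny: connect, register a nick, say one line into the channel, and disconnect without ever reading commands back. A minimal Perl sketch under that assumption; the server name, channel, and message are all hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# One-way announcement: we write to the channel but never act on
# anything the channel says, so chat cannot control production.
my $irc = IO::Socket::INET->new(
    PeerAddr => 'irc.example.net',   # hypothetical third-party server
    PeerPort => 6667,
    Proto    => 'tcp',
) or die "connect failed: $!";

print $irc "NICK deploybot\r\n";
print $irc "USER deploybot 0 * :deploy announcer\r\n";
sleep 2;   # crude: wait out registration instead of parsing server replies
print $irc "JOIN #deploys\r\n";
print $irc "PRIVMSG #deploys :starting deploy of reddit to production\r\n";
print $irc "QUIT :done\r\n";
close $irc;
```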
In 2012 they moved the host list into DNS, a rudimentary sort of service discovery system:
First, it fetched its list of hosts from DNS rather than keeping it hard-coded. This allowed us to update the list of hosts without having to remember to update the deploy tool as well — a rudimentary service discovery system.
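A minimal version of that DNS-backed host list in Perl, using Net::DNS; the record name is hypothetical, since the post doesn't say which name Reddit actually queried:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::DNS;

# Fetch the app-server pool from DNS instead of a hard-coded list,
# so retiring or adding a host is just a DNS change.
my $resolver = Net::DNS::Resolver->new;
my $reply    = $resolver->search('apps.deploy.example.com', 'A')
    or die 'DNS lookup failed: ' . $resolver->errorstring;

my @hostlist = map  { $_->address }
               grep { $_->type eq 'A' } $reply->answer;

print "deploying to: @hostlist\n";
```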
Another small but important change was to always deploy a fixed version of the code. The previous version of the tool would update master on a given host, but what if master changed mid-deploy because someone accidentally pushed up code? By deploying a specific git revision instead of branch name, we ensured that the deploy got the same version everywhere in production.
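The idea is to resolve the branch to a concrete commit hash once, before touching any host, and then refer only to that hash for the rest of the run. A sketch of that change (my illustration, not the tool's actual code):

```perl
# Pin the deploy to one revision up front; a push to master that
# lands mid-deploy can no longer change what ships.
my $rev = `git rev-parse master`;
chomp $rev;
die "could not resolve revision" unless $rev =~ /^[0-9a-f]{40}$/;

foreach my $h (@hostlist) {
    # push the pinned commit and check it out explicitly by hash
    `git push $h:/home/reddit/reddit $rev:refs/heads/deploy`;
    `ssh $h "cd /home/reddit/reddit && git checkout $rev"`;
    `ssh $h make -C /home/reddit/reddit`;
    `ssh $h /bin/restart-reddit.sh`;
}
```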
What happens if a server is launched while a deploy is ongoing? We had to make sure each newly launched server checked in to get new code if present. What about servers going away mid-deploy? The tool had to be made smarter to detect when the server was gone legitimately rather than there being an issue with the deploy process itself that should be noisily alerted on.
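One way to tell a host that was retired on purpose from a deploy that is actually broken is to re-check the source of truth when a host stops responding; a hypothetical helper reusing the DNS lookup above:

```perl
# If a host goes quiet mid-deploy, re-resolve the host list:
# gone from DNS means it was retired legitimately; still listed
# but unreachable means something is wrong and should alert.
sub host_is_still_registered {
    my ($resolver, $host) = @_;
    my $reply = $resolver->search('apps.deploy.example.com', 'A')
        or return 0;
    return scalar grep { $_->type eq 'A' && $_->address eq $host }
                  $reply->answer;
}
```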
By 2014 the number of machines had grown so much that a full push took an hour, forcing them to parallelize:
Over time, the number of servers needed to serve peak traffic grew. This meant that deploys took longer and longer. At its worst, a normal deploy took close to an hour. This was not good.
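The natural fix is a bounded worker pool so several hosts are updated at once. A sketch with Parallel::ForkManager, where the pool size and per-host commands are illustrative and the ordering and safety constraints the real tool kept are left out:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my @hostlist = qw(app01 app02 app03 app04);   # placeholder names
my $pm = Parallel::ForkManager->new(16);      # at most 16 hosts at a time

foreach my $h (@hostlist) {
    $pm->start and next;                      # parent: move on to next host
    system("ssh $h make -C /home/reddit/reddit") == 0
        or warn "build failed on $h\n";
    system("ssh $h /bin/restart-reddit.sh") == 0
        or warn "restart failed on $h\n";
    $pm->finish;                              # child exits
}
$pm->wait_all_children;
```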
In 2015 they added a deploy lock, to prevent two people from deploying at the same time:
Engineers would ask for the deploy lock and either get it or get put in the queue. This helped keep order in deploys and let people relax a bit while waiting for the lock.
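On a single deploy box, a blocking file lock gets most of this behavior: whoever holds the lock deploys, and everyone else blocks in flock() until it frees (though the wake-up order is up to the OS, not a strict FIFO queue). A sketch, not Reddit's actual mechanism:

```perl
use Fcntl qw(:flock);

# Take the global deploy lock; if someone else holds it, flock()
# blocks here until they are done.
open(my $lock, '>', '/var/lock/deploy.lock') or die "open: $!";
print "waiting for deploy lock...\n";
flock($lock, LOCK_EX) or die "flock: $!";
print "lock acquired, deploying\n";

# ... run the deploy ...

flock($lock, LOCK_UN);   # also released automatically on exit
close $lock;
```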
The 2017 part mentions the number of servers:
This new mechanism allows us to deploy to a lot more machines concurrently, and deploy timings are down to 7 minutes for around 800 servers despite the extra waiting for safety.
The amount of memory varies per implementation, but in the case of this implementation, we could count over 1 million IDs using just 12 kilobytes of space, which would be 0.15% of the original space usage!
Wikipedia notes that at the 10^9 scale, 1.5 KB of memory gives only about a 2% error rate:
The HyperLogLog algorithm is able to estimate cardinalities of > 10^9 with a typical error rate of 2%, using 1.5 kB of memory.
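For intuition, here is a toy HyperLogLog in Perl: hash each ID, let the first few bits of the hash pick a register, and keep in each register only the longest run of leading zero bits ever seen; the estimate is a normalized harmonic mean of the registers. This sketch uses 4096 registers (about 4 KB if packed one byte each), omits the bias and small-range corrections real implementations add, and assumes a 64-bit Perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

my $b = 12;                      # register index bits
my $m = 1 << $b;                 # 4096 registers
my @M = (0) x $m;

sub hll_add {
    my ($item) = @_;
    my ($hi, $lo) = unpack 'NN', md5($item);   # 64 bits of the hash
    my $idx  = $hi >> (32 - $b);               # top b bits pick a register
    my $rest = (($hi & ((1 << (32 - $b)) - 1)) << 32) | $lo;   # remaining 52 bits
    my $rank = 1;                              # 1 + count of leading zero bits
    $rank++ while $rank <= 52 && !(($rest >> (52 - $rank)) & 1);
    $M[$idx] = $rank if $rank > $M[$idx];
}

sub hll_estimate {
    my $sum = 0;
    $sum += 2 ** -$_ for @M;
    my $alpha = 0.7213 / (1 + 1.079 / $m);     # standard constant for large m
    return $alpha * $m * $m / $sum;
}

hll_add("id:$_") for 1 .. 1_000_000;
printf "estimated distinct IDs: %.0f\n", hll_estimate();
```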
It has come to our attention that many websites, such as Forbes and Wired, are now requiring users to disable ad blockers to view content. Because Forbes requires users to do this and has then served malware to them, we see this as a security risk to you, our community. There are also sites such as the Wall Street Journal that have implemented paywalls, which we are also considering banning.
Decentralized Reddit using a DHT to store content and a blockchain to rank such content. Whitepaper with more details here: http://lucaa.org/docs/kudos.pdf
Data is complete from January 1, 2008 through August 31, 2015. Partial data is available for 2006 and 2007; the IDs used when Reddit was just a baby were scattered a bit, but I am making an attempt to grab all data from 2006 and 2007 and will make a supplementary upload for that data once I'm satisfied that I've found everything that is available.
It is about 42GB of data, covering nearly everything that is publicly available:
This dataset represents approximately 200 million submission objects with score data, author, title, self_text, media tags and all other attributes available via the Reddit API.
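Dumps like this are typically distributed as one JSON object per line, with field names matching the attributes listed above; a small Perl sketch for streaming through one under that assumption (the filename is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Stream the dump one record at a time rather than loading ~42GB
# into memory; print a few fields from each submission.
my $json = JSON::PP->new;
open(my $fh, '<', 'reddit_submissions.json') or die "open: $!";  # hypothetical filename
while (my $line = <$fh>) {
    my $post = $json->decode($line);
    printf "%s\t%d\t%s\n",
        $post->{author} // '[deleted]',
        $post->{score}  // 0,
        $post->{title}  // '';
}
close $fh;
```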