That bit after “watch?v=” is an 11-character string. The first ten characters can be a-z, A-Z, 0-9, _ and -. The last character is special, and can only be one of 16 values. Turns out there are 2^64 possible YouTube addresses, an enormous number: 18.4 quintillion. There are lots of YouTube videos, but not that many. Let’s guess for a moment that there are 1 billion YouTube videos – if you picked URLs at random, you’d only get a valid address roughly once every 18.4 billion tries.
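The arithmetic behind those figures can be checked in a few lines. This is only a sketch of the counting argument in the text: the constant and function names are made up for illustration, and the 1 billion video count is the guess quoted above, not a measurement.

```python
# Counts taken from the text: 64 possible values for each of the first
# ten characters (a-z, A-Z, 0-9, _ and -), 16 for the last one.
FIRST_TEN_CHOICES = 64
LAST_CHOICES = 16


def id_space_size() -> int:
    """Number of possible 11-character video IDs."""
    return FIRST_TEN_CHOICES ** 10 * LAST_CHOICES


def expected_tries_per_hit(video_count: int) -> float:
    """Average number of random IDs to try before one is a real video."""
    return id_space_size() / video_count


if __name__ == "__main__":
    print(id_space_size() == 2 ** 64)                      # True
    print(f"{expected_tries_per_hit(1_000_000_000):.3g}")  # 1.84e+10
```

Since 64 is 2^6, ten such characters give 2^60 combinations, and the 16 values of the last character add another factor of 2^4, which is where 2^64 comes from.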

We refer to this method as “drunk dialing”, as it’s basically as sophisticated as taking swigs from a bottle of bourbon and mashing digits on a telephone, hoping to find a human being to speak to. Jason found a couple of cheats that make the method roughly 32,000 times as efficient, meaning our “phone call” connects lots more often. Kevin Zheng wrote a whole bunch of scripts to do the dialing, and over the course of several months, we collected more than 10,000 truly random YouTube videos.

By comparing our results to other ways of generating lists of YouTube videos, we can declare them “plausibly random” if they generate similar results. Fortunately, one method does – it was discovered by Jia Zhou et al. in 2011, and it’s far more efficient than our naïve method. (You generate a five character string where one character is a dash – YouTube will autocomplete those URLs and spit out a matching video if one exists.) Kevin now polls YouTube using the “dash method” and uses the results to maintain our dashboard at Tubestats.
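The string generation step of the “dash method” might look something like the sketch below. The text only says the string has five characters and that one of them is a dash, so everything else here is an assumption: the dash position is picked uniformly, the other four characters are drawn from the remaining 63 ID characters, and `random_dash_query` is a made-up name. How the string is then submitted to YouTube is not described in the text and is left out.

```python
import random
import string

# Assumption: the non-dash characters come from the other ID characters
# (a-z, A-Z, 0-9 and _). The source does not specify the alphabet.
ALPHABET = string.ascii_letters + string.digits + "_"


def random_dash_query(rng: random.Random) -> str:
    """Return a five character string containing exactly one dash."""
    chars = [rng.choice(ALPHABET) for _ in range(5)]
    # Assumption: the dash may sit at any of the five positions.
    chars[rng.randrange(5)] = "-"
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random()
    print(random_dash_query(rng))
```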

In our case, our drunk dials tried roughly 32k numbers at the same time, and we got a “hit” every 50,000 times or so. Our current estimate for the size of YouTube is 13.325 billion videos – we are now updating this number every few weeks at tubestats.org.
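The size estimate follows from the hit rate. A rough sketch, assuming that “32k” means 2^15 = 32,768 IDs covered per try and that each hit corresponds to a single video (neither is stated outright in the text); the function names are illustrative only:

```python
ID_SPACE = 2 ** 64  # total number of possible video IDs, from the text


def estimate_video_count(tries_per_hit: float, ids_per_try: int) -> float:
    """Estimate the number of videos from the observed hit rate.

    One hit per `tries_per_hit` tries, each covering `ids_per_try` IDs,
    means roughly one video per (tries_per_hit * ids_per_try) IDs.
    """
    return ID_SPACE / (tries_per_hit * ids_per_try)


def implied_tries_per_hit(video_count: float, ids_per_try: int) -> float:
    """Inverse view: the tries per hit implied by a given video count."""
    return ID_SPACE / (video_count * ids_per_try)


if __name__ == "__main__":
    # Rounded figures from the text: a hit every 50,000 tries or so.
    print(f"{estimate_video_count(50_000, 2 ** 15):.4g}")
    # Hit rate implied by the quoted 13.325 billion estimate.
    print(f"{implied_tries_per_hit(13.325e9, 2 ** 15):.0f}")
```

With the rounded inputs this lands at about 11 billion, the same order of magnitude as the quoted 13.325 billion. Going the other way, the quoted estimate corresponds to a hit roughly every 42,000 tries, which fits “every 50,000 times or so”.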

## Using Deep Learning to Identify Pixelated Faces

• Photos of forty people who match these background identities.
• News photos in which the faces have been pixelated.

## E-book Sales Channels in the United States and Author Earnings

• Hachette Book Group
• HarperCollins
• Macmillan Publishers
• Penguin Random House
• Simon and Schuster

AuthorEarnings.com has published a report on this very subject, so I jumped into the data. Their May 2016 report says that 1,340 authors earn over \$100,000 per year on Amazon.com. The striking fact here: “Half of them are indies and Amazon-imprint authors.”

The “publisher” model is facing more and more challenges.

## How Humble Bundle Fights Credit Card Fraud

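Humble Bundle explains how they fight credit card fraud. The main idea is to keep lowering the risk while reducing how often a human has to step in (because staffing costs are high): “How Humble Bundle stops online fraud”.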

Our first line of defense is a machine-learning-based anti-abuse startup called Sift Science, which we’ve been training for years across 55,000,000 transactions. Given how many orders we process, Sift Science has a really good idea when someone is up to no good. The model adapts daily as we get more data.

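Sift Science came up here back in 2014: “A service that detects whether a credit card transaction is fraudulent”. What it does is simple: you send Sift Science a large amount of data, including all kinds of user identity information and credit card details, and Sift Science uses machine learning to tell you how risky the transaction is, so that you can make the final call.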

## Amazon EBS Launches New Volume Types

Amazon EBS has launched new volume types, both of them more economical (in plain words: cheaper) than the existing options: “Amazon EBS Update – New Cold Storage and Throughput Options”.

Starts at 250 MB/s for a 1 terabyte volume, and grows by 250 MB/s for every additional provisioned terabyte until reaching a maximum burst throughput of 500 MB/s.
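The scaling rule in that quote is simple enough to write down. A minimal sketch, assuming whole-terabyte volume sizes of at least 1 TB, since the quote does not say what happens below that; `burst_throughput_mb_s` is a made-up helper name, not part of any AWS API.

```python
def burst_throughput_mb_s(size_tb: int) -> int:
    """Burst throughput in MB/s for a volume of `size_tb` terabytes.

    250 MB/s at 1 TB, plus 250 MB/s per additional terabyte,
    capped at the 500 MB/s maximum quoted above.
    """
    if size_tb < 1:
        # Assumption: sizes under 1 TB are outside what the quote covers.
        raise ValueError("the quoted rule starts at 1 TB")
    return min(250 * size_tb, 500)


if __name__ == "__main__":
    for size in (1, 2, 5):
        print(size, burst_throughput_mb_s(size))
```

Under this rule the cap is already reached at 2 TB, so larger volumes get no extra burst throughput.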

Designed for workloads similar to those for Throughput Optimized HDD that are accessed less frequently; \$0.025 / gigabyte / month.

For both of the new magnetic volume types, the burst credit bucket can grow until it reaches the size of the volume. In other words, when a volume’s bucket is full, you can scan the entire volume at the burst rate. Each I/O request of 1 megabyte or less counts as 1 megabyte’s worth of credit. Sequential I/O operations are merged into larger ones where possible; this can increase throughput and maximizes the value of the burst credit bucket (to learn more about how the bucket operates, visit the Performance Burst Details section of my New SSD-Backed Elastic Block Storage post).
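To see why merging sequential I/O matters for the credit bucket, here is a rough model of the accounting described above. Two assumptions that are not in the quote: requests larger than 1 megabyte are charged at their actual size, and 1 megabyte is treated as 1 MiB. The function names are illustrative only.

```python
MIB = 1024 * 1024


def credit_charged_bytes(request_bytes: int) -> int:
    """Credit consumed by one I/O request.

    From the quote: a request of 1 megabyte or less counts as 1 megabyte.
    Assumption: a larger request is charged its actual size.
    """
    return max(request_bytes, MIB)


def total_credit_bytes(request_sizes: list[int]) -> int:
    """Credit consumed by a sequence of I/O requests."""
    return sum(credit_charged_bytes(size) for size in request_sizes)


if __name__ == "__main__":
    # The same 4 MiB of data, read two ways.
    small_requests = [4 * 1024] * 1024   # 1024 requests of 4 KiB
    merged_requests = [MIB] * 4          # 4 requests of 1 MiB
    print(total_credit_bytes(small_requests) // MIB)   # 1024
    print(total_credit_bytes(merged_requests) // MIB)  # 4
```

In this model, reading 4 MiB as unmerged 4 KiB requests costs 256 times as much credit as reading it in 1 MiB requests, which is the sense in which merging “maximizes the value of the burst credit bucket”.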

## Speed Differences Between Amazon S3 and HDFS

Though the speed improvements using HDFS are considerable, S3 did perform pretty well. At worst there's a 1.75x overhead in exchange for virtually unlimited scalability, 11 9's of durability and no worrying about over/under-provisioning storage space.