fp32, fp16, and bf16

I ran into the bf16 floating-point format in the nanoGPT project (this passage has since been rewritten in the newer README.md):

Currently fp16 is much faster than bf16. Potentially revert back to using fp16 and re-introduce the gradient scaler?

Digging into bf16 a bit: it's an idea proposed by a team at Google, and its selling point is having the same representable range as fp32 while sacrificing precision. That way, existing fp32 code has a good chance of being switched to bf16 directly to cut computation and improve efficiency (i.e., as a drop-in replacement).
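The "same range, less precision" trade-off is easy to see once you know that bf16 is literally the top half of an fp32 encoding (1 sign bit, the same 8 exponent bits, and 7 instead of 23 mantissa bits). A minimal Python sketch using plain truncation (real hardware usually rounds to nearest even instead):

    import struct

    def float_to_bf16_bits(x: float) -> int:
        # bf16 is the top 16 bits of the IEEE 754 fp32 encoding:
        # 1 sign bit, 8 exponent bits (same as fp32), 7 mantissa bits.
        fp32_bits = struct.unpack(">I", struct.pack(">f", x))[0]
        return fp32_bits >> 16  # drop the low 16 mantissa bits

    def bf16_bits_to_float(bits: int) -> float:
        return struct.unpack(">f", struct.pack(">I", bits << 16))[0]

    # Same dynamic range as fp32: a value far beyond fp16's max (65504) survives.
    print(bf16_bits_to_float(float_to_bf16_bits(1e38)))  # still finite, ~1e38

    # ...but with only 7 mantissa bits, precision drops to ~2-3 decimal digits.
    print(bf16_bits_to_float(float_to_bf16_bits(0.1)))   # 0.099609375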

The same page also lists formats that other vendors have rolled out on their own; that part looks more like market competition...

Amazon Aurora for MySQL Now Supports Point-in-Time Recovery

Following the PITR that DynamoDB rolled out at the end of March (see "Point-In-Time Recovery for Amazon DynamoDB" below), Amazon Aurora for MySQL has also announced PITR support: "Amazon Aurora Backtrack – Turn Back Time".

The screenshots show that it supports second-level granularity.

It goes back at most 72 hours, and there's an extra charge for it.
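To get a feel for the API shape, here's a minimal boto3 sketch; the cluster name is hypothetical, and this assumes Backtrack was enabled when the cluster was created:

    import datetime

    import boto3

    rds = boto3.client("rds")

    # Rewind the cluster in place to 30 minutes ago; the target time must
    # fall within the configured backtrack window (at most 72 hours).
    rds.backtrack_db_cluster(
        DBClusterIdentifier="my-aurora-cluster",  # hypothetical cluster name
        BacktrackTo=datetime.datetime.utcnow() - datetime.timedelta(minutes=30),
    )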

One more thing DBAs no longer have to worry about XD

Point-In-Time Recovery for Amazon DynamoDB

A feature Amazon DynamoDB announced on 3/26: a backup-and-restore mechanism with per-second granularity: "New – Amazon DynamoDB Continuous Backups and Point-In-Time Recovery (PITR)".

First, enable the feature.
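The post walks through the console, but the same thing can be done programmatically; a minimal boto3 sketch, with a hypothetical table name:

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Enable continuous backups / PITR on an existing table.
    dynamodb.update_continuous_backups(
        TableName="MyTable",  # hypothetical table name
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )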

Once enabled, it starts recording, and you can restore the data to any point in time within the last 35 days:

DynamoDB can back up your data with per-second granularity and restore to any single second from the time PITR was enabled up to the prior 35 days.
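Restores go into a new table; a minimal boto3 sketch (the table names and timestamp are made up):

    import datetime

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Restore to any single second within the 35-day window, into a new table.
    dynamodb.restore_table_to_point_in_time(
        SourceTableName="MyTable",            # hypothetical names
        TargetTableName="MyTable-restored",
        RestoreDateTime=datetime.datetime(2018, 4, 1, 12, 0, 0),
    )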

At that point, even if you modify or delete data, internally it's all copy-on-write, which takes extra space, and that part is billed separately:

Pricing for continuous backups is detailed on the DynamoDB Pricing Pages. Pricing varies by region and is based on the current size of the table and indexes. For example, in US East (N. Virginia) you pay $0.20 per GB based on the size of the data and all local secondary indexes.

A feature like this is usually something that was considered in the original design (so the underlying data structures can achieve it easily), and now they've just implemented it... software like MySQL has no way to pull this off XDDD

Finally, the supported regions are listed explicitly, rather than saying every region with Amazon DynamoDB is covered:

PITR is available in the US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Canada (Central), EU (Frankfurt), EU (Ireland), EU (London), and South America (Sao Paulo) Regions starting today.

Comparing the lists, it looks like Paris and the US GovCloud regions are missing... one is a region that opened at the end of last year, and the other has always been slow to get new features.

What Information Facebook Uses to Target Ads at You

The Washington Post has compiled the 98 pieces of personal information Facebook uses for ads: "98 personal data points that Facebook uses to target ads to you".

Basic personal info (even your friends'), along with which browser you use, is to be expected; Likes and the Groups you join being counted is also unsurprising. But even the type of credit card you hold being included is rather striking...

Time to review my own defenses... the browser blocks third-party cookies by default.

Ghostery blocks all external components by default, with a whitelist for the parts I want to see. uBlock Origin blocks all the ads.

I also use "Force Facebook Most Recent" to force Facebook onto the Most Recent timeline.

Not sure whether that's enough...

Finally, here are the 98 data points:

  1. Location
  2. Age
  3. Generation
  4. Gender
  5. Language
  6. Education level
  7. Field of study
  8. School
  9. Ethnic affinity
  10. Income and net worth
  11. Home ownership and type
  12. Home value
  13. Property size
  14. Square footage of home
  15. Year home was built
  16. Household composition
  17. Users who have an anniversary within 30 days
  18. Users who are away from family or hometown
  19. Users who are friends with someone who has an anniversary, is newly married or engaged, recently moved, or has an upcoming birthday
  20. Users in long-distance relationships
  21. Users in new relationships
  22. Users who have new jobs
  23. Users who are newly engaged
  24. Users who are newly married
  25. Users who have recently moved
  26. Users who have birthdays soon
  27. Parents
  28. Expectant parents
  29. Mothers, divided by “type” (soccer, trendy, etc.)
  30. Users who are likely to engage in politics
  31. Conservatives and liberals
  32. Relationship status
  33. Employer
  34. Industry
  35. Job title
  36. Office type
  37. Interests
  38. Users who own motorcycles
  39. Users who plan to buy a car (and what kind/brand of car, and how soon)
  40. Users who bought auto parts or accessories recently
  41. Users who are likely to need auto parts or services
  42. Style and brand of car you drive
  43. Year car was bought
  44. Age of car
  45. How much money user is likely to spend on next car
  46. Where user is likely to buy next car
  47. How many employees your company has
  48. Users who own small businesses
  49. Users who work in management or are executives
  50. Users who have donated to charity (divided by type)
  51. Operating system
  52. Users who play canvas games
  53. Users who own a gaming console
  54. Users who have created a Facebook event
  55. Users who have used Facebook Payments
  56. Users who have spent more than average on Facebook Payments
  57. Users who administer a Facebook page
  58. Users who have recently uploaded photos to Facebook
  59. Internet browser
  60. Email service
  61. Early/late adopters of technology
  62. Expats (divided by what country they are from originally)
  63. Users who belong to a credit union, national bank or regional bank
  64. Users who are investors (divided by investment type)
  65. Number of credit lines
  66. Users who are active credit card users
  67. Credit card type
  68. Users who have a debit card
  69. Users who carry a balance on their credit card
  70. Users who listen to the radio
  71. Preference in TV shows
  72. Users who use a mobile device (divided by what brand they use)
  73. Internet connection type
  74. Users who recently acquired a smartphone or tablet
  75. Users who access the Internet through a smartphone or tablet
  76. Users who use coupons
  77. Types of clothing user’s household buys
  78. Time of year user’s household shops most
  79. Users who are “heavy” buyers of beer, wine or spirits
  80. Users who buy groceries (and what kinds)
  81. Users who buy beauty products
  82. Users who buy allergy medications, cough/cold medications, pain relief products, and over-the-counter meds
  83. Users who spend money on household products
  84. Users who spend money on products for kids or pets, and what kinds of pets
  85. Users whose household makes more purchases than is average
  86. Users who tend to shop online (or off)
  87. Types of restaurants user eats at
  88. Kinds of stores user shops at
  89. Users who are “receptive” to offers from companies offering online auto insurance, higher education or mortgages, and prepaid debit cards/satellite TV
  90. Length of time user has lived in house
  91. Users who are likely to move soon
  92. Users who are interested in the Olympics, fall football, cricket or Ramadan
  93. Users who travel frequently, for work or pleasure
  94. Users who commute to work
  95. Types of vacations user tends to go on
  96. Users who recently returned from a trip
  97. Users who recently used a travel app
  98. Users who participate in a timeshare

0.1 + 0.2 = 0.30000000000000004

Came across "http://0.30000000000000004.com/", a site that collects each language's result for the classic 0.1 + 0.2 problem. The domain name itself is pretty cheeky XD

The explanation at the top describes the problem with IEEE 754 binary representation:

Your language isn't broken, it's doing floating point math. Computers can only natively store integers, so they need some way of representing decimal numbers. This representation comes with some degree of inaccuracy. That's why, more often than not, .1 + .2 != .3.

It's actually pretty simple. When you have a base 10 system (like ours), it can only express fractions that use a prime factor of the base. The prime factors of 10 are 2 and 5. So 1/2, 1/4, 1/5, 1/8, and 1/10 can all be expressed cleanly because the denominators all use prime factors of 10. In contrast, 1/3, 1/6, and 1/7 are all repeating decimals because their denominators use a prime factor of 3 or 7. In binary (or base 2), the only prime factor is 2. So you can only express fractions cleanly which only contain 2 as a prime factor. In binary, 1/2, 1/4, 1/8 would all be expressed cleanly as decimals. While, 1/5 or 1/10 would be repeating decimals. So 0.1 and 0.2 (1/10 and 1/5) while clean decimals in a base 10 system, are repeating decimals in the base 2 system the computer is operating in. When you do math on these repeating decimals, you end up with leftovers which carry over when you convert the computer's base 2 (binary) number into a more human readable base 10 number.

This mainly concerns the IEEE 754-1985 standard; IEEE 754-2008 later introduced new representations, supporting decimal formats to solve this problem (though they're not widespread yet).
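Python makes both sides of this easy to see: binary floats give the famous result, while the standard-library decimal module implements decimal floating point (in the same spirit as IEEE 754-2008's decimal formats) and computes the sum exactly:

    from decimal import Decimal

    print(0.1 + 0.2)                        # 0.30000000000000004 (binary fp)
    print(0.1 + 0.2 == 0.3)                 # False
    print(f"{0.1:.20f}")                    # 0.10000000000000000555... (the value actually stored)

    print(Decimal("0.1") + Decimal("0.2"))  # 0.3 (decimal arithmetic, exact)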

DEC64 Floating Point

Saw "DEC64: Decimal Floating Point" on Hacker News Daily:

The formula is:

value = coefficient × 10^exponent

so it's base 10.

Zero and NaN are handled specially:

There are 255 possible representations of zero. They are all considered to be equal.

There is a special value called nan that has a coefficient of 0 and an exponent of -128. The result of division by zero is nan. nan is also the result of operations that produce results that are too large to be represented. nan is equal to itself.
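As a sketch of how the packing works, based only on the description above (a 56-bit signed coefficient in the high bits, an 8-bit signed exponent in the low byte) — the decoder below is my own illustration, not DEC64's reference code:

    def dec64_decode(word: int) -> float:
        # value = coefficient * 10^exponent, with the exponent in the low byte.
        exponent = word & 0xFF
        if exponent >= 128:                 # sign-extend the 8-bit exponent
            exponent -= 256
        coefficient = word >> 8
        if coefficient >= 1 << 55:          # sign-extend the 56-bit coefficient
            coefficient -= 1 << 56
        if coefficient == 0 and exponent == -128:
            return float("nan")             # the single special nan value
        return coefficient * 10.0 ** exponent  # any zero coefficient gives 0.0

    print(dec64_decode((123 << 8) | 0xFE))  # 123 * 10^-2 = 1.23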

The domain has been registered for quite a while; no idea why it suddenly got brought up again:

   Domain Name: DEC64.COM
   Registrar: 1 & 1 INTERNET AG
   Whois Server: whois.schlund.info
   Referral URL: http://1and1.com
   Name Server: NS57.1AND1.COM
   Name Server: NS58.1AND1.COM
   Status: ok
   Updated Date: 03-jun-2013
   Creation Date: 02-jun-2009
   Expiration Date: 02-jun-2014

The Incident Where the NSA Paid RSA to Plant a Backdoor...

Edward Snowden has released more internal NSA documents, showing that the NSA paid RSA to put a backdoor in an algorithm: "Exclusive: Secret contract tied NSA and security industry pioneer".

RSA's response avoids any mention of what the money was for: "RSA Response to Media Claims Regarding NSA Relationship".

The general suspicion now is that the backdoor is Dual_EC_DRBG, the default pseudorandom number generator in RSA BSAFE.

On attacking Dual_EC_DRBG, the 2006 paper "Cryptanalysis of the Dual Elliptic Curve Pseudorandom Generator" already put it this way:

Our experimental results and also empirical argument show that the DEC PRG is insecure. The attack does not imply solving the ECDLP for the corresponding elliptic curve. The attack is very efficient.

In 2007, Bruce Schneier wrote "Did NSA Put a Secret Backdoor in New Encryption Standard?", noting that the weakness wasn't big enough to make the algorithm unusable, but it still left a bad taste:

Problems with Dual_EC_DRBG were first described in early 2006. The math is complicated, but the general point is that the random numbers it produces have a small bias. The problem isn't large enough to make the algorithm unusable -- and Appendix E of the NIST standard describes an optional work-around to avoid the issue -- but it's cause for concern. Cryptographers are a conservative bunch: We don't like to use algorithms that have even a whiff of a problem.

And he recommended against using Dual_EC_DRBG:

My recommendation, if you're in need of a random-number generator, is not to use Dual_EC_DRBG under any circumstances. If you have to use something in SP 800-90, use CTR_DRBG or Hash_DRBG.

Looking back at this now... hmmm...

The Fast Inverse Square Root Algorithm...

The Chinese name is 「平方根倒數速算法」; in English it's "Fast Inverse Square Root".

I think I saw it on Twitter or Facebook (or maybe some other channel?). Reading the Chinese Wikipedia entry carefully, the Chinese version is quite complete now (checking the history, it was translated from the English version in June 2012, last year).

It was a famous magic hack at the time, faster than table lookup:

In the early 1990s (roughly when the algorithm was invented), the common way software computed square roots was to fetch an approximation from a lookup table; this code took less time to produce an approximation, and reached the required accuracy about four times faster than the commonly used floating-point division method.

And it was even faster than the CPU instruction XD

Since the first approximation the algorithm produces as input to Newton's method is already quite accurate, the precision of the resulting approximation is acceptable; whereas computing with the SSE instruction rsqrtss on the Pentium III (released in 1999, the same year as Quake III Arena), the inverse square root converges more slowly and with lower precision.

Update: see the comments; it looks like the Chinese version contains a mistranslation...

I thought I'd written about this before, but couldn't find it after searching around... so recording it here :p
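For reference, a minimal Python re-creation of the trick (the original is C on fp32 values; here struct is used to reinterpret the bits, followed by the single Newton-Raphson refinement step):

    import struct

    def fast_inverse_sqrt(x: float) -> float:
        # Reinterpret the fp32 bits as an integer, apply the magic-constant
        # shift-and-subtract to get a first approximation, then refine it
        # with one Newton-Raphson iteration.
        i = struct.unpack(">I", struct.pack(">f", x))[0]
        i = 0x5F3759DF - (i >> 1)
        y = struct.unpack(">f", struct.pack(">I", i))[0]
        return y * (1.5 - 0.5 * x * y * y)  # one Newton-Raphson step

    print(fast_inverse_sqrt(4.0))  # ~0.4991, exact answer is 0.5
    print(fast_inverse_sqrt(2.0))  # ~0.7070, exact answer is ~0.70711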