Amazon 又把一個大部門的 Oracle 系統轉移到了 AWS 自家的系統

算是 AWS 的 PR 稿,在老闆對雲的宣示與政治正確下本來就會陸陸續續轉過去...

這次是 Amazon 的 Consumer Business 從 Oracle 的系統換到 AWS 自己的系統:「Migration Complete – Amazon’s Consumer Business Just Turned off its Final Oracle Database」。

原先有 75 PB 的資料與 7500 個 database:

We migrated 75 petabytes of internal data stored in nearly 7,500 Oracle databases to multiple AWS database services including Amazon DynamoDB, Amazon Aurora, Amazon Relational Database Service (RDS), and Amazon Redshift.

其中一個優點是省成本,但是也投入了超過一百個團隊一起參與轉移,會需要攤多久才會打平,這點在沒有看到內部財務資料其實沒辦法判斷,而且工程資源的稀缺性也是個沒有被看到的資訊:

Cost Reduction – We reduced our database costs by over 60% on top of the heavily discounted rate we negotiated based on our scale. Customers regularly report cost savings of 90% by switching from Oracle to AWS.

More than 100 teams in Amazon’s Consumer business participated in the migration effort.

然後 latency 的下降其實也只能參考,因為轉移系統的時候也會順便改寫,有多少是因為 AWS 服務本身帶出來,在沒有內部資料看不出來:

Performance Improvements – Latency of our consumer-facing applications was reduced by 40%.

管理成本算是裡面唯一可以參考的,畢竟是搬到可延展擴充的服務:

Administrative Overhead – The switch to managed services reduced database admin overhead by 70%.

另外,沒寫的東西比較有趣,像是他們沒有選擇 Athena 而是用 Redshift,看起來像是先轉上去,其他找機會再說...

Linux 上 Intel CPU 的安全性修正與效能的影響

Hacker News Daily 上看到在講 Intel CPU 因為各種安全性問題,而需要在 Linux Kernel 上修正,所產生的效能問題:「HOWTO make Linux run blazing fast (again) on Intel CPUs」。

這一系列的子彈也飛得夠久了 (雖然還是一直有其他的小子彈在飛),所以回過頭來看一下目前的情況。

這邊主要的測試是針對 mitigations=off 與 SMT 的啟用兩個項目在測 (SMT 在 Intel 上叫做 Hyper-threading),可以看到這兩份測試結果,目前的 mitigation 對效能的影響其實已經逐漸降到可以接受的程度 (小於 5%),但關閉 SMT 造成的效能影響大約都在 20%~30%:

但是開啟 SMT 基本上是個大坑,如果有關注大家在挖洞的對象,可以看到一堆 Intel CPU 上專屬的安全性問題都跟 SMT 有關...

剛好岔個題聊一下,先前弄了一顆 AMDRyzen 7 3700X 在用 (也是跑 Linux 桌機),才感受到現在的網頁真的很吃 CPU,開個網頁版的 SlackOffice 365 的速度比原來的老機器快了好多,差點想要把家裡的桌機也換掉...

uBlock Origin 的開發版 (Dev) 被 Chrome Web Store 拒絕的事件...

uBlock Origin 是一個在瀏覽器上擋廣告的軟體,以前在推廣的時候都只提到可以過濾掉網站上的廣告,大家興趣其實都不太高 (還會有「留口飯讓別人吃」之類的 XDDD),但最近跟同事推廣的時候改用「可以擋 YouTube 的影音廣告喔」,大家接受度意外的爆高,不過這有點扯遠了,回到原來的主題上...

先介紹一下 uBlock Origin 的開發模式,除了一般的 stable 版本外 (「uBlock Origin」這組),另外會有另外一個 dev 版本上傳到 Chrome Web Store (CWS) 上 (「uBlock Origin development build」這組),這樣讓使用者比較容易安裝與測試,這個方式也可以在 Tampermonkey 上看到。

這次主要維護者 Raymond Hill (gorhill) 在 1.22.5rc1 版上傳到 CWS 上後收到被拒絕上架的通知:「Dev build 1.22.5rc1 "REJECTED" from Chrome Web Store」。

拒絕的原因是 CWS 要求要有套件必須符合「目的單一性」,也就是不能把目的不同的東西強迫使用者綁在一起使用:

Your item did not comply with the following section of our policy: An extension should have a single purpose that is clear to users. Do not create an extension that requires users to accept bundles of unrelated functionality, such as an email notifier and a news headline aggregator. If two pieces of functionality are clearly separate, they should be put into two different extensions, and users should have the ability to install and uninstall them separately. For example, an extension that provides a broad array of functionalities on the New Tab Page/ Start-up Page but also changes the default search are better delivered as separate extensions, so that users can select the services they want. For more information on the new Chrome extensions quality policy, please refer to the FAQ: https://developer.chrome.com/extensions/single_purpose

後續的 1.22.5rc2 也被拒絕,然後他回信詢問了 CWS 官方,得到的仍然是罐頭回應,然後他就決定丟著 (而這個作法還蠻聰明的),接著這件事情就被丟著變成 PR 事件上了一些媒體,然後昨天就突然解了...

Google 最近的動作愈來愈多了,一方面在嘗試避免觸動反托拉斯法的情況,儘可能打壓這些擋廣告的套件...

Google Chrome 78 又要搞事把 www 開頭拿掉了...

剛剛改回用 beta channel 的 Google Chrome (78 版),結果發現網址列裡的 www 被拿掉,而且本來可以關掉的方式 (omnibox-ui 系列的設定) 從 chrome://flags 裡面也拔掉了,代表你現在關不掉了...

搜了一下資料,發現在 Reddit 上有人討論過 78 版的這個問題 (當時還在 dev channel):「Chrome 78 dev hide 'www' again」。

裡面有人提到可以安裝 Suspicious Site Reporterwww 重現,當初看到還半信半疑,結果裝了發現真的會動,WTF...

看了一下 extension 的 source code 發現好大一包,以後應該會有人生一個小的 extension 出來專門做這件事情?

EC2 的 ARM 也支援 bare metal 版本了...

AWSARM 平台的採用速度比想像中快,昨天發表了 bare metal 版本 a1.metal:「Now Available: Bare Metal Arm-Based EC2 Instances」。

相較於 x86 系列提供的 bare metal,ARM 這邊提供的機器不算大台,只有 16 核心與 32 GB 的記憶體,所以主要的用途應該是在於可以存取系統底層的資訊,或是因為軟體的授權不支援虛擬化而需要租用 bare metal:

  • need access to physical resources and low-level hardware features, such as performance counters, that are not always available or fully supported in virtualized environments,
  • are intended to run directly on the hardware, or licensed and supported for use in non-virtualized environments.

這次的 a1.metal 共八區可以用:

You can start using a1.metal instances today in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Mumbai), and Asia Pacific (Sydney).

Ken Thompson 的密碼

剛剛看到這串還蠻歡樂的...

起因於 BSD 3 的程式碼裡面有個 /etc/passwd,而且是帶有 crypt 的版本:「unix-history-repo/etc/passwd」。

裡面有蠻多密碼都已經被解出來了,但還是有些還沒解出來... 而最近的消息是 ken (Ken Thompson) 的密碼被解了出來:「Ken Thompson's Unix password」。

From: Nigel Williams <nw@retrocomputingtasmania.com>
Cc: TUHS main list <tuhs@minnie.tuhs.org>
Subject: Re: [TUHS] Recovered /etc/passwd files
Date: Wed, 9 Oct 2019 16:49:48 +1100

ken is done:

ZghOT0eRm4U9s:p/q2-q4!

took 4+ days on an AMD Radeon Vega64 running hashcat at about 930MH/s
during that time (those familiar know the hash-rate fluctuates and
slows down towards the end).

另外解出來的人也發現了這組密碼是一組西洋棋的 Descriptive notation,跟 Ken Thompson 的背景也相符:

From: Nigel Williams <nw@retrocomputingtasmania.com>
Cc: TUHS main list <tuhs@minnie.tuhs.org>
Subject: Re: [TUHS] Recovered /etc/passwd files
Date: Wed, 9 Oct 2019 16:52:00 +1100

On Wed, Oct 9, 2019 at 4:49 PM Nigel Williams
<nw@retrocomputingtasmania.com> wrote:
> ZghOT0eRm4U9s:p/q2-q4!

BTW, is that a chess move?

不過我覺得最好玩的是這個,不確定是不是本尊就是了:

From: Ken Thompson via TUHS <tuhs@minnie.tuhs.org>
To: Andy Kosela <akosela@andykosela.com>
Cc: TUHS main list <tuhs@minnie.tuhs.org>
Subject: Re: [TUHS] Recovered /etc/passwd files
Date: Wed, 9 Oct 2019 01:53:25 -0700

congrats.

On Wed, Oct 9, 2019 at 1:16 AM Andy Kosela <akosela@andykosela.com> wrote:
>
> On 10/9/19, Warner Losh <imp@bsdimp.com> wrote:
> > On Tue, Oct 8, 2019, 11:52 PM Nigel Williams
> > <nw@retrocomputingtasmania.com>
> > wrote:
> >
> >> On Wed, Oct 9, 2019 at 4:49 PM Nigel Williams
> >> <nw@retrocomputingtasmania.com> wrote:
> >> > ZghOT0eRm4U9s:p/q2-q4!
> >>
> >> BTW, is that a chess move?
> >>
> >
> > Most common opening.
> >
>
> Descriptive chess notation is not as popular today as it was back in
> the 70s, but it actually makes perfect sense as Ken is a long time
> chess enthusiast.
>
> --Andy

還有 Rob Pike 對這件事情不怎麼贊同的看法:

From: Rob Pike <robpike@gmail.com>
To: Nigel Williams <nw@retrocomputingtasmania.com>
Cc: TUHS main list <tuhs@minnie.tuhs.org>
Subject: Re: [TUHS] Recovered /etc/passwd files
Date: Wed, 9 Oct 2019 09:59:43 -1000

I coulda told you that. One tends to learn passwords (inadvertently) when
they're short and typed nearby often enough. (Sorry, ken.)

If I remember right, the first half of this password was on a t-shirt
commemorating Belle's first half-move, although its notation may have been
different.

Interesting though it is, though, I find this hacking distasteful. It was
distasteful back when, and it still is. The attitudes around hackery have
changed; the position nowadays seems to be that the bad guys are doing it
so the good guys should be rewarded for doing it first. That's disingenuous
at best, and dangerous at worst.

-rob


On Tue, Oct 8, 2019 at 7:50 PM Nigel Williams <nw@retrocomputingtasmania.com>
wrote:

> ken is done:
>
> ZghOT0eRm4U9s:p/q2-q4!
>
> took 4+ days on an AMD Radeon Vega64 running hashcat at about 930MH/s
> during that time (those familiar know the hash-rate fluctuates and
> slows down towards the end).
>

意外的引誘到一群人跑出來...

EC2 推出 18TB 與 24TB 的機器...

AWS 又把機器給生出來啦:「EC2 High Memory Update – New 18 TB and 24 TB Instances」。

一樣是限制要買三年 RI 才能用,不過價錢頁面上好像還在更新,在「Amazon EC2 Dedicated Hosts Pricing」只看到了之前就公佈的 12TB 價錢,還沒看到 18TB 與 24TB 的部份...

然後以前會跟同事說,資料小於這台機器記憶體大小的不能叫 big data (當時是 12TB),現在升級到 24TB 啦...

用更少訓練時間的 KataGo

最近開始在不同的地方會看到 KataGo 這個名字 (TwitterYouTube 上都有看到),翻了一下資料發現是在訓練成本上有重大突破,依照論文的宣稱快了五十倍...

在第一次跑的時候,只用了 35 張 V100 跑七天就有 Leela Zero 第 130 代的強度:

The first serious run of KataGo ran for 7 days in Februrary 2019 on up to 35xV100 GPUs. This is the run featured in the paper. It achieved close to LZ130 strength before it was halted, or up to just barely superhuman.

而第二次跑的時候用了 28 張 V100 跑 20 blocks 的訓練,跑了 19 天就已經超越 Facebook 當初提供的 ELFv2 版本,而對應到 Leela Zero 大約是第 200 代左右的強度 (要注意的是在 Leela Zero 這邊已經是用 40 blocks 的結構訓練了一陣子了):

Following some further improvements and much-improved hyperparameters, KataGo performed a second serious run in May-June a max of 28xV100 GPUs, surpassing the February run after just three and a half days. The run was halted after 19 days, with the final 20-block networks reaching a final strength slightly stronger than LZ-ELFv2! (This is Facebook's very strong 20-block ELF network, running on Leela Zero's search architecture). Comparing to the yet larger Leela Zero 40-block networks, KataGo's network falls somewhere around LZ200 at visit parity, despite only itself being 20 blocks.

從論文裡面可以看到,跟 Leela Zero 一樣是逐步提昇 (應該也是用 Net2Net),而不是一開始就拉到 20x256:

In KataGo’s main 19-day run, (b, c) began at (6, 96) and switched to (10, 128), (15, 192), and (20, 256), at roughly 0.75 days, 1.75 days, and 7.5 days, respectively. The final size approximately matches that of AlphaZero and ELF.

訓練速度上會有這麼大的改善,分成兩個類型,一種是一般性的 (在「Major General Improvements」這章),另外一類是特定於圍棋領域的改進 (在「Major Domain-Specific Improvements」這章)。

在 Leela Zero 的 issue tracking 裡面也可以看到很多關於 KataGo 的消息,看起來作者也在裡面一起討論,應該會有一些結果出來...

HTML 裡 head 的整理...

HEAD - A free guide to <head> elements」這篇整理了 HTML 裡 <head> 的標準,以及網路上夠大的廠商所定義的標準 (另外還把中國的瀏覽器也拉出來)。

從目錄就可以看出來講了哪些:

另外一種看法就是,你要怎麼樣讓你的 <head> 塞滿垃圾然後變得很肥... XD

Google Chrome 對 CPU bug 的 patch

既然有方向了,後續應該會有人去找底層的問題...

先是在 Hacker News 上看到「Speculative fix to crashes from a CPU bug」這個猜測性的修正,這是因為他們發現在 IntelGemini Lake 低功耗晶片組上會發生很詭異的 crash:

For the last few months Chrome has been seeing many "impossible" crashes on Intel Gemini Lake, family 6 model 122 stepping 1 CPUs. These crashes only happen with 64-bit Chrome and only happen in the prologue of two functions. The crashes come and go across different Chrome versions.

然後依照 crash log 猜測跟 alignment 有關,所以決定用 gcc/clang 都有支援的 __attribute__ 強制設定 alignment 來避開,但看起來手上沒有可以重製的環境,所以只能先把實做丟上來...