phk 的 ministat

Facebook 上看到朋友貼的統計分析小工具:「A small tool to do the statistics legwork on benchmarks etc.」,看了一下原來是 phk 寫的,後來被拉出來獨立跑...

從兩個檔案讀取兩組數列,然後用 Student's t-test 分析的小工具,在 manpage 裡面可以看到說明:

Specify desired confidence level for Student's T analysis. Possible values are 80, 90, 95, 98, 99 and 99.5 %

雖然說有些人不喜歡 Student's t-test 被濫用,不過畢竟還是一套合理的數學方法,在分析的時候可以快速的判斷...

翻了一下發現 Ubuntu 也有得裝:「Ubuntu – Package Search Results -- ministat」。

詩篇的作者抱怨不知道自己詩篇考題的答案...

2017 年的文章,最近不知道為什麼冒出來,但還蠻有趣的...

看到「I Can’t Answer These Texas Standardized Test Questions About My Own Poems」這篇,Sara Holbrook 收到信件跑來問問題 (節錄前面的部份):

Hello Mrs. Holbrook. My name is Sean, and I’m an 8th grade English teacher in Texas. I’m attempting to decipher the number of stanzas in your poem, ‘Midnight’. This isn’t clear from the formatting in our most recent benchmark. The assessment asks the following question:

作者最後的抱怨也很有趣:

My final reflection is this: any test that questions the motivations of the author without asking the author is a big baloney sandwich. Mostly test makers do this to dead people who can’t protest. But I’m not dead.

I protest.

這邊其實也是在偷戳「作者之死」現象... 另外一則也有類似的情況,發生的早一點的台灣 (2016) XDDD

文學的過度解讀現象 XDDD

Amazon DynamoDB 提供 Docker Image 讓開發者可以在本地端測試

AWS 推出了 Amazon DynamoDB 的相容 Docker Image,讓開發者可以在本地端測試 DynamoDB 的 API:「Use Amazon DynamoDB Local More Easily with the New Docker Image」,在 amazon/dynamodb-local 這邊可以拉到,裡面其實是包 Java:

DynamoDB local is now available to download as a self-contained Docker image or a .jar file that can run on Microsoft Windows, Linux, macOS, and other platforms that support Java.

這樣在 Continuous Integration (CI) 的過程裡面也可以拉起 service 測試...

MySQL 裡 performance_schema 對效能的影響

最近在弄 MySQLperforemance_schema,開起來後發現效能影響沒有很大,跟印象中不太一樣... 找了一下文章發現 Percona 在 2017 年年初時有針對效能測試過:「Performance Schema Benchmarks: OLTP RW」。重點在這張圖:

圖上可以看到 Default 其實對效能的影響有限,另外文章也整理出來,有哪些設定對效能影響不會太大,可以考慮平常就開著:

Using Performance Schema with the default options, Memory, Metadata Locks and Statements instrumentation doesn’t have a great impact on read-write workload performance. You might notice slowdowns with Stages instrumentation after reaching 32 actively running parallel connections. The real performance killer is Waits instrumentation. And even with it on, you will start to notice a performance drop only after 10,000 transactions per second.

AWS 提供模擬 Amazon Aurora 異常的測試功能...

Twitter 上看到 Jeff Barr 提到了在 Amazon Aurora 上的模擬 (這邊應該是講 MySQL):

指到的頁面是文件「Managing Amazon Aurora MySQL - Amazon Relational Database Service」,翻了一下 Wayback Machine,看起來之前就有了,只是現在拿出來再宣傳一下:「Managing Amazon Aurora MySQL - Amazon Relational Database Service」。

透過主動觸發 Amazon Aurora 異常,可以測試整個系統的後續反應:

  • A crash of the master instance or an Aurora Replica
  • A failure of an Aurora Replica
  • A disk failure
  • Disk congestion

前面三種都屬於 Aurora 本身的故障測試,第四種除了有可能是 Aurora 本身的問題外,也可以測壓力過大時的情境 (i.e. 前面透過 auto scaling 撐住了,但後面的資料庫可能沒有足夠的能力支撐)。

Google 發表新的 TTS (Text-to-Speech) 技術 Tacotron 2

Tacotron 是 Google 發表的 TTS 技術 (i.e. 輸入文字,請電腦發音),而前一版的 Tacotron 的錄音可以參考「Audio samples from "Tacotron: Towards End-to-End Speech Synthesis"」,論文則是在「Tacotron: Towards End-to-End Speech Synthesis」這邊可以看到。

這一版的則是在 Twitter 上看到有人提到:

這一版叫做 Tacotron 2,錄音可以參考「Audio samples from "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"」,論文在「Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions」。

這次在錄音頁面的最下面提供了盲測 (人類與 Tacotron 2 的錄音),基本上已經分不出哪個是真人了...

原來 Oracle 與 Microsoft 裡的條款是這樣來的...

看到「That time Larry Ellison allegedly tried to have a professor fired for benchmarking Oracle」這篇文章的講古,想起很久前就有聽過 Microsoft 有這樣的條款 (禁止未經原廠同意公開 benchmark 結果),原來是 Oracle 在三十幾年前創出來的?而且這種條款還有專有名詞「DeWitt Clauses」,出自當初被搞的教授 David DeWitt...

Microsoft 的條款是這樣:

You may not disclose the results of any benchmark test … without Microsoft’s prior written approval

Oracle 的則是:

You may not disclose results of any Program benchmark tests without Oracle’s prior consent

IBM 的反而在 license 裡面直接允許:

Licensee may disclose the results of any benchmark test of the Program or its subcomponents to any third party provided that Licensee (A) publicly discloses the complete methodology used in the benchmark test (for example, hardware and software setup, installation procedure and configuration files), (B) performs Licensee’s benchmark testing running the Program in its Specified Operating Environment using the latest applicable updates, patches and fixes available for the Program from IBM or third parties that provide IBM products (“Third Parties”), and © follows any and all performance tuning and “best practices” guidance available in the Program’s documentation and on IBM’s support web sites for the Program…

用 Makefile 跑測試的 Makefile.test

Box 放出來的東西,用 Makefile 跑各種測試:「Introducing Makefile.test: A Generic Makefile to Run Test Executables」,專案在 GitHub 上的 box/Makefile.test

Makefile.test can be used to run any type of test executables. It is not language specific nor it requires any changes to your code. Parallel, serial execution, various platforms and make versions are supported. The executables can be organized in any desired way. The user only lists the test files, the rest is taken care of Makefile.test.

程式碼很短,看程式碼其實比看說明容易理解:「Makefile.test」。看完後感覺沒有很實用... 就當作趣味看一看 XD

Microsoft Blogs 上 TDD 的戰文...

文章標題就直接寫「#NoTDD」的戰文 XDDD

列了 Pros (一行) 跟 Cons (超長 XDDD):

Pros

  • We end up with tests that verify the behavior of the code and help prevent regressions

這個是 TDD 的目的。而 Cons:

Cons

  • It takes us longer to write code using TDD
  • The tests get in the way. Because my design does not have low coupling, I end up with tests that also do not have low coupling. This means that if I change the behavior of how class works, I often have to fix tests for other classes.
  • Because I don’t have low coupling, I need to use mocks or other tests doubles often. Tests are good to the extent that the tests use the code in precisely the same way the real system uses the code. As soon as I introduce mocks, I now have a test that only works as long as that mock faithfully matches the behavior of the real system. If I have lots of mocks – and since I don’t have low coupling, I need lots of mocks – then I’m going to have cases where the behavior does not match. This will either show up as a broken test, or a missed regression.
  • Design on the fly is a learned skill. If you don’t have the refactoring skills to drive it, it is possible that the design you reach through TDD is going to be worse than if you spent 15 minutes doing up-front design.

這四個問題講的是時間與能力兩個因子的作用,轉一個角度來討論其實是:如果 junior engineer 可以寫出好的測試,他們就不叫 junior engineer 了... 而 senior engineer 是稀缺資源,讓他們多花時間寫出「好的測試」未必是划算的。

反而是另外一種常見的方式常常跟 TDD 在對抗:透過 QA team 在完成後測試,尤其是用手動測試的 QA team XDDD

這個方法很簡單,而且行之有年。而且很無奈的,跟一般軟體工程所期望的相反,junior engineer 就可以做的不錯,所以人力的部份相當好 scale,而品質也有不錯的水準 (畢竟是直接測實際的功能了)。

剛好前陣子有提到另外一篇論文也在討論 TDD 的效果 (參考「又一篇戰文:討論 TDD 的過程」這邊),也有類似的反思。

前陣子在 Twitter 上看到這則也是很有趣,拿來當結尾 XDDD:

又一篇戰文:討論 TDD 的過程

最近在 39th International Conference on Software Engineering 上受邀參加說明的論文,在 The Morning Paper 上看到的:「A dissection of the test-driven development process: does it really matter to test-first or test-last?」。

論文本身在「A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?」這邊可以下載,是去年 2016 就發出來的論文。

論文拆解 TDD 的行為,分析到底是哪些階段才是有正面幫助的:

Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, followed by the necessary production code, and refactors the code. Many empirical studies neglect unique process characteristics related to TDD iterative nature. Aim: We formulate four process characteristic: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness.

比較特別的是作者指出 refactoring 的負面效果:

Result: Quality and productivity improvements were primarily positively associated with the granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather due to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.

在 The Morning Paper 上的註解也可以看看 (反駁論文的意見),像是對論文使用自評系統的批評。