Twitter 的 280 字帶來的差異

在「140 Vs. 280: Users Engage With Longer Tweets Data Shows」這邊分析了在 Twitter 上 0~140 與 141~280 字的 tweet 所帶來的互動差異:

可以看到較長的 tweet 會有比較多的 retweet 與 like,不過更細一步的分析就沒有了... 文章內也有提到資料的分析是怎麼來的:

The data parameters: 30,000 publisher tweets that included links between November 29 – December 6.
The results: The click-through rate was roughly equal for both tweet length types but overall engagement nearly doubled for longer tweets. On tweets containing 141-280 characters, the average retweet was a staggering 26.52% – compared the 13.71% for tweets with 0-140 characters. For likes, tweets containing 141-280 characters had an average of a whopping 50.28%, compared to 0-140’s 26.96%.

拿來分析 Android APK 檔的 Droidefence

忘記在哪邊看到的,分析 Android APK 檔的軟體:「Droidefense: Advance Android Malware Analysis Framework」。

Droidefense (originally named atom: analysis through observation machine)* is the codename for android apps/malware analysis/reversing tool. It was built focused on security issues and tricks that malware researcher have on they every day work. For those situations on where the malware has anti-analysis routines, Droidefense attemps to bypass them in order to get to the code and 'bad boy' routine. Sometimes those techniques can be virtual machine detection, emulator detection, self certificate checking, pipes detection. tracer pid check, and so on.

Droidefense uses an innovative idea in where the code is not decompiled rather than viewed. This allow us to get the global view of the execution workflow of the code with a 100% accuracy on gathered information. With this situation, Droidefense generates a fancy html report with the results for an easy understanding.

看起來是輔助用的工具... 先記錄下來 XD

shellcheck

在朋友的 Twitter 上看到 shellcheck (ShellCheck, a static analysis tool for shell scripts) 這個專案:

可以看到有點像是 lint 工具,給了不少建議:

而且看起來已經發展很久了... 在各大 distribution 內都可以直接裝。

分析 FCC 對網路中立性的留言,將鄉民與機器人分開來分析

Boing Boing 的立場其實還蠻鮮明的,所以有時候他們的新聞看看就好...

但這篇真的很有趣,把 FCC (美國的聯邦通信委員會) 上兩千兩百萬則對網路中立性的留言拿出來分析,結果發現真人與機器人的差異超明顯 XDDD:「Analysis of 22 million FCC comments show that humans love Net Neutrality and bots really, really hate it」,引用的文章在「Discovering truth through lies on the internet - FCC comments analyzed」這邊。

分析可以發現,真人偏好網路中立,而機器人反對網路中立 XDDD:(其實大家心裡都有底是怎麼玩出來的... 只是這次有機會分析,讓事情更明顯)

Data analysis company Gravwell ingested 22,000,000 comments sent to the FCC's docket on Net Neutrality and posted their preliminary findings, which are that the majority of comments came from bots, and these bots oppose Net Neutrality; of the comments that appear to originate with humans, the vast majority favor Net Neutrality.

文章中列出幾個有趣的現象,像是機器人的 comment 大量重複:

A very small minority of comments are unique -- only 17.4% of the 22,152,276 total. The highest occurrence of a single comment was over 1 million.

甚至是 pornhub.com 的郵件位置 XDDD:

Most comments were submitted in bulk and many come in batches with obviously incorrect information -- over 1,000,000 comments in July claimed to have a pornhub.com email address

然後機器人的 pattern 也很容易辨別:

Bot herders can be observed launching the bots -- there are submissions from people living in the state of "{STATE}" that happen minutes before a large number of comment submissions

這個有點類似「50 Cent Party (五毛黨)」,但是是自動化機器人,而且產出的「品質」不太好 XDDD

Amazon S3 的改善

其實老牌的 Amazon S3 也改了不少東西:「Revolutionizing S3 Storage Management with 4 new features」。

其中的「S3 Object Tagging」讓管理可以透過 tag 處理,管理上會多一些選擇。而「S3 Analytics, Storage Class Analysis」則是可以分析存取的 pattern,藉此重新規劃 policy。

看到之前的同事說 CloudFront 要支援 2-tier cache,但卻還沒看到公告,不知道是怎麼樣的實作方式... 這對大型的 live streaming 幫助很大啊,後面的壓力會小很多。

Amazon Athena:直接在 S3 上進行分析

Amazon Athena 提供另外一種選擇,讓分析的便利性增加了許多:「Amazon Athena – Interactive SQL Queries for Data in Amazon S3」。

以往都需要開 server 起來分析,這個新的服務直接使用就好:

Athena is based on the Presto distributed SQL engine and can query data in many different formats including JSON, CSV, log files, text with custom delimiters, Apache Parquet, and Apache ORC.

果然是用 Presto 改出來的... XDDD

指定好各種資料來源之後直接下 SQL query 分析,然後依照分析的量來算錢... 而 FAQ 的地方也有提到可以透過 JDBC 接上去,這樣看起來跑報表的場合直接丟給他處理了:

Amazon Athena can be accessed via the AWS management console and a JDBC driver. You can programmatically run queries, add tables or partitions using the JDBC driver.

隔壁 Amazon Redshift 的立場變得很尷尬啊,Amazon Athena 不需要養機器而且又可以直接從 Amazon S3 拉資料,如果之後把 Presto 對 RDBMS 的部分再補上來的話就更棒了... (應該是下一階段的任務,把 RDS 補上)

Go 上面的白箱安全性檢查

HP 的 open source 專案「Go AST Scanner」,分析 Go 的原始程式碼拉出 AST 進行分析 (Static program analysis),再找出可能的安全性問題。

雖然是 alpha 階段,但看起來是個好東西啊... 至少寫的太誇張的 SQL injection 可以掃出來。

MIT 開發出靜態分析工具,找出 23 個 RoR 軟體未被發現的漏洞

MIT 弄出來的新玩意,靜態分析工具叫做 Space:「New MIT Scanner Finds Web App Flaws in a Minute」,MIT 官方的報導在「Patching up Web applications」這邊:

In tests on 50 popular Web applications written using Ruby on Rails, the system found 23 previously undiagnosed security flaws, and it took no more than 64 seconds to analyze any given program.

接下來就是什麼時候會放出來了...

四位數密碼的分佈

分析信用卡四位數密碼的分佈:「PIN number analysis」。

透過已經外洩的資料分析:

Obviously, I don’t have access to a credit card PIN number database. Instead I’m going to use a proxy. I’m going to use data condensed from released/exposed/discovered password tables and security breaches.

19xx 那邊特別高,拉出來看可以看到分佈:(很像是出生年 XDDD)

相同的 abab (前兩碼與後兩碼相同) 也可以看出特別高,而 aaaa (四碼都一樣) 的特別亮:

當不只四碼時,也有一些數據:

另外是特別高的 1004 的原因:

Many people also asked the significance of 1004 in the four character PIN table. This comes from Korean speakers. When spoken, "1004" is cheonsa (cheon = 1000, sa=4).

"Cheonsa" also happens to be the Korean word for Angel.

Oracle 的 CSO (Chief Security Officer) 對資安的想法

這應該是昨天很熱鬧的新聞,也不難看出 Oracle 對資安的心態。

參考「Oracle to 'sinner' customers: Reverse engineering is a sin and we know best」這篇報導,Oracle CSO Mary Ann Davidson 發表的原文已經被刪除,但這是 internet 時代,當然有完整的備份下來:「No, You Really Can’t (Mary Ann Davidson Blog)」。

提供給 ZDNet 報導的補充是:

The security of our products and services has always been critically important to Oracle. Oracle has a robust program of product security assurance and works with third party researchers and customers to jointly ensure that applications built with Oracle technology are secure. We removed the post as it does not reflect our beliefs or our relationship with our customers.

Oracle 的公關能力一如往常的優秀!