AWS 對 Elastic Stack 實作免費的開源版本 Open Distro for Elasticsearch

Elasticsearch 的主體是 Apache License 2.0,但 Elastic Stack (以前叫做 X-Pack) 則是需要付費使用的功能,其中包括了不少跟安全有關的項目在裡面,所以其實有不少人抱怨過產品凌駕安全性的問題,像是「ES 6.3: X-Pack Licence is "Expired" on New Install」這篇官方回應的:

A basic license is not entitled to security features. To try out security you need to use a trial license or obtain a subscription.

AWS 這次則是出手實作了他們自己的版本,叫做 Open Distro for Elasticsearch:「New – Open Distro for Elasticsearch」。

如果你看文章說明,他列出來的 feature 全部都是在 Elastic Stack 這頁上列出來的項目,針對性的意思其實很清楚了:

In addition to Elasticsearch and Kibana, the first release includes a set of advanced security, event monitoring & alerting, performance analysis, and SQL query features (more on those in a bit).

而前面提到的安全性功能也包括在內:

Security – This plugin that supports node-to-node encryption, five types of authentication (basic, Active Directory, LDAP, Kerberos, and SAML), role-based access controls at multiple levels (clusters, indices, documents, and fields), audit logging, and cross-cluster search so that any node in a cluster can run search requests across other nodes in the cluster.

目前支援 Docker Image 與 RPM,之後看看有沒有機會出 deb 版本:

In addition to the source code repo, Open Distro for Elasticsearch and Kibana are available as RPM and Docker containers, with separate downloads for the SQL JDBC and the PerfTop CLI.

這樣應該會讓 Elasticsearch 的服務模式受到很大的影響,來看 Elastic N.V. Ordinary Shares Real Time Stock Quotes 這邊會掉多少...

各種 Java 的版本

看到這則 tweet,提到 Java 的支援度:

主要是裡面有張圖列出了目前市場上有的選擇,可以當關鍵字來查:

目前看起來如果要 Java 8 只有三個方案,其中有過 TCK 的只有兩個,看起來用 Amazon Corretto 算是個還不錯的選擇?

加州大學宣佈不與 Elsevier 續約

加州大學 (這是一個大學系統,包括了十個校區,超過 25 萬的學生與 14 萬的教職員) 認為 Elsevier 沒有達到 open access 應有的標準,決定將不再跟 Elsevier 續約,並且發出新聞稿抨擊 Elsevier:「UC terminates subscriptions with world’s largest scientific publisher in push for open access to publicly funded research」。

As a leader in the global movement toward open access to publicly funded research, the University of California is taking a firm stand by deciding not to renew its subscriptions with Elsevier. Despite months of contract negotiations, Elsevier was unwilling to meet UC’s key goal: securing universal open access to UC research while containing the rapidly escalating costs associated with for-profit journals.

這應該是美國頂尖學院裡面的第一槍?後續會帶動多少單位不續訂...

開源的語音助理 Leon

目前的語音助理都會把資訊傳回到中央伺服器,而 leon-ai/leon 則是希望在本地端解決:

Leon is an open-source personal assistant who can live on your server.

He does stuff when you ask him for.

You can talk to him and he can talk to you. You can also text him and he can also text you. If you want to, Leon can communicate with you by being offline to protect your privacy.

作者錄了一段目前版本的功能,可以看到語音輸入與分析運作的還不錯:

看到 1.0.0 beta,最近應該會釋出正式版,接下來就是看可以掛什麼 plugin 進去讓他更好用?

組一台 RISC-V 的 PC

RISC-V 是目前最有名的 open source 指令集,但似乎不太好找對應的硬體... 在「Building a RISC-V PC」這篇則是介紹了怎麼弄一台 RISC-V 的 PC:

採用的硬體是 HiFive Unleashed (USD$999),硬體架構包括了:

  • SiFive Freedom U540 SoC
  • 8 GB DDR4 with ECC
  • Gigabit Ethernet port
  • 32 MB quad SPI flash from ISSI
  • microSD card for removable storage
  • FMC connector for future expansion

沒有 USB 與 HDMI,作者是透過 HiFive Unleashed Expansion Board (USD$1999) 接出來的,不過這張還包括了 FPGA,這好像有點超過,兩個套件再加上 M.2 或是 SATA 的 disk 應該是超過台幣十萬...

不過畢竟算是早期的產品,之後有規模後應該會開始有中低價位的產品出現...

Mercury Web Parser 開源

看到「Mercury Goes Open Source!」這篇,Postlight 的團隊開源了 Mercury Web Parser,程式碼在 GitHub 上的 postlight/mercury-parser 可以取得。

這個版本是用 Node.js 寫的,可以從範例看出用法以及結果:

import Mercury from '@postlight/mercury-parser';
Mercury.parse(url).then(result => console.log(result););
{
  "title": "Thunder (mascot)",
  "content": "<div><div><p>This is the content of the page!</div></div>",
  "author": "Wikipedia Contributors",
  "date_published": "2016-09-16T20:56:00.000Z",
  "lead_image_url": null,
  "dek": null,
  "next_page_url": null,
  "url": "https://en.wikipedia.org/wiki/Thunder_(mascot)",
  "domain": "en.wikipedia.org",
  "excerpt": "Thunder Thunder is the stage name for the horse who is the official live animal mascot for the Denver Broncos",
  "word_count": 4677,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

先前其他的軟體與服務可以參考「Evaluating Text Extraction Algorithms」這篇的整理與比較,不過這篇連原網站都不見了... 只能從 Internet Archive 上翻出來。

這個主題有不少團隊都做過 (給一個 html 網頁,抓出實際的內容塊落),但也死了不少團隊... 比較有印象的是 Readability,在 2016 年收掉了:「The Readability bookmarking service will shut down on September 30, 2016.」。

要撈資料可以拿來用...

GitHub Actions

GitHub 藉著 open source 函式庫,說明了目前還在 beta 的 GitHub Actions 是什麼:「An open source parser for GitHub Actions」。

GitHub Actions 是 GitHub 規劃將自動化設定包裝成設定檔的服務,以往是透過 GitHub 網站上設定,現在則是把這些設定放到 git repository 內。

重點在於 Actions 其實是透過 HCL (HashiCorp Configuration Language) 語法定義:

All Actions workflow files are valid HCL, but not all HCL files are valid workflows.

AWS 推出 MongoDB 服務:Amazon DocumentDB

AWS 推出了 Amazon DocumentDB 服務,相容於 MongoDB 3.4 3.6 的界面:「New – Amazon DocumentDB (with MongoDB Compatibility): Fast, Scalable, and Highly Available」。

這個新聞其實引起不少關注,不單純是 AWS 支援了 MongoDB service,而是 AWS 對去年一系列 license issue 的態度。

先講 license 的事情,後面再提技術上的差異。

背景是 MongoDB 在去年十月的時候決定換 license,決定從 GNU AGPL 換成他們自己定義的 SSPL:「MongoDB now released under the Server Side Public License」。

相關的報導可以參考 TechCrunch 當時寫的「MongoDB switches up its open-source license」,主要的重點在於:

[T]he SSPL explicitly states that anybody who wants to offer MongoDB as a service — or really any other software that uses this license — needs to either get a commercial license or open source the service to give back the community.

而 AWS 在三個月後的回應也意外的清楚,他直接照著 MongoDB 3.6 版的 API 刻一個出來,不需要用你的軟體提供服務 (所以就不用照你的 license 走):

Amazon DocumentDB implements the Apache 2.0 open source MongoDB 3.6 API by emulating the responses that a MongoDB client expects from a MongoDB server, allowing you to use your existing MongoDB drivers and tools with Amazon DocumentDB.

TechCrunch 下的標題也頗直接,認為 AWS 對這套搞法不怎麼認同:「AWS gives open source the middle finger」。

回到技術上的層面來看,可以看到 Amazon DocumentDB 提供的技術資料看起來跟 Amazon Aurora 很像,都是六份三區:

Amazon DocumentDB uses a purpose-built SSD-based storage layer, with 6x replication across 3 separate Availability Zones.

連 read replica 的限制也都是 15 份,可以「猜測」後面應該是用同一套技術在運作...:

In Amazon DocumentDB, the storage and compute are decoupled, allowing each to scale independently, and developers can increase the read capacity to millions of requests per second by adding up to 15 low latency read replicas in minutes, regardless of the size of your data.

看了一下價錢,最小台是 db.r4.large,需要 USD$0.277/hr,相當於一個月要 USD$200 左右,而且 storage 與 i/o 要另外計算,門檻不算低。

目前主要還是歐美區先上:

Amazon DocumentDB (with MongoDB compatibility) is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions.

隔壁棚的 Redis 不知道有什麼感想...

Syncthing 發行 1.0.0 版

Syncthing 是一個檔案分享軟體,如果要說類型的話,可以看作是 Dropbox 的 open source 版本,找台便宜的 VPS 主機就可以架起來丟著 (挑個空間夠大的 OpenVZ instance)。

官方在前幾天宣佈推出 1.0.0 了:「Syncthing graduation day」。會推出主要的原因是現在的版本其實夠穩定,就不要因為 0.x 版而造成使用者誤解了 (這邊應該是在講因為 Semantic Versioning 的流行,0.x 版會給人不穩定的印象):

As much as a version number means anything at all, a “major zero” version number means that you can expect breakage. This is not what we want to communicate. Especially, it’s not the mindset that we should have towards our users. Hence Syncthing is now graduating from being in perpetual beta to being actual release software, yet the journey of development continues.

來把手上的機器都升級上去...

歐盟對十四套 Open Source 軟體推出 Bug Bounty Program

歐盟對於 14 套 open source 軟體推出 bug bounty program,協助改善這些軟體的品質 (主要是資安這塊):「EU to fund bug bounty programs for 14 open source projects starting January 2019」、「In January, the EU starts running Bug Bounties on Free and Open Source Software」。

這十四套軟體的選擇應該可以參考「EU aims to increase the security of password manager and web server software: KeePass and Apache chosen for open source audits」這邊...

然後看到「Intigriti/Deloitte」這個才知道原來 Deloitte 也有做這個啊...