Tag Archives: access

Open Distro for Elasticsearch 的比較

先前提到的「AWS 對 Elastic Stack 實作免費的開源版本 Open Distro for Elasticsearch」,在「Open Distro for Elasticsearch Review」這邊有整理了一份重點:

可以看到主要重點都在安全性那塊...

加州大學宣佈不與 Elsevier 續約

加州大學 (這是一個大學系統,包括了十個校區,超過 25 萬的學生與 14 萬的教職員) 認為 Elsevier 沒有達到 open access 應有的標準,決定將不再跟 Elsevier 續約,並且發出新聞稿抨擊 Elsevier:「UC terminates subscriptions with world’s largest scientific publisher in push for open access to publicly funded research」。

As a leader in the global movement toward open access to publicly funded research, the University of California is taking a firm stand by deciding not to renew its subscriptions with Elsevier. Despite months of contract negotiations, Elsevier was unwilling to meet UC’s key goal: securing universal open access to UC research while containing the rapidly escalating costs associated with for-profit journals.

這應該是美國頂尖學院裡面的第一槍?後續會帶動多少單位不續訂...

Cloudflare 發表 Logpush,把 log 傳出來

Cloudflare 推出了 Logpush,可以把 log 傳到 Amazon S3 或是 Google Cloud Storage 上 (目前就支援這兩個):「Logpush: the Easy Way to Get Your Logs to Your Cloud Storage」,目前是 enterprise 方案可以用:

Today, we’re excited to announce a new way to get your logs: Logpush, a tool for uploading your logs to your cloud storage provider, such as Amazon S3 or Google Cloud Storage. It’s now available in Early Access for Enterprise domains.

Logpush currently works with Amazon S3 and Google Cloud Storage, two of the most popular cloud storage providers.

以往是自己拉,需要考慮 HA 問題 (兩台小機器就可以了,機器成本是還好,但維護成本頗高),現在則是可以讓 Cloudflare 主動推上來...

我大多數的建議是 log 儘量留 (包括各種系統的 log 與 http access log)。主要是因為儲存成本一直下降 (S3 的 1TB 才 USD$23/month,如果使用 Glacier 會低到 USD$4/month),而且 log 在壓縮後會變小很多 (經常會有重複的字)。

最傳統的 AWStats 就可以看到不少資訊,如果是透過 IP address 與 User-agent,交叉配合其他 event data 就可以讓很多 machine learning 演算法取得的資訊更豐富。

Amazon EFS 的 IA Storage Class

Amazon EFS 一直找不太到好用的使用情境,因為 NFS 的關係所以大量 I/O 時的 latency issue 使得速度快不起來,而拿來堆 log 的成本又超級高...

最新推出的 storage class 則是透過提供低儲存成本的版本,解決了堆 log 這種使用情境:「New – Infrequent Access Storage Class for Amazon Elastic File System (EFS)」。

不過 EFS 不像 S3 可以直接選擇 storage class,是需要讓系統管理的:

開啟後 30 天沒有被碰過的檔案就會切過去:

Eligible Files – Files that are 128 KiB or larger and that have not been accessed or modified for at least 30 days can be transitioned to the new storage class. Modifications to a file’s metadata that do not change the file will not delay a transition.

而 latency 也會增加:

Files that have not been read or written for 30 days will be transitioned to the Infrequent Access storage class with no further action on your part. Files in the Standard Access class can be accessed with latency measured in single-digit milliseconds; files in the Infrequent Access class have latency in the double-digits.

us-east-1 為例子來說,Standard 是 USD$0.3/GB-month,而 IA 只要 USD$0.045/GB-month,但抓取時會有 USD$0.01/GB 的傳輸費用,可以看出價錢低不少。

不過文章裡沒提到什麼時候會把資料從 IA 跳回 Standard,可能得找機會問問看...

Amazon EFS 也要推出 IA 版本了 (Infrequent Access)

Amazon EFS 也要推出 IA (Infrequent Access) 版本了,Infrequent Access 指的是不常存取的資料:「Coming Soon – Amazon EFS Infrequent Access Storage Class」。

這剛好配合上很多人拿 Amazon EFS 來堆 log 的行為... AWS 是說有機會到省到 85%,不過應該是非常大的量才有機會有這個價錢?

EFS IA reduces storage costs for files not accessed every day, with savings up to 85% compared to the EFS Standard storage class.

其實用過 Amazon EFS 的人都對效能抱怨頗嚴重 (透過 NFS 有太多操作沒辦法 cache,於是 network latency issue 就出現了),堆 log 或是當作跨機器的空間大概是目前的主流用法...

Netflix 用 CloudTrail 記錄找出 AWS key 外洩的小工具

在「aws-credential-compromise-detection – Detecting Credential Compromise in AWS」這邊看到可以抓漏的專案 Netflix-Skunkworks/aws-credential-compromise-detection

透過分析 CloudTrail 記錄找出有哪些可疑的 AWS key 被外部使用,看起來預設值會過濾掉 Private IP range 以及 100.64.0.0/10 (設給給 CGNAT 使用的網段)。

不過 Netflix-Skunkworks 的定位是什麼啊,裡面好像有不少有趣的東西...

AWS 推出可以直接操作機器的 Shell Access

AWS 推出了 Shell Access,讓你可以直接操作 EC2 上的機器,而且又符合稽核上的要求:「New – AWS Systems Manager Session Manager for Shell Access to EC2 Instances」。

看起來機器上只要有 ssm-agent 就可以做這件事情了。在官方的圖可以看到用法:

另外 AWS 也在開發新的功能,一個是繞過 security group 的 SSH client:

SSH Client – You will be able to create SSH sessions atop Session Manager without opening up any inbound ports.

另外一個是給外部機器用的版本:

On-Premises Access – We plan to give you the ability to access your on-premises instances (which must be running the SSM Agent) via Session Manager.

這功能最主要還是稽核面的需求... 以前要作到這件事情會需要透過跳板之類的東西記錄使用者行為,現在 AWS 做出來後就省下這件事情。

歐洲研究機構的資助者推動研究論文的開放存取

在「Radical open-access plan could spell end to journal subscriptions」這邊看到歐洲 11 個研究機構資助者成立了「cOAlition S」,推動研究論文的開放存取。

目標是在 2020 年開始,由這些機構所資助的研究都必須投在符合完全開放條件的平台上:

cOAlition S signals the commitment to implement, by 1 January 2020, the necessary measures to fulfil its main principle: “By 2020 scientific publications that result from research funded by public grants provided by participating national and European research councils and funding bodies, must be published in compliant Open Access Journals or on compliant Open Access Platforms.

而現在大約只有 15%:

According to a December 2017 analysis, only around 15% of journals publish work immediately as open access (see ‘Publishing models’) — financed by charging per-article fees to authors or their funders, negotiating general open-publishing contracts with funders, or through other means.

用這種方式降低那些收錢才能下載的平台的影響力...

Amazon Aurora (MySQL) 推出的 Asynchronous Key Prefetch

Amazon Aurora (MySQL) 推出新的效能改善,可以改善 JOIN 時的效能:「Amazon Aurora (MySQL) Speeds Join Queries by More than 10x with Asynchronous Key Prefetch」。

看起來像是某個情況的 optimization,將可能的 random access 換成 sequential access 而得到大量的效能:

This feature applies to queries that require use of the Batched Key Access (BKA) join algorithm and Multi-Range Read (MRR) optimization, and improves performance when the underlying data set is not in the main memory buffer pool or query cache.

其實記憶體還是最好用的加速器,能加大硬拼就先硬拼... XD

Amazon Device Farm 支援讓使用者直接連上去 debug 了...

Amazon Device Farm 推出這樣的功能又朝著設備租賃服務更進一步了:「Amazon Device Farm Launches Direct Device Access for Private Devices」。

Now, with direct device access, mobile applications developers can use individual devices in their private test set as if they were directly connected to their local machine via USB. Developers can now test against a wide array of devices just like they would as if the devices were sitting on their desk.

這樣就可以使用更底層的東西了...