The project was heavily inspired by FerretDB and is in its early days. The main differences are that there is no intention to support any database other than PostgreSQL (FerretDB also supports Tigris) and that it is written in Rust, as opposed to Go.
It looks like everyone is building things on top of PostgreSQL these days, but searching this project for the keyword GIN turns up nothing, so I wonder whether it even creates indexes...
CLI tool and Python library that converts the output of popular command-line tools and file types to JSON or dictionaries. This allows piping output to tools like jq and simplifies automation scripts.
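As a quick illustration, a minimal sketch of both interfaces; the dig parser and the high-level jc.parse() helper exist in recent jc versions, but the exact output fields here are assumptions and may vary:

import subprocess
import jc  # pip install jc

# Equivalent of: dig example.com | jc --dig
cmd_output = subprocess.run(
    ["dig", "example.com"], capture_output=True, text=True
).stdout
data = jc.parse("dig", cmd_output)  # list of dicts instead of raw text
print(data[0]["answer"])            # structured fields, ready for filtering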
In order to make it easier for AWS Partners and others to gain access to CloudWatch metrics faster and at scale, we are launching CloudWatch Metric Streams. Instead of polling (which can result in 5 to 10 minutes of latency), metrics are delivered to a Kinesis Data Firehose stream.
When you set up a stream you choose between the binary OpenTelemetry 0.7 format and the human-readable JSON format.
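Setting one up from code might look like the sketch below, using boto3's put_metric_stream; the stream name and ARNs are placeholders, and the Firehose delivery stream plus the IAM role that lets CloudWatch write to it must already exist:

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_stream(
    Name="all-metrics-stream",  # hypothetical name
    FirehoseArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/metrics",
    RoleArn="arn:aws:iam::123456789012:role/metric-stream-role",
    OutputFormat="opentelemetry0.7",  # or "json" for the human-readable format
)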
Another selling point is the price, $0.003 per thousand metric updates:
Pricing – You pay $0.003 for every 1000 metric updates, and for any charges associated with the Kinesis Data Firehose. To learn more, check out the pricing page.
Previously, if you wanted to pull these metrics out yourself you had to fetch them via API calls at USD$0.01 per 1,000 requests; this new method is quite a bit cheaper by comparison, though at high volume it still adds up (and quite a few metrics update once per minute).
If the goal is just to back the data up or run analyses, the old approach of pulling via the API might still be the better fit? Pulling once an hour should be plenty for both backup and analysis, and the alarm mechanism stays on CloudWatch anyway.
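A rough back-of-the-envelope comparison (my own arithmetic, assuming one metric updating every minute over a 30-day month, charging only per metric update or per metric requested, and ignoring Firehose and per-datapoint charges):

updates = 60 * 24 * 30                  # 43,200 updates per metric per month

stream_cost   = updates / 1000 * 0.003  # ~= $0.13 via Metric Streams
poll_minutely = updates / 1000 * 0.01   # ~= $0.43 polling the API every minute
poll_hourly   = 24 * 30 / 1000 * 0.01   # ~= $0.007 pulling a whole hour per call

print(stream_cost, poll_minutely, poll_hourly)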
The positioning of this product looks like an attempt to build up the ecosystem:
We designed this feature with the goal of making it easier & more efficient for AWS Partners including Datadog, Dynatrace, New Relic, Splunk, and Sumo Logic to get access to metrics so that the partners can build even better tools.
I'm compressing ~1.3 GB folders each filled with 1440 JSON files and find that there's a 15-fold difference between using the tar command on macOS or Raspbian 10 (Buster) and using Python's built-in tarfile library.
Seeing 1440 files, the immediate intuition is one file per minute for a full day (1440 = 24 × 60)...
The next day he tracked down the cause: after installing GNU tar and adding the --sort='name' option, the compressed size came out about the same as with Python's tarfile:
Ok, I think I found the issue: BSD tar and GNU tar without any sort options put the files in the archive in an undefined order.
After installing GNU tar on my Mac with:
brew install gnu-tar
And then tarring the same folder, but with the --sort option:
My JSON files contain measurements from hundreds of sensors. Every minute I read out all sensors, but only a few of these sensors have a different value from minute to minute.
By sorting the files by name (which starts with the creation unixtime), two subsequent files differ by very few characters. Apparently this is very favourable for the compression efficiency.
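The same trick is easy to reproduce in Python; a sketch with a hypothetical folder name (since Python 3.7, tarfile already recurses directories in sorted order when you add a whole directory, which is why its archives came out small here):

import tarfile
from pathlib import Path

src = Path("sensor-logs")  # hypothetical folder holding the 1440 JSON files

# Adding files in sorted name order keeps near-identical neighbours
# adjacent, which is what makes gzip so effective on this data.
with tarfile.open("sensor-logs.tar.gz", "w:gz") as tar:
    for path in sorted(src.iterdir()):
        tar.add(path, arcname=path.name)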
Storing them in an array is fine in itself, but the code needs to check whether an entry is a duplicate, and it uses neither a hash nor a tree structure; with roughly 63k records, the array implementation turns into an O(n^2) algorithm:
But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless. You have unique hashes, why not use a hash map?
The fix for the first problem was to cache the result of strlen: hook strlen, "cache" the start of the string and the length, and if it's called again within the string's range, return the cached value.
As for the second problem, he simply skipped the duplicate check altogether, since the data itself contains no duplicates:
And as for the hash-array problem, it’s more straightforward - just skip the duplicate checks entirely and insert the items directly since we know the values are unique.
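In the abstract, the difference looks like the sketch below (Python rather than the game's code, with made-up entries): membership tests on a list scan every element, while a set is O(1) on average, turning the quadratic loop into a linear one:

entries = [f"hash-{i}" for i in range(63_000)]  # stand-in for the ~63k unique hashes

# Quadratic: every insert rescans the list (what the game effectively did).
unique_list = []
for e in entries:
    if e not in unique_list:   # O(n) scan per item -> O(n^2) overall
        unique_list.append(e)

# Linear: a set makes the duplicate check O(1) on average.
seen = set()
for e in entries:
    if e not in seen:
        seen.add(e)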
I found this while making a collection of what C implementation does what at https://news.ycombinator.com/item?id=26298300.
There are two basic implementation strategies. The BSD (FreeBSD and OpenBSD and more than likely NetBSD too), Microsoft, GNU, and MUSL C libraries use one, and suffer from this; whereas the OpenWatcom, P.J. Plauger, Tru64 Unix, and my standard C libraries use another, and do not.
The 2002 report in the comp.lang.c Usenet newsgroup (listed in that discussion) is the earliest that I've found so far.
RFC 7807 "Problem Details for HTTP APIs" has become a widely used standard for representing error messages. Work has started on an updated version of the format. If you have any feedback about the spec, now would be a good time to make yourself heard. https://t.co/ERrsf1BbbN
Besides registering application/problem+json and application/problem+xml, section "3.1. Members of a Problem Details Object" spells out which required and optional members go into the JSON (or XML), and "3.2. Extension Members" briefly describes how to extend it.
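For reference, a minimal problem details payload, adapted from the RFC's own example ("balance" being an extension member in the sense of section 3.2):

import json

problem = {
    "type": "https://example.com/probs/out-of-credit",
    "title": "You do not have enough credit.",
    "status": 403,
    "detail": "Your current balance is 30, but that costs 50.",
    "instance": "/account/12345/msgs/abc",
    "balance": 30,  # extension member (section 3.2)
}

# Served with Content-Type: application/problem+json
print(json.dumps(problem, indent=2))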