很多 MTurk 的接案者都用 LLM 在解決文字類的問題

剛剛在 Hacker News 上翻到的:「33-46% of workers on MTurk used LLMs in a text production task (arxiv.org)」,論文在「Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks」這邊,這個標題取的很故意... XD

Hacker News 上的標題主要是出自論文 abstract 的這段:

We reran an abstract summarization task from the literature on Amazon Mechanical Turk and, through a combination of keystroke detection and synthetic text classification, estimate that 33-46% of crowd workers used LLMs when completing the task.

想想還蠻正常的?能輕鬆賺當然就輕鬆賺... 但這也代表開發者可以思考 offload 給 LLM 的品質,以及如果需要外部的工人智慧,是不是可以搭配 LLM 再 offload 一些簡單的處理給人類就好?

話說好久沒聽到 MTurk 這個服務了,翻了 wiki 看起來是 2005 年就有的服務。

NVIDIA 開源 Linux GPU Kernel Driver

NVIDIA 宣佈開源 Linux 下的 GPU Kernel Driver:「NVIDIA Releases Open-Source GPU Kernel Modules」。

從一些描述上可以看出來,應該是因為 Datacenter 端的動力推動的,所以這次 open source 的版本中,對 Datacenter GPU 的支援是 production level,但對 GeForce GPU 與 Workstation GPU 的支援直接掛 alpha level:

Which GPUs are supported by Open GPU Kernel Modules?

Open kernel modules support all Ampere and Turing GPUs. Datacenter GPUs are supported for production, and support for GeForce and Workstation GPUs is alpha quality. Please refer to the Datacenter, NVIDIA RTX, and GeForce product tables for more details (Turing and above have compute capability of 7.5 or greater).

然後 user-mode driver 還是 closed source:

Will the source for user-mode drivers such as CUDA be published?

These changes are for the kernel modules; while the user-mode components are untouched. So the user-mode will remain closed source and published with pre-built binaries in the driver and the CUDA toolkit.

nouveau 來說,是可以從 open source driver 裡面挖一些東西出來用,不過能挖到跟 proprietary 同樣效能水準嗎?

Stream 對 .io 的感冒

Stream 的人寫了一篇「Why Stream Stopped Using .IO Domain Names for Production Traffic」表達他們對 .io 的感冒...

主要是因為 9/20 爛掉的情況不太妙。第一個是 .io 爛掉了兩個小時 (以月來算 SLA 就等於直接掉了 0.2% uptime,變成不到 99.8%),第二個是爛掉時 server 傳回的不是 SERVFAIL,而是 NXDOMAIN

The outage lasted for almost 2 hours, during which 1/5th of DNS queries for any .getstream.io record would fail.

他們的解法是改到 .com 上,畢竟影響的時候應該會修得比較快。另外文章裡也有延伸提到 Amazon Route 53 爛掉時要怎麼辦,如果他們真的決定要解決的話,應該是會拿出像「StackOverflow 對於多 DNS 商的同步方式...」或是「GitHub 也自己搞了一套管理多家 DNS 的程式...」的搞法吧。

不過這的確是當初選 .io 沒預料到的...

Netflix 對於拍攝影片的要求

Netflix 對於拍攝影片的要求直接放在網站上:「Production and Post-Production Requirements v2.1」。

Provide a set of technical requirements for production and post-production workflows to ensure that a high level of quality is maintained throughout the lifecycle of a project from capture to archive. This serves the purpose of future-proofing the content as the Netflix platform and viewing experience continue to evolve.

裡面有提到一些產品,這些資訊其實可以當作採購指南用... (當預算有到這個 range 時 XD)

不過動畫會怎麼算啊 XDDD (應該是另外的 requirement?)

線上測試 SQL Injection 喔喔喔

在「An SQL Injection Attack Is a Legal Company Name in the UK」這邊看到英國的這家公司:「; DROP TABLE "COMPANIES";-- LTD」,根本就是在幫大家測試 XDDD

當然,大家也都馬上聯想到這則 xkcd 漫畫:「Exploits of a Mom」。

來招喚 QQ 姊翻譯這則 xkcd 漫畫?

Amazon Aurora 支援 t2.medium...

Amazon Aurora 宣佈支援 t2.medium:「Use Amazon Aurora for Dev & Test Workloads with new T2.Medium DB Instance Class」。

雖然官方的文章標題是寫提供給 dev & test 情境下使用,但其實對於某些 production 應該也不錯:

The db.t2.medium should be a great fit for many of your development and test scenarios, and you should also consider them for some of your less-demanding production workloads.

沒提供更低的 t2.small 有點可惜...

Let's Encrypt 的 Limited Beta

Let's Encrypt 在拿到 IdenTrust 提供的 cross sign (參考「Let's Encrypt 正式出發」) 後所啟動的計畫:「Beta Program Announcements」。

要先透過 Google Forms 填單等待,等通過後要指定 production server 才會發出來。

要注意的是,Let's Encrypt 發出來的 SSL certificate 會比較短,只有 90 天有效,設計上是依靠自動機制在更新,所以理論上不會有問題:

Certificates from Let's Encrypt are valid for 90 days. We recommend renewing them every 60 days to provide a nice margin of error. As a beta participant, you should be prepared to manually renew your certificates at that time. As we get closer to General Availability, we hope to have automatic renewal tested and working on more platforms, but for now, please play it safe and keep track.

再來是有註冊限制,分成 Registrations/IP address 與 Certificates/Domain 兩個限制:

There are two rate limits in play: Registrations/IP address, and Certificates/Domain.

前者的限制是每天 10 次:

Registrations/IP address limits the number of registrations you can make in a given day; currently 10. This means you should avoid deleting the /etc/letsencrypt/accounts folder, or you may not be able to re-register.

後者的限制是每個 domain 只給 4 個 certificate:

Certificates/Domain you could run into through repeated re-issuance. This limit measures certificates issued for a given combination of Top Level Domain + Domain. This means if you issue certificates for the following domains, at the end you would have what we consider 4 certificates for the domain example.com.

拿來測試應該是夠用。