The WebKit project is moving from Subversion to GitHub

On the Hacker News front page I saw that the WebKit project has announced a move from Subversion to GitHub: "WebKit on GitHub!". The new project lives at WebKit/WebKit, and the related discussion is at "WebKit Migrates from Subversion to GitHub (webkit.org)".

Issue tracking, however, seems to be staying on WebKit Bugzilla for now, and GitHub Issues have not been enabled. Still, the code has at least moved from Subversion to Git.

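In the Hacker News thread I was surprised to see SVK come up. It is history by now (it tried to layer extra features on top of Subversion), and the thread also covered some tricky workarounds: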

Funny story: my first task when I joined the original iPhone team was to merge our forked WebKit with master. It was a sort of hazing ritual slash "when else would we do it but when someone new joins?". Anyways, we used a tool called SVK[1] in order to get very primitive "git-like" abilities. It was basically a bunch of scripts that used SVN under the hood. For example, in order to get the "local everything"-style behaviors of git, the very first thing it did was checkout every single version of the repository in question. For WebKit, this meant that the first day was spent leaving the computer alone and letting it download for hours. I made the mistake of having a space somewhere in the path of the target folder, which broke something or other, so I ended up having to do it all over again.

Anyways, I distinctly remember one of the instructions for merging WebKit in our internal wiki being something like "now type `svk merge`, but hit ctrl-c immediately after! You don't want to use the built-in merge, it'll break everything, but this is the only way to get a magic number that you can find stored in [some file] after the merge has started. If it's not there, try `svk merge` again and let it go a little longer than last time." A few hires later (I think possibly a year after) someone set up a git mirror internally to avoid having to do this craziness, which if I remember correctly, was treated with some skepticism. This was 2007, so why would we try some new-fangled git thing when we had svk?

1. https://wiki.c2.com/?SvkVersionControl

As I remember it, picking a VCS back then really was an interesting decision. Besides Git there was Mercurial, plus a few open source options that were already niche even at the time.

After 2010 things became much clearer, and today Git has more or less taken over. The largest remaining Mercurial user is probably Meta (Facebook)?

A locally hosted alternative to GitHub Copilot

On Hacker News I saw "FauxPilot – an attempt to build a locally hosted version of GitHub Copilot (github.com/moyix)", a project that runs a server speaking the GitHub Copilot protocol on your own machine. The project's GitHub page is "FauxPilot - an open-source GitHub Copilot server".

It uses CodeGen, released by Salesforce. Salesforce provides 350M, 2B, 6B, and 16B models, but FauxPilot currently only offers 350M, 6B, and 16B. The 2B model is missing, which leaves the VRAM requirements in an awkward spot:

[1] codegen-350M-mono (2GB total VRAM required; Python-only)
[2] codegen-350M-multi (2GB total VRAM required; multi-language)
[3] codegen-6B-mono (13GB total VRAM required; Python-only)
[4] codegen-6B-multi (13GB total VRAM required; multi-language)
[5] codegen-16B-mono (32GB total VRAM required; Python-only)
[6] codegen-16B-multi (32GB total VRAM required; multi-language)

13GB is just over the 12GB on a 3080 Ti, so unless you own a 3090 or 3090 Ti, you can only run the 350M version? Judging from the Hacker News discussion, there seem to be plans to add a 2B version...
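The VRAM arithmetic above can be made concrete with a small sketch that filters the model list by available memory. It copies the figures from FauxPilot's list above and treats each figure as a single-card requirement, which is a simplification (the list says "total VRAM", which may span several GPUs). The card sizes in the loop are the commonly quoted amounts and are an assumption here.

```python
# Total VRAM requirements (GB) copied from FauxPilot's model list above.
MODELS = {
    "codegen-350M-mono": 2,
    "codegen-350M-multi": 2,
    "codegen-6B-mono": 13,
    "codegen-6B-multi": 13,
    "codegen-16B-mono": 32,
    "codegen-16B-multi": 32,
}


def models_that_fit(vram_gb: float) -> list[str]:
    """Return models whose listed VRAM requirement fits in vram_gb on one card."""
    return [name for name, need in MODELS.items() if need <= vram_gb]


# Assumed per-card VRAM sizes, for illustration only.
for card, vram in [("1080 Ti", 11), ("3080 Ti", 12), ("3090", 24)]:
    print(card, models_that_fit(vram))
```

With 11GB or 12GB only the two 350M models qualify, and 24GB is needed before the 6B models fit.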

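My own card is an 11GB 1080 Ti, and I wanted to try the 350M version, but my Nvidia driver doesn't seem to be set up properly, so the GPU isn't detected. I also use Neovim, and judging from the current contents of ~/.config/github-copilot/hosts.json, the plugin appears to be hard-wired to the GitHub API: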

{"github.com":{"user":"gslin","oauth_token":"x"}}
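To show the structure of that file, here is a minimal sketch that parses a hosts.json-style document and lists which hosts it holds credentials for. It only inspects the JSON. Whether the plugin would honour a host other than github.com is not something this checks. The function name `configured_hosts` is hypothetical.

```python
import json

# Contents as shown above (the token is already redacted in the post).
raw = '{"github.com":{"user":"gslin","oauth_token":"x"}}'


def configured_hosts(text: str) -> dict[str, str]:
    """Map each host key in a hosts.json-style document to its user name."""
    data = json.loads(text)
    return {host: entry.get("user", "") for host, entry in data.items()}


print(configured_hosts(raw))  # {'github.com': 'gslin'}
```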

I'll set it aside for now and come back once the 2B version shows up, hopefully with more complete instructions...

Testing Neovim + GitHub Copilot

As mentioned earlier in "GitHub Copilot announces GA", Copilot supports Neovim. The plugin is available on GitHub at github/copilot.vim.

Copilot.vim is a Vim plugin for GitHub Copilot. For now, it requires Neovim 0.6 (for virtual lines support) and a Node.js installation.

There are two dependency issues. First, Neovim must be 0.6 or newer, and the version in Ubuntu 20.04 is too old (the one in 22.04 looks new enough). Installing from the "Neovim Stable" PPA solves this.

Second, Node.js must be 16 or newer (the versions shipped in 20.04 and 22.04 are both too old). I solved this with nvm.
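The two minimums above (Neovim 0.6+, Node.js 16+) are easy to check before running the setup. Here is a small sketch that pulls the first dotted version number out of a string and compares it as a tuple. You would feed it the first line of `nvim --version` or the output of `node --version`. The version strings in the test are illustrative examples, not claims about any particular distribution.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'v16.15.0' -> (16, 15, 0)."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(text: str, minimum: tuple[int, ...]) -> bool:
    """True if the version found in text is >= minimum (element-wise tuple order)."""
    return parse_version(text) >= minimum


NEOVIM_MIN = (0, 6)  # needed for virtual lines support
NODE_MIN = (16,)     # Node.js 16+
```

Tuple comparison handles uneven lengths sensibly: `(0, 6, 1) >= (0, 6)` is true, and `(0, 4, 3) >= (0, 6)` is false.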

First enable Copilot on the GitHub website. Then, following the instructions, run :Copilot setup inside Neovim and go through the authorization flow.

After that, open any file such as test.py or test.php and start typing, and suggestions will start appearing.

I used feedgen to check whether it works. After typing feed., a greyed-out suggestion subtitle(title) appears.

Pressing Tab at this point inserts the suggestion.

AWS launches a GitHub Copilot competitor too: Amazon CodeWhisperer

AWS has launched Amazon CodeWhisperer, which can be seen as a competitor to GitHub Copilot: "Now in Preview – Amazon CodeWhisperer- ML-Powered Coding Companion". There isn't much discussion on Hacker News yet: "Copilot just got company: Amazon announced Codewhisperer (amazon.com)".

It is still in Preview, so it's free, and no pricing has been announced yet:

During the preview period, developers can use CodeWhisperer for free.

Also, the only supported languages right now are Python, Java, and JavaScript:

The preview supports code written in Python, Java, and JavaScript, using VS Code, IntelliJ IDEA, PyCharm, WebStorm, and AWS Cloud9. Support for the AWS Lambda Console is in the works and should be ready very soon.

As for the training data, the post mentions open source projects and Amazon's own code:

CodeWhisperer code generation is powered by ML models trained on various data sources, including Amazon and open-source code.

Development must have taken a while. I wonder whether the timing is a coincidence, or whether GitHub Copilot going GA pushed them into shipping a Preview...

GitHub Copilot announces GA

GitHub Copilot has announced GA: "GitHub Copilot is generally available to all developers". The Hacker News discussion is worth a look: "GitHub Copilot is generally available (github.blog)".

Pricing is out too: US$10/mo or US$100/year:

We’re making GitHub Copilot, an AI pair programmer that suggests code in your editor, generally available to all developers for $10 USD/month or $100 USD/year. It will also be free to use for verified students and maintainers of popular open source projects.
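A quick bit of arithmetic on the two plans, using only the prices quoted above: paying monthly for a full year costs $120, so the yearly plan saves $20 and breaks even after 10 months of use.

```python
MONTHLY_USD = 10
YEARLY_USD = 100

twelve_months = 12 * MONTHLY_USD              # cost of paying monthly for a year
savings = twelve_months - YEARLY_USD          # what the yearly plan saves
break_even_months = YEARLY_USD / MONTHLY_USD  # months at which both plans cost the same

print(f"Monthly plan over a year: ${twelve_months}")
print(f"Yearly plan saves ${savings} ({savings / twelve_months:.1%})")
print(f"Yearly plan pays off after {break_even_months:.0f} months")
```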

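The price isn't the main issue, though. The bigger one is licensing, which hasn't been challenged yet. For example, people on Hacker News pointed out that some code is under copyleft, GPL-style licenses, and none of this has been tested in court.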

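Still, I'm optimistic about the service, since it can take care of a lot of tedious coding... A quick search shows that Neovim is already supported, so it might be worth figuring out how to play with it :o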

GitHub now supports TeX syntax in Markdown files

On the Hacker News front page I saw GitHub's announcement "Render mathematical expressions in Markdown":

You can now use LaTeX style syntax to render math expressions within Markdown inline (using $ delimiters) or in blocks (using $$ delimiters).
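To illustrate the two delimiter styles, here is a simplified sketch that pulls block math (`$$...$$`) and inline math (`$...$`) out of a Markdown string. This is not GitHub's actual parser. It ignores escaped dollar signs, code spans, and the other edge cases a real renderer has to handle. Blocks are extracted first so their `$$` is not misread as a pair of inline delimiters.

```python
import re

BLOCK = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)  # $$ ... $$, may span lines
INLINE = re.compile(r"\$([^$\n]+?)\$")           # $ ... $ on a single line


def extract_math(markdown: str) -> tuple[list[str], list[str]]:
    """Return (block, inline) math snippets found with $$ and $ delimiters."""
    blocks = [m.strip() for m in BLOCK.findall(markdown)]
    rest = BLOCK.sub("", markdown)  # drop blocks before scanning for inline math
    inline = [m.strip() for m in INLINE.findall(rest)]
    return blocks, inline


doc = "Euler: $e^{i\\pi} + 1 = 0$.\n\n$$\n\\sum_{k=1}^{n} k = \\frac{n(n+1)}{2}\n$$\n"
print(extract_math(doc))
```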

The TeX rendering itself is done by MathJax:

GitHub's math rendering capability uses MathJax; an open source, JavaScript-based display engine.

As I recall, MathJax's performance isn't great... but it's JavaScript running on the user's side anyway? XD

Upptime, a monitoring service built on GitHub Actions

I saw this on Twitter:

That led me to Upptime, an open source monitoring tool that uses the schedule (cron job) feature of GitHub Actions to run every five minutes. One thing to watch: public repositories have no limit, but a private repository may use up your quota:

Billing note: Upptime uses thousands of build minutes every month (approximately 3,000 minutes in the default setting). If you use a public repository, GitHub offers unlimited free build minutes, but if you use a private repository, you'll have to pay for this time.
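A rough check of the quota concern. A five-minute schedule works out to 8,640 scheduled runs in a 30-day month. Comparing Upptime's own ~3,000-minute estimate against a free allowance shows how much would be billed. The 2,000-minute figure below is an assumption (GitHub Free's included private-repo minutes as I understand them). Allowances vary by plan and may change, and scheduled runs are not guaranteed to fire exactly on time.

```python
RUN_INTERVAL_MIN = 5        # Upptime's check interval, per the post
DAYS = 30                   # assume a 30-day month

runs_per_month = DAYS * 24 * 60 // RUN_INTERVAL_MIN  # scheduled uptime checks

UPPTIME_MINUTES = 3000      # Upptime's own estimate for the default setting
FREE_PRIVATE_MINUTES = 2000  # assumption: GitHub Free allowance for private repos

overage = max(0, UPPTIME_MINUTES - FREE_PRIVATE_MINUTES)
print(runs_per_month, overage)  # 8640 1000
```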

According to the docs, it is built on three features: GitHub Actions, GitHub Issues, and GitHub Pages:

  • GitHub Actions is used as an uptime monitor
  • GitHub Issues are used for incident reports
  • GitHub Pages are used for the status website

Besides these three features, it also commits some data back into the git history every day:

We also record the response time once per day and commit it to git history. This way, we can graph long-term trends in your websites' response times by going through git commit history. We generate these graphs once every day, also using schedulers.

Looks like it might be worth playing with...

GitHub will require all users to enable 2FA by the end of 2023

GitHub has announced its plan to make 2FA mandatory: "Software security starts with the developer: Securing developer accounts with 2FA".

The article's subtitle says it all:

GitHub will require all users who contribute code on GitHub.com to enable one or more forms of two-factor authentication (2FA) by the end of 2023.

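Turning it on for yourself is one thing. Enforcing it across the entire service is another level entirely. There's still more than a year to go...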

GitHub's instability this month was all down to the mysql1 cluster...

GitHub has explained this month's four outages, which were mostly related to the mysql1 cluster: "An update on recent service disruptions". The post is by Keith Ballinger, whose listed title, from a quick search, is SVP of Engineering at GitHub.

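Some information about mysql1 can be found in "Partitioning GitHub’s relational databases to handle scale" (I also mentioned it in "GitHub's MySQL architecture and numbers"). Basically, they scale with two approaches, ProxySQL and Vitess, but it's clear that the main database itself still carries a heavy load.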

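This time mysql1 appears to have hit a performance bottleneck, but the root cause still hasn't been found. You can see this in the explanations of the incidents, starting with the first outage: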

The incident appeared to be related to peak load combined with poor query performance for specific sets of circumstances.

The second:

The following day, we saw the same peak traffic pattern and load on mysql1. We were not able to pinpoint and address the query performance issues before this peak, and we decided to proactively failover before the issue escalated.

The third:

While we had reduced load seen in the previous incidents, we were not fully confident in the mitigations.

In this third incident, we enabled memory profiling on our database proxy in order to look more closely at the performance characteristics during peak load.

And the most recent, fourth one:

In order to reduce load, we throttled webhook traffic and will continue to use that as a mitigation to prevent future recurrence during peak load times as we continue to investigate further mitigations.

So it clearly isn't over yet, and if problems come back, throttling webhook traffic will probably be the first lever they pull again...

GitHub releases its curated GitHub Advisory Database

GitHub has announced that it is opening up its curated GitHub Advisory Database: "GitHub Advisory Database now open to community contributions". A GitHub PM answered some questions on Hacker News, which are worth reading: "GitHub’s database of security advisories is now open source (github.blog)".

The corresponding repository is github/advisory-database. It uses the Open Source Vulnerability (OSV) format, and every entry is a JSON file.
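To give a feel for the OSV format, here is a sketch that reads a trimmed-down OSV record and summarizes the affected version ranges. The record is hypothetical (placeholder ID, made-up package name), and real advisories in the repository carry more fields. The field names used (`id`, `summary`, `affected`, `package.ecosystem`, `package.name`, `ranges`, `events` with `introduced`/`fixed`) follow the OSV schema. As a simplification, each range is reduced to its last `introduced` and last `fixed` event, although OSV allows several pairs per range.

```python
import json

# Hypothetical, trimmed-down OSV record for illustration only.
SAMPLE = """
{
  "id": "GHSA-xxxx-xxxx-xxxx",
  "summary": "Example vulnerability in a hypothetical package",
  "affected": [
    {
      "package": {"ecosystem": "npm", "name": "example-package"},
      "ranges": [
        {"type": "ECOSYSTEM",
         "events": [{"introduced": "0"}, {"fixed": "1.2.3"}]}
      ]
    }
  ]
}
"""


def summarize(record: dict) -> list[str]:
    """List 'ecosystem/name: introduced -> fixed' per range (last event of each kind)."""
    lines = []
    for affected in record.get("affected", []):
        pkg = affected["package"]
        for rng in affected.get("ranges", []):
            introduced = fixed = None
            for event in rng.get("events", []):
                introduced = event.get("introduced", introduced)
                fixed = event.get("fixed", fixed)
            lines.append(f"{pkg['ecosystem']}/{pkg['name']}: {introduced} -> {fixed}")
    return lines


record = json.loads(SAMPLE)
print(record["id"], summarize(record))
```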

The data appears to start in 2017/10, which adds up to roughly four and a half years' worth, so it's one more source to draw on...