AWS 也推出了 GitHub Copilot 的競爭對手 Amazon CodeWhisperer

AWS 推出了 Amazon CodeWhisperer,可以看做是 GitHub Copilot 的競爭產品:「Now in Preview – Amazon CodeWhisperer- ML-Powered Coding Companion」,在 Hacker News 上的討論還不多:「Copilot just got company: Amazon announced Codewhisperer (amazon.com)」。

目前還是 Preview 所以是免費的,但也還沒有提供價錢:

During the preview period, developers can use CodeWhisperer for free.

另外目前提供的程式語言只有 PythonJavaJavaScript

The preview supports code written in Python, Java, and JavaScript, using VS Code, IntelliJ IDEA, PyCharm, WebStorm, and AWS Cloud9. Support for the AWS Lambda Console is in the works and should be ready very soon.

至於 training 的資料集,這邊有提到的是 open source 專案與 Amazon 自家的東西:

CodeWhisperer code generation is powered by ML models trained on various data sources, including Amazon and open-source code.

開發應該需要一段時間,不知道是剛好,還是被 GitHub Copilot 轉 GA 的事件強迫推出 Preview 版...

GitHub Copilot 宣佈 GA

GitHub Copilot 宣佈 GA:「GitHub Copilot is generally available to all developers」,Hacker News 上的討論可以看一下:「GitHub Copilot is generally available (github.blog)」。

價錢也出來了,US$10/mo 或是 US$100/year:

We’re making GitHub Copilot, an AI pair programmer that suggests code in your editor, generally available to all developers for $10 USD/month or $100 USD/year. It will also be free to use for verified students and maintainers of popular open source projects.

不過重點不是價錢,而是還沒有被挑戰過的 license 問題,像是在 Hacker News 上有人提到有些程式碼的授權是有感染性的 GPL 類的,這些在法院上還沒有被戰過。

不過還是很看好這個服務,畢竟可以處理掉很多無聊的 coding 時間... 查了一下發現 Neovim 已經有支援了,似乎可以來看看要怎麼玩 :o

GitHub 可以在 Markdown 文件裡寫 TeX 語法了

Hacker News 首頁上看到 GitHub 上的「Render mathematical expressions in Markdown」這個公告:

You can now use LaTeX style syntax to render math expressions within Markdown inline (using $ delimiters) or in blocks (using $$ delimiters).

其中 TeX rendering 這塊是透過 MathJax 產生的:

GitHub's math rendering capability uses MathJax; an open source, JavaScript-based display engine.

我記得 MathJax 的效能好像不怎麼樣... 反正是跑在使用者端的 javascript?XD

用 GitHub Actions 做的監控服務 Upptime

是在 Twitter 上看到這個:

然後翻到 Upptime 這個 open source monitoring 工具,直接是用 GitHub Actions 提供的 schedule (cron job) 每五分鐘跑一次。這邊要注意的是,如果是 public repository 的話不受限制,如果是 private repository 的話會有機會把 quota 吃完:

Billing note: Upptime uses thousands of build minutes every month (approximately 3,000 minutes in the default setting). If you use a public repository, GitHub offers unlimited free build minutes, but if you use a private repository, you'll have to pay for this time.

依照說明是用 GitHub Actions、GitHub IssuesGitHub Pages 三個功能在運作:

  • GitHub Actions is used as an uptime monitor
  • GitHub Issues are used for incident reports
  • GitHub Pages are used for the status website

除了用這三個功能外,另外還是會每天塞一些資料回 git history 裡面:

We also record the response time once per day and commit it to git history. This way, we can graph long-term trends in your websites' response times by going through git commit history. We generate these graphs once every day, also using schedulers.

好像可以玩看看...

GitHub 將在 2023 年底強制所有使用者都啟用 2FA

GitHub 公佈了強制使用 2FA 的計畫:「Software security starts with the developer: Securing developer accounts with 2FA」。

文章副標題把該講的都講完了:

GitHub will require all users who contribute code on GitHub.com to enable one or more forms of two-factor authentication (2FA) by the end of 2023.

自己開是一件事情,整個服務強制啟用是另外一個等級,還有一年多的時間...

這個月 GitHub 的不穩問題,都是 mysql1 這個 cluster 的鍋...

GitHub 針對了這個月的四次 downtime 說明,大致上都跟 mysql1 這組 cluster 有關:「An update on recent service disruptions」,這是 Keith Ballinger 發的文章,找了一下掛的頭銜是 SVP of Engineering at GitHub。

文章裡提到的 mysql1 在「Partitioning GitHub’s relational databases to handle scale」這邊可以看到一些資訊 (我在「GitHub 的 MySQL 架構與數字」這邊也有提到),基本上有 ProxySQL + Vitess 兩套方案在 scale,但可以看出來主資料庫本身還是有很大的 loading 在上面跑。

這次的問題是 mysql1 看起來這次遇到了效能上的瓶頸,不過還是沒找到原因,這可以從這幾次的說明看出來,從第一次的 outage:

The incident appeared to be related to peak load combined with poor query performance for specific sets of circumstances.

第二次的:

The following day, we saw the same peak traffic pattern and load on mysql1. We were not able to pinpoint and address the query performance issues before this peak, and we decided to proactively failover before the issue escalated.

第三次的:

While we had reduced load seen in the previous incidents, we were not fully confident in the mitigations.

In this third incident, we enabled memory profiling on our database proxy in order to look more closely at the performance characteristics during peak load.

到最近第四次的:

In order to reduce load, we throttled webhook traffic and will continue to use that as a mitigation to prevent future recurrence during peak load times as we continue to investigate further mitigations.

可以看到基本上還沒完,之後再遇到問題時應該還是會把 webhook traffic 拿出來開刀...

GitHub 放出了他們整理過的 GitHub Advisory Database

GitHub 宣佈開放他們整理過的 GitHub Advisory Database:「GitHub Advisory Database now open to community contributions」,Hacker News 上有 GitHub 的 PM 回答一些問題,也可以看看:「GitHub’s database of security advisories is now open source (github.blog)」。

對應的 repository 在「github/advisory-database」這邊可以看到,用的格式是 Open Source Vulnerability format,裡面都是 JSON 檔案。

裡面看起來是從 2017/10 開始的資料,這樣算起來大約累積了四年半,算是一個來源...

GitHub 的 Markdown 透過 Mermaid 支援各種流程圖

前幾天 GitHub 宣佈他們站上的 Markdown 透過 Mermaid 支援流程圖:「Include diagrams in your Markdown files with Mermaid」。

翻了一下 GitLab 也有 Mermaid 支援:「GitLab Flavored Markdown」,所以這個部份兩邊的系統可以通了...

寫 API 文件時也蠻常用到的東西,之前是在 GitLab 上面弄了 PlantUML 的支援,找時間來分析...

GitHub 搜尋的改善

GitHub 宣佈搜尋的新功能與改善:「Improving GitHub code search」。

其中對我感覺比較有趣的是宣佈支援 regular expression,透過 / 包起來搜:

Search for an exact string, with support for substring matches and special characters, or use regular expressions (enclosed in / separators).

不知道後面的搜尋系統怎麼做的,我猜是先拉出 substring 搜尋,然後 filtering 處理?

GitHub 的 MySQL 架構與數字

前幾天 GitHub 有寫一篇文章提到他們的 MySQL 是怎麼 scale 的,另外裡面也有一些數字可以看:「Partitioning GitHub’s relational databases to handle scale」。

他們最主要的 database cluster 叫做 mysql1,裡面有提到 2019 年的時候這個 cluster 是 950K qps,其中 primary 有 50K qps:

In 2019, mysql1 answered 950,000 queries/s on average, 900,000 queries/s on replicas, and 50,000 queries/s on the primary.

在 2021 年的時候變成 1.125M qps,其中 75K qps 在 primary 上:

Today, in 2021, the same database tables are spread across several clusters. In two years, they saw continued growth, accelerating year-over-year. All hosts of these clusters combined answer 1,200,000 queries/s on average (1,125,000 queries/s on replicas, 75,000 queries/s on the primaries). At the same time, the average load on each host halved.

另外這幾年比較成熟的方案都拿出來用了,包括用 ProxySQL 降低連線數的壓力 (connection pool 的概念):

[W]e started using ProxySQL to reduce the number of connections opened against our primary MySQL instances.

ProxySQL is used for multiplexing client connections to MySQL primaries.

另外用 Vitess 協助 sharding 之間的轉移:

Vitess is a scaling layer on top of MySQL that helps with sharding needs. We use its vertical sharding feature to move sets of tables together in production without downtime.

這兩套應該是已經蠻成熟的了... 另外也可以發現老方法還是很好用,就算在 GitHub 這種量還是可以暴力解決很多事情。