Trainium-based EC2 Trn1 instances solve this challenge by delivering faster time-to-train while offering up to 50% cost-to-train savings over comparable GPU-based instances.
For large-scale model training, Trn1 instances integrate with Amazon FSx for Lustre high-performance storage and are deployed in EC2 UltraClusters. EC2 UltraClusters are hyperscale clusters interconnected with a non-blocking petabit-scale network.
但第一波開放的區域有點少,只有萬年美東一區 us-east-1 與美西二區 us-west-2:
You can launch Trn1 instances today in the AWS US East (N. Virginia) and US West (Oregon) Regions as On-Demand, Reserved, and Spot Instances or as part of a Savings Plan.
沒記錯的話,這種機器應該是要另外申請 limit 才能開,沒辦法說測就測。另外在公告裡面有提到 savings plan ,但沒提到 RI (reserved instance),不確定是不是還沒開 RI 讓使用者買 (不過我記得 savings plan 好像也有類似的折扣結構):
Starting today, Amazon EC2 High Memory instances with 3TiB (u-3tb1.56xlarge), 6TiB (u-6tb1.56xlarge, u-6tb1.112xlarge), 9TiB (u-9tb1.112xlarge), and 12TiB of memory (u-12tb1.112xlarge) are available in Asia Pacific (Tokyo) region. Customers can start using these new High Memory instances with On Demand and Savings Plan purchase options.
然後每次推這些機器的時候都會提到 SAP HANA,都沒有其他的例子可以說... 話說業界就只剩下這套系統是完全都沒在考慮分散式架構嗎 XDDD (完全沒用過)
SAP customers continue to leverage AWS as their platform of choice and innovation. Some are in the early stages of their SAP cloud journeys and are focused on executing their migration. Others have hardened their SAP systems on AWS and are innovating around their core business processes with advanced AWS services.
Pairing Lyra with new video compression technologies, like AV1, will allow video chats to take place, even for users connecting to the internet via a 56kbps dial-in modem.
這次的突破在於可以使用 3kbps 的頻寬傳輸,但清晰度比 Opus 的 6kbps 效果還好不少。
Google 在文章裡面給了兩個 sample,一個是乾淨背景音,另外一個是吵雜的背景音,跟 Opus 與 Speex 比起來都好很多。
We provide extensive subjective performance evaluations that show that our system based on generative modeling provides state-of-the-art coding performance at 3 kb/s for real-world speech signals at reasonable computational complexity.
看起來是 2019 年年初的時候 MySQL 5.1 出問題,後續決定安排升級,在 2019 年年中把系統升級到 MySQL 5.7 (Percona Server 版本):
Our first major hurdle was to get current with our version of MySQL. In July, 2019 we completed the MySQL 5.1 to MySQL 5.7 (v5.7.19-17-log Percona Server to be precise) upgrade across all MySQL instances.
Not only was support for MySQL 5.1 at End-of-Life (more than 5 years ago) but our MySQL 5.1 instances on EC2/AWS had limited storage and we were scheduled to run out of space at the end of July. Our backs were up against the wall and we had to deliver!
另外在升級到 5.7 的時候,順便把本來是 INT 的 primary key 都換成 BIGINT:
As part of the cut-over to MySQL 5.7, we also took the opportunity to bake in a number of improvements. We converted all primary key columns from INT to BIGINT to prevent hitting MAX value.
In parallel with the MySQL 5.7 upgrade we also Upgraded Django to 1.6 due a behavioral change in MySQL 5.7 related to how transactions/commits were handled for SELECT statements. This behavior change was resulting in errors with older version of Python/Django running on MySQL 5.7
Eventbrite had traditionally used pt-online-schema-change (pt-osc) to ALTER MySQL tables in production. pt-osc uses MySQL triggers to move data from the original to the “duplicate” table which is a very expensive operation and can cause replication lag. Matter of fact, it had directly resulted in several outages in H1 of 2019 due to replication lag or breakage.
Next on the list was implementing improvements to MySQL high availability and automatic failover using Orchestrator. In February of 2020 we implemented a new HAProxy layer in front of all DB clusters and we released Orchestrator to production!
Orchestrator can successfully detect the primary failure and promote a new primary. The goal was to implement Orchestrator with HAProxy first and then eventually move to Orchestrator with ProxySQL.
然後最後題到了 Square 研發的 Shift,把 gh-ost 包裝起來變成有個 web UI 可以操作:
Multi-Attach for Provisioned IOPS (io1) volumes on Amazon Elastic Block Store (EBS) is available today at no extra charge to customers in the US East (N. Virginia & Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Seoul) regions.
其中常用的目的是 HA:
Multi-Attach capability makes it easier to achieve higher availability for applications that provide write ordering to maintain storage consistency.
Heartbeat 類的應用應該可以用上這個東西,不過本來就可以透過 command line API 做到 detach & attach,用這個只是少了一個動作...
Amazon RDS Proxy supports Amazon RDS for MySQL and Amazon Aurora with MySQL compatibility, with support for additional RDS database engines coming soon.
定價策略看起來是依照後端資料庫的 vCPU 計算:
Pricing is simple and predictable: you pay per vCPU of the database instance for which the proxy is enabled.
Amazon RDS Proxy is available in preview for RDS MySQL and Aurora MySQL in US East (N. Virginia), US East (Ohio), US West (Oregon), EU West (Ireland), and Asia Pacific (Tokyo) regions. Support for RDS PostgreSQL and Aurora PostgreSQL is coming soon.
Dqlite (“distributed SQLite”) extends SQLite across a cluster of machines, with automatic failover and high-availability to keep your application running. It uses C-Raft, an optimised Raft implementation in C, to gain high-performance transactional consensus and fault tolerance while preserving SQlite’s outstanding efficiency and tiny footprint.
文章是由 Stanford 的 Sherlock 發出來的,不過看起來跟電視劇沒關係,從網站上的標語「The HPC cluster for all your computing needs」可以看出是 HPC 相關的單位。
在 HPC 環境裡面可以預期單一目錄裡會有很多檔案,所以使用者跑來抱怨 ls 的速度就不算太意外了。不過這次使用者有提到在他自己的 laptop 上跑 ls 反而很快:
It all started from a support question, from a user reporting a usability problem with ls taking several minutes to list the contents of a 15,000+ entries directory on $SCRATCH.
Having thousands of files in a single directory is usually not very file system-friendly, and definitely not recommended. The user knew this already and admitted that wasn’t great, but when he mentioned his laptop was 1,000x faster than Sherlock to list this directory’s contents, of course, it stung. So we looked deeper.
From 13s with the default settings, to 0.3s with a small LS_COLORS tweak, that’s a 40x speedup right there, for the cheap price of not having setuid/setgid or executable files colorized differently.
Of course, this is now setup on Sherlock, for every user’s benefit.