
Vultr is about to start charging Taiwan tax...

A few days ago I got a notice from Vultr saying they will start charging a 5% tax:

Dear Valued Client,

Vultr.com will start collecting a Value Added Tax (also known as VAT) for services provided after 2018 June 01 in order to comply with new Taiwan regulations. Beginning on 2018 July 1, your invoices will include an additional tax charge of 5% for customers who purchase electronic services in Taiwan. The tax is applied to comply with new Taiwan VAT legislation requiring non-residents who are providing "remote services" to begin collecting Taiwanese VAT on these services when they are provided to Taiwanese residents or persons who are not registered for VAT.

Affected customers need to submit their VAT ID to Vultr. If you don’t provide a business VAT ID, your account charges might increase.

If you have any questions about this upcoming change, please contact our support team today. Thank you again for being a customer!

The Vultr.com Team

The charge starts in July 2018...
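As a quick illustration of what the 5% means on an invoice, here is a minimal Python sketch. The plan prices and the half-up rounding to cents are assumptions for illustration only; the notice does not say how Vultr rounds, and `invoice_with_vat` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

# The 5% rate quoted in the notice.
TAIWAN_VAT_RATE = Decimal("0.05")


def invoice_with_vat(subtotal_usd: str, rate: Decimal = TAIWAN_VAT_RATE) -> Decimal:
    """Return the invoice total after adding VAT, rounded to cents.

    Rounding half-up to two decimals is an assumption, not Vultr's stated policy.
    """
    subtotal = Decimal(subtotal_usd)
    total = subtotal * (Decimal("1") + rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Hypothetical monthly plan prices in USD.
    for plan in ("5.00", "10.00", "40.00"):
        print(plan, "->", invoice_with_vat(plan))
```

`Decimal` is used instead of `float` so that the cents come out exact.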

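AWS launches EC2 Fleet: mix standard EC2, Spot, and RI in one capacity calculation

AWS has extended the original EC2 Spot Fleet into EC2 Fleet. The capacity calculation used to cover only Spot Instances; now standard (On-Demand) EC2 instances and RIs are all included as well: "EC2 Fleet – Manage Thousands of On-Demand and Spot Instances with One Request".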

Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call.
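To make the "single API call" concrete, here is a minimal Python sketch that only builds the request arguments in the shape accepted by boto3's `ec2.create_fleet`. The helper name `build_fleet_request`, the launch template name, and the instance types are hypothetical. Reserved Instances do not appear as a field of their own: they are a billing discount applied to matching On-Demand usage.

```python
def build_fleet_request(total: int, on_demand: int, template_name: str,
                        instance_types: list[str]) -> dict:
    """Build keyword arguments for boto3's ec2.create_fleet (sketch only).

    `total` is the overall target capacity; `on_demand` is the part that must
    be On-Demand, and the remainder is requested as Spot.
    """
    if not 0 <= on_demand <= total:
        raise ValueError("on_demand must be between 0 and total")
    return {
        "Type": "maintain",  # keep the fleet at the target capacity
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": template_name,
                "Version": "$Latest",
            },
            # One override per instance type the fleet may choose from.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        }],
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": total,
            "OnDemandTargetCapacity": on_demand,
            "SpotTargetCapacity": total - on_demand,
            "DefaultTargetCapacityType": "spot",
        },
    }


if __name__ == "__main__":
    request = build_fleet_request(10, 4, "my-template", ["c4.large", "c5.large"])
    print(request["TargetCapacitySpecification"])
    # With credentials configured, this would be sent as:
    #   boto3.client("ec2").create_fleet(**request)
```

The sketch stops short of calling AWS, so it runs without credentials or the boto3 package.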

Some services are not integrated yet, though, mainly the ones tied to auto scaling. That part will probably land all at once as one big batch:

We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixed instance types and Spot, Reserved and On-Demand, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.


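Cloudflare's jpegtran performance on ARM

Cloudflare has put a lot of work into ARM servers (see "Cloudflare's progress on using ARM for servers..." or the earlier post "Cloudflare tests new ARM servers"). Recently they found that jpegtran did not perform well on ARM, spent a good deal of effort optimizing it, and got an unexpected payoff: "NEON is the new black: fast JPEG optimization on ARM server".

Their minimum bar was for each ARM core to reach roughly 50% of a Xeon core's performance, but they measured only about 26%: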

Ideally we want to have the ARM performing at or above 50% of the Xeon performance per core. This would make sure we have no performance regressions, and net performance gain, since the ARM CPUs have double the core count as our current 2 socket setup.

In this case, however, I was disappointed to discover an almost 4X slowdown.

The author reasoned that image-processing programs like this have long been accelerated with various SIMD instruction sets, so porting the SSE optimizations to NEON on ARM might help a lot:

Not one to despair, I figured out that applying the same optimizations I did for Intel would be trivial. Surely the NEON instructions map neatly to the SSE instructions I used before?

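After the port, a re-test reached 66% of the Xeon's performance, already past the original target... and in batch processing it was faster than the Xeon as well.

Digging further, the author found that NEON has some instructions with no SSE counterpart (nothing with similar functionality), which might allow even more speedup: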

While going over the ARMv8 NEON instruction set, I found several unique instructions, that have no equivalent in SSE.

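With those instructions put to use as well, single-core performance reached 83% of the Xeon's, and batch throughput improved considerably again.

The last test ran the whole server at full load. Overall performance was about the same (the ARM version actually came out slightly ahead), but it drew less than half the power. Even looking only at the 4 workers they normally run (presumably for latency reasons), power efficiency came to 6.5 times that of the Xeon: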

With the new implementation Centriq outperforms the Xeon at batch reduction for every number of workers. We usually run Polish with four workers, for which Centriq is now 1.3 times faster while also 6.5 times more power efficient.
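The two figures in that quote also imply a power ratio. A minimal sketch of the arithmetic, assuming "power efficient" means throughput per watt (the article's exact definition may differ); `relative_power` is a hypothetical helper name:

```python
def relative_power(speedup: float, efficiency_gain: float) -> float:
    """Power draw of the new system relative to the old one.

    Assumes efficiency = throughput / power, so
    efficiency_gain = speedup / relative_power.
    """
    return speedup / efficiency_gain


if __name__ == "__main__":
    # 1.3x faster and 6.5x more power efficient, per the quote above.
    ratio = relative_power(1.3, 6.5)
    print(f"Centriq draws about {ratio:.0%} of the Xeon's power at four workers")
```

Under that assumption, the Centriq would be drawing roughly a fifth of the Xeon's power at four workers.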

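The takeaway from this post: when optimizing for ARM, don't just port from SSE to NEON; also check whether NEON has other instructions that would help...

Amazon S3 launches a new storage class, One Zone-IA

Amazon S3 has RRS, for data that can be regenerated if lost (such as thumbnails), and IA, for infrequently accessed data. The newly launched class combines the two, bringing the price down further: "Amazon S3 Update: New Storage Class and General Availability of S3 Select".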

New S3 One Zone-IA Storage Class – This new storage class is 20% less expensive than the existing Standard-IA storage class. It is designed to be used to store data that does not need the extra level of protection provided by geographic redundancy.

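Following AWS's suggestion, Percona re-tested Percona XtraDB Cluster performance on gp2...

At the end of last year, Percona tested Percona XtraDB Cluster performance on AWS and gave some advice, particularly on which EBS type to choose for the underlying storage. See my earlier post "Percona analyzes the performance of Percona XtraDB Cluster on AWS (I/O bound)".

The advice back then was to use io1: more expensive, but better-performing.

Later, Percona got a suggestion from AWS engineers about another approach that gets similar performance out of gp2 at a much lower cost than io1: "Percona XtraDB Cluster on Amazon GP2 Volumes".

The approach relies on the fact that a gp2 volume's available IOPS is calculated from its size. The official documentation describes gp2 performance (IOPS) like this: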

General Purpose SSD (gp2) volumes offer cost-effective storage that is ideal for a broad range of workloads. These volumes deliver single-digit millisecond latencies and the ability to burst to 3,000 IOPS for extended periods of time. Between a minimum of 100 IOPS (at 33.33 GiB and below) and a maximum of 10,000 IOPS (at 3,334 GiB and above), baseline performance scales linearly at 3 IOPS per GiB of volume size. AWS designs gp2 volumes to deliver the provisioned performance 99% of the time. A gp2 volume can range in size from 1 GiB to 16 TiB.

Under that rule, getting 10,000 IOPS requires at least about 3.3 TB of space, so the AWS engineers suggested that Percona simply provision larger volumes and re-test:

After publishing our material, Amazon engineers pointed that we should try GP2 volumes with the size allocated to provide 10000 IOPS. If we allocated volumes with size 3.3 TiB or more, we should achieve 10000 IOPS.
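The sizing rule in the quoted documentation can be written out as a small Python sketch. The constants are the ones quoted above (3 IOPS per GiB, a 100 IOPS floor, a 10,000 IOPS ceiling); the function names are hypothetical, and the limits are only those stated in the quote, not necessarily what AWS offers today.

```python
import math

# Figures from the gp2 documentation quoted above.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 10_000


def gp2_baseline_iops(size_gib: float) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS/GiB, clamped to [100, 10000]."""
    return int(min(GP2_MAX_IOPS, max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib)))


def min_size_for_iops(target_iops: int) -> int:
    """Smallest volume size in GiB whose baseline reaches target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("gp2 baseline cannot exceed the quoted 10,000 IOPS cap")
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


if __name__ == "__main__":
    for size in (33, 1000, 3334):
        print(f"{size} GiB -> {gp2_baseline_iops(size)} IOPS")
    print("GiB needed for 10000 IOPS:", min_size_for_iops(10_000))
```

The last line works out to 3,334 GiB, which matches the threshold in the documentation and the "3.3 TiB or more" in Percona's note.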


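Next comes the storage cost comparison, which works out to roughly half the price of the io1 setup.

As the documentation quoted above says, gp2 does not fully guarantee its performance, but statistically it usually delivers 3 IOPS/GB. io1, on the other hand, guarantees its performance, so there is little need to worry about instability. That difference is what shows up as a sizable gap in cost. Designing systems to take advantage of this should help overall cost quite a bit... (though not necessarily latency, especially figures like P99)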


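Measuring the price/performance of TPUv2

Someone has actually measured the price/performance of Google's TPUv2 against the NVIDIA Tesla P100 using the same algorithm: "Benchmarking Google’s new TPUv2".

With ResNet-50 as the workload, the price/performance gap turns out to be smaller than expected. The main reason is that the GPU can compute at lower precision to run faster, unlike Google's earlier press material, which deliberately compared against higher precision (the TPU uses an 8-bit matrix engine, so comparing against the GPU's lower-precision fp16 version is more meaningful).

The real difference shows up with LSTM: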

It turns out that the TPU is even faster on the LSTM model (21402 examples/s): ~12.9 times faster than a P100 (1658 examples/s) and ~7.7 times faster than a V100 (2778 examples/s)!
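The ratios in that quote can be checked from the throughput figures alone. A small Python sketch, using only the numbers in the quote (the dictionary and function names are mine):

```python
# Examples per second on the LSTM model, as quoted above.
THROUGHPUT = {"TPUv2": 21402, "P100": 1658, "V100": 2778}


def speedup(fast: str, slow: str) -> float:
    """How many times faster `fast` is than `slow`, by raw throughput."""
    return THROUGHPUT[fast] / THROUGHPUT[slow]


if __name__ == "__main__":
    for gpu in ("P100", "V100"):
        print(f"TPUv2 vs {gpu}: {speedup('TPUv2', gpu):.1f}x")
```

This compares raw speed only. Turning it into price/performance would also require the hourly price of each device, which is not quoted here.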


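Steam stops accepting Bitcoin for game purchases

Steam announced that it no longer accepts Bitcoin for game purchases: "Steam is no longer supporting Bitcoin".

The official reason is that transaction fees have become too high (even though the user is the one paying them):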

In the past few months we've seen an increase in the volatility in the value of Bitcoin and a significant increase in the fees to process transactions on the Bitcoin network. For example, transaction fees that are charged to the customer by the Bitcoin network have skyrocketed this year, topping out at close to $20 a transaction last week (compared to roughly $0.20 when we initially enabled Bitcoin).


Historically, the value of Bitcoin has been volatile, but the degree of volatility has become extreme in the last few months, losing as much as 25% in value over a period of days.

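So by this reasoning, does Steam not convert straight to fiat? As I recall, the processor they work with (BitPay) can convert to fiat immediately...

Amazon EC2 launches T2 Unlimited: pay to use CPU beyond your credits

Amazon EC2 has launched T2 Unlimited for the t2 family of instances: "T2 Unlimited – Going Beyond the Burst with High Performance".

This is not a new instance type. Existing instances can now use CPU beyond their CPU credits, and AWS bills the excess separately.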


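Using us-east-1 as the example, it is actually quite cheap, and I don't see anything like a penalty fee: a t2.micro has a 10% CPU baseline and costs $0.0116 per hour, so as a reference point, 100% would be $0.116 (if everything simply scaled tenfold).

T2 Unlimited in us-east-1 is $0.05 per vCPU-hour, which looks pretty good? The risk is probably that the extra CPU resources are not guaranteed...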
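A rough Python sketch of the comparison above. It assumes a single-vCPU t2.micro with no banked CPU credits, and that all usage above the 10% baseline is billed linearly at the $0.05 per vCPU-hour rate. Real billing also nets out earned credits, so this is a pessimistic estimate and not AWS's actual billing formula; the constant and function names are hypothetical.

```python
T2_MICRO_HOURLY_USD = 0.0116        # us-east-1 on-demand price quoted above
T2_MICRO_BASELINE = 0.10            # 10% CPU baseline
UNLIMITED_USD_PER_VCPU_HOUR = 0.05  # T2 Unlimited surplus rate quoted above


def naive_scaled_cost() -> float:
    """Hourly price if the instance price simply scaled from 10% to 100%."""
    return T2_MICRO_HOURLY_USD / T2_MICRO_BASELINE


def unlimited_hourly_cost(utilization: float) -> float:
    """Estimated hourly cost at a sustained CPU utilization (0.0 to 1.0).

    Assumes one vCPU and no accumulated credits, so everything above the
    baseline is charged as surplus.
    """
    surplus = max(0.0, utilization - T2_MICRO_BASELINE)
    return T2_MICRO_HOURLY_USD + surplus * UNLIMITED_USD_PER_VCPU_HOUR


if __name__ == "__main__":
    print(f"naive 100% price:         ${naive_scaled_cost():.4f}/hour")
    print(f"T2 Unlimited at 100% CPU: ${unlimited_hourly_cost(1.0):.4f}/hour")
```

Under these assumptions, a full hour at 100% CPU comes to about $0.0566, roughly half of the $0.116 reference figure.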

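I may need to rethink how I use c4 and c5...

Also, although the end of the article lists a long string of regions, checking it against the region table suggests that every region is supported (except the US government regions):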

You can launch T2 Unlimited instances today in the US East (Northern Virginia), US East (Ohio), US West (Northern California), US West (Oregon), Canada (Central), South America (São Paulo), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Asia Pacific (Mumbai), Asia Pacific (Seoul), EU (Frankfurt), EU (Ireland), and EU (London) Regions today.

AWS launches a new Cost Explorer API, but the API isn't the point...

AWS's post "New – Interactive AWS Cost Explorer API" opens by describing where Cost Explorer stands, which is why they are launching the new API to improve on it:

We launched the AWS Cost Explorer a couple of years ago in order to allow you to track, allocate, and manage your AWS costs. The response to that launch, and to additions that we have made since then, has been very positive. However our customers are, as Jeff Bezos has said, “beautifully, wonderfully, dissatisfied.”

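What made me laugh, though, was the choice of image... Probably because Jeff Barr has enough clout, he went straight for this one:

I love this one XD (especially when it shows up on the official blog)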

Amazon EC2's C5 family...

Amazon EC2 has launched new instances: "Now Available – Compute-Intensive C5 Instances for Amazon EC2". AWS claims price/performance is about 25% better than C4, and over 50% better for some workloads:

The new instances offer a 25% price/performance improvement over the C4 instances, with over 50% for some workloads.

What's unusual this time is the sizing: large, xlarge, 2xlarge, 4xlarge, 9xlarge (huh?), and 18xlarge (...).

And none of the Asia regions have it yet XD

You can launch C5 instances today in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) Regions in On-Demand and Spot form (Reserved Instances are also available), with additional Regions in the works.