Instagram 改善影片上架速度的方式

不是什麼魔法,其實是改產品面上的規格 (但是發表到 Instagram Engineering 上):「Video Upload Latency Improvements at Instagram」。

最原始的版本是所有的格式都轉完後才可以上架:

然後把規格改成最高畫質的版本轉完後就可以先上架:

The idea is, instead of blocking until all video versions are available, we can publish the video once the highest-quality video version is available.

然後是把影片切段上傳,所以傳一半就可以先處理一半,變成 pipeline 的概念,但增加程式的複雜度,以及被迫要調整影片品質的參數:

Segmented uploads reduce upload latency in many cases but come with a few tradeoffs. For instance, segmented uploads increase the complexity of the pipeline. There are some quality metrics that are only available per segment at transcode time, such as SSIM. These metrics are not helpful to us on a per segment basis. Therefore, we need to do a duration weighted average of the SSIM of all segments to come up with the SSIM of the whole video. Similarly, handling exceptions is more complex since there are more cases to handle.

另外有一種特例是上傳的影片本身就已經符合伺服器的規格,這樣的話可以直接放行 (不過這樣不會有 security concern 嗎...):

Another performance optimization we use to improve the upload latency and save CPU utilization is something we call a “passthrough” upload. In some cases, the media that is uploaded is already ready for playback on most devices.

都是想的出來而且會帶有 tradeoff 的方法,而不是完全正面的改善 :o

MySQL 5.7 的 Pipeline 查詢加速

在「MySQL 5.7.12 – Part 2: Improving the MySQL Protocol」這邊看到介紹 MySQL 的 Asynchronous API,藉由 pipeline 加速查詢。

本來的:

res_1 = conn.query("DO 1");
res_2 = conn.query("DO 2");

會產生這樣的 flow:

而 Asynchronous API 可以這樣寫,先把兩個 SQL query 都丟出去,然後等結果:

hndl_1 = conn.query_send("DO 1");
hndl_2 = conn.query_send("DO 2");

# wait for completion
res_1 = conn.query_recv(hndl_1);
res_2 = conn.query_recv(hndl_2);

也就是產生這樣的 flow:

感覺 PHP 上只要 PDO 改善這塊後,各家 ORM library 就可以支援受益...

Amazon Redshift 的 Data Ingestion

AWS 放出的「Amazon redshift migration and load data 20150722」這份投影片則是解釋了 data ingestion 時的建議行為。

其實這張就道盡目前 Amazon Redshift 架構上的最佳作法,也就是每次都 TRUNCATE 掉重新 import。後面的所有方法其實都是 workaround,效能不會太好... XD

另外後面是介紹倒資料進 Amazon Redshift 的方式,官方是還蠻推 AWS Data Pipeline,但你用過就會知道有多麻煩與痛苦...

AWS 的 CodeDeploy、CodeCommit、CodePipeline

這次 re:Invent 的另外一個重大發表:「New AWS Tools for Code Management and Deployment」。

講的簡略一點,AWS CodeDeploy 是處理程式的 deployment,CodeCommit 是 Git hosting,CodePipeline 是 hook。

不過目前只有 CodeDeploy 可以用,其他兩個還沒好:

CodeDeploy is launching today and you can start using it now. Please stay tuned for more information on CodeCommit and CodePipeline!

以往是自己兜這些方案出來,現在是 AWS 直接包好提供...