AWS CodeDeploy 支援在 AWS Lambda 上跑更多奇怪花樣

AWS Lambda 總算可以跟 AWS CodeDeploy 深度整合了:「AWS Lambda Supports Traffic Shifting and Phased Deployments with AWS CodeDeploy」。

像是兩個版本之間依照權重轉移:

You can now shift incoming traffic between two AWS Lambda function versions based on pre-assigned weights.

而把這個功能延伸就會是 Phased Deployments 了...

AWS CodeDeploy 支援單機測試模式

AWS CodeDeploy 本來是個 client-server 服務架構,但現在讓你方便在本機測試,支援直接在本機下指令 deploy (不需要 server) 看看發生什麼狀況:「AWS CodeDeploy Supports Local Testing and Debugging」。

Previously, if you wanted to test and debug your deployment, you had to fully configure AWS CodeDeploy. This includes installing the agent on the target host, creating a CodeDeploy Application, and creating a CodeDeploy Deployment Group.

Now, you can execute a deployment directly on a local machine or instance where the CodeDeploy agent is installed. If your deployment has errors, you can easily find and view the error logs by accessing the agent with your terminal. This makes it faster and easier to find and fix bugs before configuring CodeDeploy for production.

是有很多人一直中獎然後跟 AWS 反應嗎... XD

Reddit 的 Deploy 機制 (的歷史)

Reddit 主要是用 Python 寫的,這邊介紹了他們歷年來的 Code Deploy 系統:「The Evolution of Code Deploys at Reddit」。

最早期的時候 (2007 到 2010) 是用 rsync 更新程式碼,然後跑個迴圈用 ssh 連進去重跑:

# build the static files and put them on the static server
`make -C /home/reddit/reddit static`
`rsync /home/reddit/reddit/static public:/var/www/`

# iterate through the app servers and update their copy
# of the code, restarting once done.
foreach $h (@hostlist) {
    `git push $h:/home/reddit/reddit master`
    `ssh $h make -C /home/reddit/reddit`
    `ssh $h /bin/restart-reddit.sh`
}

2011 時因為人變多了,用 IRC 把過程丟出來 (okay,我知道你想問的問題... Slack 是 2013 年推出的):

The process for actually doing the deploy looked the same, but now the system did the work for you and told everyone what you were doing.

另外值得一提的是,因為他們不是自己架 IRC server 而是用外面第三方的伺服器,所以他們決定 IRC 只有單向告知的功能:

There was a lot of talk of systems that managed deploys from chat around this time, but since we used third party IRC servers we weren’t able to fully trust the chat room with production control and so it remained a one-way flow of information.

2012 時則是把機器列表放到 DNS 上,某種 service discovery 系統:

First, it fetched its list of hosts from DNS rather than keeping it hard-coded. This allowed us to update the list of hosts without having to remember to update the deploy tool as well — a rudimentary service discovery system.

另外是固定的版本,而非拉 master 下來,這樣可以避免 race condition 的不一致性 (推到一半有人把 code 塞進 master):

Another small but important change was to always deploy a fixed version of the code. The previous version of the tool would update master on a given host, but what if master changed mid-deploy because someone accidentally pushed up code? By deploying a specific git revision instead of branch name, we ensured that the deploy got the same version everywhere in production.

2013 往雲上搬,於是遇到像是「開新機器時剛好在 deploy 會拉到舊的 code」這種 edge case。

What happens if a server is launched while a deploy is ongoing? We had to make sure each newly launched server checked in to get new code if present. What about servers going away mid-deploy? The tool had to be made smarter to detect when the server was gone legitimately rather than there being an issue with the deploy process itself that should be noisily alerted on.

2014 遇到機器數量太多,推一輪要一個小時而被迫要平行化處理:

Over time, the number of servers needed to serve peak traffic grew. This meant that deploys took longer and longer. At its worst, a normal deploy took close to an hour. This was not good.

2015 則是加上 deploy lock,避免同時間有兩個人在 deploy:

Engineers would ask for the deploy lock and either get it or get put in the queue. This helped keep order in deploys and let people relax a bit while waiting for the lock.

2017 的部份則是提到了伺服器的數量:

This new mechanism allows us to deploy to a lot more machines concurrently, and deploy timings are down to 7 minutes for around 800 servers despite the extra waiting for safety.

看起來到現在還是維持手動 deploy,而不是自動化... 這塊還蠻有趣的 :o

CodeDeploy 的權限設定...

這陣子在弄 AWS CodeDeploy,調整了半天才把權限壓低到合理的範圍 (AWS 給的預設值還是有點大),記錄下來之後會比較好找...

在權限部份,AWS CodeDeploy 應該有三個權限要設定:

  • IAM user:給 CI 跑完後丟上 Amazon S3 並且呼叫 AWS CodeDeploy 佈署用的,像是 Travis CI 或是之類的服務。但如果是手動到 web console 觸發的話,這個部份就不需要了。
  • Service Role:給 AWS CodeDeploy 的服務本身用的。
  • EC2 Role:給跑在機器上 AWS CodeDeploy Agent 用的。

如同前面講的,IAM user 的部份有兩個要處理,一個是讓 CI 服務把檔案傳上 Amazon S3,另外一個是讓他有權限可以呼叫 AWS CodeDeploy 佈署新版本。

後者比較簡單,直接拉 AWSCodeDeployDeployerAccess 就可以了,前者比較麻煩一點,需要透過 Policy Generator 建出對應的權限:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1486156178000",
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::codedeploy-testbucket/*"
            ]
        }
    ]
}

第二個的 Service Role 是最簡單的,直接拉 AWSCodeDeployRole 就可以了。

第三個的 EC2 Role 只需要給存取 Amazon S3 的權限。照 AWS 的建議用 AmazonEC2RoleforAWSCodeDeploy 是會動啦,但權限給太大了,整個帳號可以存取的 S3 bucket 他都可以拉到:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListObjects"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

改成自己用 Policy Generator 生,限制在 codedeploy-testbucket 上會比較好:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1486165995000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::codedeploy-testbucket/*"
            ]
        }
    ]
}

這樣權限的部份就差不多了...

AWS CodeDeploy 支援 BlueGreenDeployment

AWS CodeDeploy 推出了 BlueGreenDeployment 的功能:「AWS CodeDeploy Introduces Blue/Green Deployments」。

BlueGreenDeployment 的目的不計成本想辦法把上線的 downtime 壓到最低,而且當出問題時 rollback 的時間壓到最低的方法:

One of the challenges with automating deployment is the cut-over itself, taking software from the final stage of testing to live production. You usually need to do this quickly in order to minimize downtime.

Blue-green deployment also gives you a rapid way to rollback - if anything goes wrong you switch the router back to your blue environment.

其實就是直接跑兩個環境 (所以成本比較高),一套跑舊的一套跑新的,然後在前面的 load balancer 切換:

The blue-green deployment approach does this by ensuring you have two production environments, as identical as possible.

Vagrant 1.7 引入的 Push 機制

離上次改版也半年了,Vagrant 1.7 引入了 Push 機制:「Vagrant Push: One Command to Deploy Any Application」。

功能很簡單,就是「把整個目錄傳到指定的位置上」,可以看做是 code deploy 機制。目前支援的通訊協定 Atlas、FTP/SFTP、Heroku 以及自訂 script。

Vagrant 的這個功能有著 Unix philosophy 的想法。功能很簡單,但似乎是以後新功能的墊腳石。code deploy 這個功能現在已經有一堆 open source 產品可以用,如果之後不以此延伸,看起來沒有太大意義。

當然也是順便幫自家產品 Atlas 打廣告...

另外是 Vagrant 這個字的念法,也可以在這篇文章裡的影片看到示範:

做 Ubuntu 14.04.1 的 VMware Template

公司有買 VMware license,但一直沒研究要如何做一份 VM template 出來,所以花了點時間研究到底怎麼做才會比較好。

這邊提供的方法是為了之後的通用性 (像是之後有可能用 KVM 堆虛擬機),所以不使用 VMware 獨規的設計,因此有些設定需要手動調整。

首先是先生出一台 Ubuntu 14.04.1 的 VM,在裝完基本系統後有些要先處理的:

  • 可以考慮用 DHCP 或是不開啟網路設定,反正不要設固定 IP address,以免同時裝多台機器時發生狀況。
  • 因為是 VMware 裡面,安裝 open-vm-tools 可以省下一些 puppet 安裝的時間。
  • 砍掉 /etc/ssh/ssh_host_*,每台機器的 host key 應該要不一樣。

做好 VM 後轉成 Template,之後每次在 deploy 完、開起來後,還要做這些事情:

  • /etc/hostname/etc/hosts 裡的機器名稱。
  • /etc/network/interfaces 裡的網路設定。
  • dpkg-reconfigure openssh-server 重新產生 SSH host key。

有些參考資料:

過程其實還蠻簡單的,只是有一些眉眉角角的東西要注意...

AWS 的 CodeDeploy、CodeCommit、CodePipeline

這次 re:Invent 的另外一個重大發表:「New AWS Tools for Code Management and Deployment」。

講的簡略一點,AWS CodeDeploy 是處理程式的 deployment,CodeCommit 是 Git hosting,CodePipeline 是 hook。

不過目前只有 CodeDeploy 可以用,其他兩個還沒好:

CodeDeploy is launching today and you can start using it now. Please stay tuned for more information on CodeCommit and CodePipeline!

以往是自己兜這些方案出來,現在是 AWS 直接包好提供...

各種 credential 儲存的方式 (像是連到資料庫的密碼)

John Resig (現在在 Khan Academy) 在月初的時候發表了「Keeping Passwords in Source Control」討論要怎麼儲存 credential。

這不只是開發者的問題而已,這跟 code deploy 機制也很有關。目前沒有完美的方案,不同的解法都是在不同的環境與限制下而誕生出的產物。