Home » Posts tagged "file"

Mozilla 實做百度發表的 Speech-To-Text 引擎 Deep Speech

Hacker News 上看到 MozillaGitHub 上的 mozilla/DeepSpeech 這個專案,用 TensorFlow 實做了百度的「Deep Speech: Scaling up end-to-end speech recognition」論文:

A TensorFlow implementation of Baidu's DeepSpeech architecture

語音轉文字的方案,Mozilla 開專案實做出來了...

這程式碼需要安裝 Git Large File Storage 才能完整下載包含訓練資料的部份:

Manually install Git Large File Storage, then clone the repository normally:
git clone https://github.com/mozilla/DeepSpeech

而目前已經有的資料來自於 Mozilla 另外一個專案「Common Voice」:

The Common Voice project is Mozilla's initiative to help teach machines how real people speak.

Common Voice 這個專案目前只有英文,網頁上就可以參與 validation 過程...

Amazon EFS 推出 File Sync 服務

先前 Amazon EFS 需要找台機器掛上去再同步 (無論是 EC2 的機器還是透過 VPN 將自己的機器接上去),現在推出可以直接把檔案同步進去的服務了:「Sync Files to Amazon Elastic File System Quickly, Easily and Securely with EFS File Sync」。

不過不是所有提供 Amazon EFS 的區域都有,目前只有 us-east-1us-east-2us-west-2 以及 eu-west-1

EFS File Sync is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) regions, with availability in the EU (Frankfurt) and Asia Pacific (Sydney) regions coming in December 2017.

另外這是有費用的,目前有提供的四區都是 USD$0.01/GB。

OnionShare:透過 Tor 的 Hidden Service 分享檔案

OnionShare 官網上的說明是這樣寫:

OnionShare is an open source tool that lets you securely and anonymously share a file of any size.

Wiki 的說明比較清楚,實際上是在本機開一個 Hidden Service,而使用者必須透過 Tor Browser 這類的服務下載,這樣可以達成提供者與下載者都彼此匿名:

OnionShare lets you securely and anonymously share files of any size. It works by starting a web server, making it accessible as a Tor onion service, and generating an unguessable URL to access and download the files. It doesn't require setting up a server on the internet somewhere or using a third party filesharing service. You host the file on your own computer and use a Tor onion service to make it temporarily accessible over the internet. The other user just needs to use Tor Browser to download the file from you.


Open OnionShare, drag and drop files and folders you wish to share into it, and click Start Sharing. After a moment, it will show you a .onion URL such as http://asxmi4q6i7pajg2b.onion/egg-cain. This is the secret URL that can be used to download the file you're sharing.

如果下載者不是那麼在意匿名性的話,Tor2web 這類的服務應該也可以用...

感覺以現在提供的功能來看還是不夠便利,實驗性質居多 XD

Bash 裡處理 PID file 的方式...

看到「Age comparison in Bash for files and processes」後查了一些資料,如果在不使用外部程式處理的話,的確是多做了不少事情。

這是 Bash-ot 的說明:

file1 -ot file2
       True if file1 is older than file2, or if file2 exists and file1 does not.

而這是 test (也就是 [ 這隻程式) 對 -ot 的說明:

       FILE1 is older than FILE2


另外可以參考「What's the difference between [ and [[ in Bash?」這邊的說明,Bash 的 [[ 是 Bash 特有的加強版,而 [ 則是用系統的 test 運算。

CodeDeploy 對發生衝突的調整

AWS CodeDeploy 推出了新的功能,當 CodeDeploy 偵測到發生衝突時可以設定處理的方式:「AWS CodeDeploy Adds File Handling Support」。


Previously, CodeDeploy would fail the deployment when it detected a discrepancy between the files present at a target location and those in the last successful deployment. Now, you can choose how CodeDeploy responds (e.g., fail the deployment, retain the content, or overwrite the content) when encountering such files for each deployment.

這樣方便不少啊,之前在測試階段時偷改東西常常被抓包... XD

AWS Storage Gateway 的新功能:File Gateway

AWS Storage Gateway 推出新功能 File Gateway:「File Interface to AWS Storage Gateway」。

可以讓使用者在前面用 NFS,後面實際接的是 Amazon S3

After you set it up in your data center or in the cloud, your configured buckets will be available as NFS mount points. Your application simply reads and writes files and directories over NFS; behind the scenes, the gateway turns these operations into object-level requests on your S3 buckets, where they are accessible natively (one S3 object per file).

以前自己用 FUSE 惡搞的方式,現在變成官方支援的功能?XDDD

在 ext4 上的 CCFS

在「Application crash consistency and performance with CCFS」這篇看到的東西。

CCFS 目標是拉高 ext4 的 data integrity,並且還是有高效能:

CCFS (the Crash-Consistent File System) is an extension to ext4 that restores ordering and weak atomicity guarantees for applications, while at the same time delivering much improved performance.

如果你需要絕對的 data integrity,你需要用 data=journal 確保資料可以在 system crash 後被 replay,預設的 data=ordered 是無法達到的,而 CCFS 也沒打算達到絕對的 data integrity,而是盡量達到。所以在測試上可以發現 CCFS 大幅改善了 data integrity:

而效能還提昇了 (喂喂):


翻了一下好像沒 open source 出來 (至少現在沒看到),來等看看有沒有人會實做出來...