Gea-Suan Lin's BLOG

Doing mischief is the greatest driving force of progress

Tag: metadata

AWS VPCs can now use IPv6 for the Metadata Service, Time Service, and DNS Server

As seen in "IPv6 endpoints are now available for the Amazon EC2 Instance Metadata Service, Amazon Time Sync Service, and Amazon VPC DNS Server", AWS has added IPv6 interfaces to several of its basic services:

The Amazon EC2 Instance Metadata Service, Amazon Time Sync Service, and Amazon VPC DNS server can now be accessed over IPv6 endpoints by instances built on the Nitro System.

For the Metadata Service, "Use IMDSv2" shows that the address is 169.254.169.254 over IPv4 and fd00:ec2::254 over IPv6:

The examples in this section use the IPv4 address of the instance metadata service: 169.254.169.254. If you are retrieving instance metadata for EC2 instances over the IPv6 address, ensure that you enable and use the IPv6 address instead: fd00:ec2::254. The IPv6 address of the instance metadata service is compatible with IMDSv2 commands. The IPv6 address is only accessible on Instances built on the Nitro System.
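
As a minimal sketch of what the quoted passage implies, the two IMDSv2 requests can be built against either address. The function names here (`imds_base`, `token_request`, `metadata_request`) and the default TTL are illustrative assumptions, not something the AWS documentation prescribes; the one detail that is easy to miss is that an IPv6 literal must be bracketed inside a URL.

```python
from urllib.request import Request

IMDS_IPV4 = "169.254.169.254"
IMDS_IPV6 = "fd00:ec2::254"  # per the quote: only reachable on Nitro-based instances


def imds_base(use_ipv6: bool) -> str:
    """Return the base URL of the metadata service (helper name is hypothetical)."""
    # An IPv6 literal has to be wrapped in square brackets inside a URL.
    host = f"[{IMDS_IPV6}]" if use_ipv6 else IMDS_IPV4
    return f"http://{host}"


def token_request(use_ipv6: bool = True, ttl_seconds: int = 21600) -> Request:
    """Build the IMDSv2 session-token request (a PUT with a TTL header)."""
    return Request(
        imds_base(use_ipv6) + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(token: str, path: str = "instance-id",
                     use_ipv6: bool = True) -> Request:
    """Build a metadata GET request that carries the session token."""
    return Request(
        imds_base(use_ipv6) + "/latest/meta-data/" + path,
        headers={"X-aws-ec2-metadata-token": token},
    )
```

On an actual instance the returned objects would be passed to `urllib.request.urlopen`; nothing is sent here, so the sketch can be inspected anywhere. As the quote notes, the IPv6 endpoint also has to be enabled on the instance first.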

For the Time Service, "Set the time for your Linux instance" lists the address fd00:ec2::123:

The Amazon Time Sync Service is available through NTP at the 169.254.169.123 IPv4 address or the fd00:ec2::123 IPv6 address for any instance running in a VPC. The IPv6 address is only accessible on Instances built on the Nitro System.

For DNS, "DNS support for your VPC" lists a DNS service at the IPv6 address fd00:ec2::253:

If this attribute is true, queries to the Amazon provided DNS server at the 169.254.169.253 IPv4 address (or the reserved IP address at the base of the VPC IPv4 network range plus two) and the fd00:ec2::253 IPv6 address will succeed. The IPv6 address is only accessible on EC2 instances built on the Nitro System.

So a few more pieces of basic infrastructure have been filled in...

Also, the three characters of "EC2" all happen to be valid hexadecimal digits, so using them in the IPv6 address makes it much easier to remember.

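The suffixes ::254, ::123, and ::253 (note that these are hexadecimal) were also deliberately chosen to look like the last octet of the IPv4 addresses, which is far easier to remember than a true hex equivalent such as ::fe.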
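
The hexadecimal point can be checked with Python's standard `ipaddress` module. This is only a small illustration; the `ENDPOINTS` table and the `last_group` helper are names made up for the example.

```python
import ipaddress

# IPv4 / IPv6 address pairs quoted in the post.
ENDPOINTS = {
    "metadata": ("169.254.169.254", "fd00:ec2::254"),
    "time": ("169.254.169.123", "fd00:ec2::123"),
    "dns": ("169.254.169.253", "fd00:ec2::253"),
}


def last_group(ipv6: str) -> int:
    """Return the final 16-bit group of an IPv6 address as an integer."""
    return int(ipaddress.IPv6Address(ipv6)) & 0xFFFF


for name, (v4, v6) in ENDPOINTS.items():
    octet = int(v4.rsplit(".", 1)[1])  # last decimal octet of the IPv4 address
    group = last_group(v6)             # last hex group of the IPv6 address
    print(f"{name}: IPv4 octet {octet} is hex {octet:x}, "
          f"IPv6 group 0x{group:x} is decimal {group}")
```

The group written as `254` in the IPv6 address is the number 0x254, which is 596 in decimal, while the IPv4 octet 254 would be `fe` in hex. The digits match visually, not numerically.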

Author: Gea-Suan Lin. Posted on August 27, 2021. Categories: AWS, Cloud, Computer, DNS, Murmuring, Network, Service. Tags: address, amazon, aws, cloud, dns, domain, ec2, ip, ipv6, metadata, network, server, service, system, time, vpc.

Kafka's ZooKeeper removal has landed in trunk

About a year ago I mentioned Kafka's plan to drop ZooKeeper (in 「Kafka 拔掉 ZooKeeper 的計畫」, "Kafka's plan to remove ZooKeeper"), and yesterday Confluent published a progress update: "Apache Kafka Made Simple: A First Glimpse of a Kafka Without ZooKeeper".

The design is described in "KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum", and the work is tracked in Jira as KAFKA-9119.

The good news is that the KIP-500 code has been merged into trunk, with the goal of making it usable in the 2.8 release:

So we’re very pleased to say that the early access of the KIP-500 code has been committed to trunk and is expected to be included in the upcoming 2.8 release.

The documentation refers to this mode as "Kafka Raft Metadata mode", or "KRaft":

For the first time, you can run Kafka without ZooKeeper. We call this the Kafka Raft Metadata mode, typically shortened to KRaft (pronounced like craft) mode.

Some features are still missing, and there is the usual reminder that this is a test release that should not be used in production:

Beware, there are some features that are not available in this early-access release. We do not yet support the use of ACLs and other security features or transactions. Also, both partition reassignment and JBOD are unsupported in KRaft mode (these are anticipated to be available in an Apache Kafka release later in the year). Hence, consider the quorum controller experimental software—we don’t advise subjecting it to production workloads. If you do try out the software, however, you’ll find a host of new advantages: It’s simpler to deploy and operate, you can run Kafka in its entirety as a single process, and it can accommodate significantly more partitions per cluster (see measurements below).

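The benchmark numbers published so far show a large improvement, but on closer inspection they turn out to measure shutdown and recovery time.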

It is unclear how much this affects actual performance (for better or worse); we will have to wait and see...

Author: Gea-Suan Lin. Posted on April 1, 2021 (updated April 6, 2021). Categories: Computer, Murmuring, Software. Tags: apache, confluent, engine, kafka, kafka-9119, kip-500, kraft, metadata, mode, raft, stream, streaming, trunk, zookeeper.

The Android version of Facebook collects a large amount of system information

The Android version of Facebook uploads the metadata of all the system libraries:

Facebook scans system libraries from their Android app user’s phone in the background and uploads them to their server

This is called "Global Library Collector" at Facebook, known as "GLC" in app’s code

It periodically uploads metadata of system libraries to the server pic.twitter.com/olwk1BPMoQ

— Jane Manchun Wong (@wongmjane) August 30, 2019

On iOS, no similar behavior has been observed so far:

really interesting catch Jane. I have not seen this in the iOS version. my bet is that they want better stability on Android so want to analyze system library variants - and apparently did not even think to ask user permission.

— Will Strafach (@chronic) August 30, 2019

This information is enough to uniquely identify a device and is hard to change, which makes it extremely useful for advertising...

Author: Gea-Suan Lin. Posted on August 31, 2019. Categories: Computer, Library, Murmuring, Network, Privacy, Security, Social, Software. Tags: android, facebook, library, metadata, privacy, system.

Moving HDFS metadata onto NewSQL

Moving the data held by the metadata server, the performance bottleneck of HDFS, onto a NewSQL database gives a large performance boost: "HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases".

In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ single node in-memory metadata service, with a distributed metadata service built on a NewSQL database.

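In particular, the test based on a Spotify workload shows a 16x to 37x improvement (presumably only for the part that touches HDFS, such as analyses that pull external data into HDFS, rather than an overall efficiency gain):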

Metadata capacity has been increased to at least 37 times HDFS’ capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS.

The paper mentions that it uses NDB (in-memory) from MySQL Cluster:

HopsFS stores all metadata normalized in a highly available, in-memory, distributed, relational database called Network Database (NDB), a NewSQL storage engine for MySQL Cluster.

This should give the Hadoop community a direction for improvement...

Author: Gea-Suan Lin. Posted on March 5, 2017. Categories: Computer, Database, Murmuring, MySQL, Network, Software. Tags: apache, cluster, database, db, distributed, file, filesystem, hadoop, hdfs, hopsfs, memory, metadata, mysql, ndb, newsql, server, service, spotify, system.

Tools for turning Markdown into e-books (and printed books)

I saw someone writing an e-book in Markdown: "How I wrote and published my novel using only open source tools".

The e-book is mainly produced by converting with Pandoc, with the metadata supplied through the Dublin Core Metadata Element Set.

The printed book is also converted with Pandoc, this time to ODT, after which LibreOffice is still needed to adjust the page format:

LibreOffice has many page formatting options that are non-obvious but incredibly important. For example, I wanted the first page of each chapter to be a right page, and to exclude the page header. I defined my chapter heading style to have a “page break before” and set a page style of “first page” (which is a right page without a header), which is followed by “right page”, which is followed by “left page”, which goes back to “right page”.

The key tool is still Pandoc; I will find a chance to convert something and see how it turns out...
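
A rough sketch of the two conversions described above, driven from Python. The file names (`novel.md`, `metadata.xml`) and the helper names are hypothetical, and the linked article may well use different options; the only Pandoc features assumed are `-o`, which picks the output format from the file extension, and `--epub-metadata`, which reads Dublin Core elements from an XML file.

```python
import shutil
import subprocess
from typing import List, Optional


def pandoc_command(source: str, output: str,
                   epub_metadata: Optional[str] = None) -> List[str]:
    """Build a pandoc command line; the output format follows the extension."""
    cmd = ["pandoc", source, "-o", output]
    if epub_metadata is not None:
        # XML file containing Dublin Core elements for the EPUB.
        cmd.append(f"--epub-metadata={epub_metadata}")
    return cmd


def convert(source: str, output: str,
            epub_metadata: Optional[str] = None) -> None:
    """Run the conversion, failing early if pandoc is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_command(source, output, epub_metadata), check=True)


if __name__ == "__main__":
    # Hypothetical inputs: the e-book, then the ODT to polish in LibreOffice.
    print(pandoc_command("novel.md", "novel.epub", "metadata.xml"))
    print(pandoc_command("novel.md", "novel.odt"))
```

Keeping the command construction separate from `convert` means the arguments can be checked without Pandoc installed.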

Author: Gea-Suan Lin. Posted on August 19, 2016. Categories: Book, Computer, Murmuring, Recreation, Software. Tags: book, core, document, dublin, ebook, libreoffice, markdown, metadata, novel, odt, open, pandoc, source, xml.

Download all of a paper's referenced PDFs at once

This feels like something researchers will love: "PDFx - Extract metadata and URLs from PDFs, and download all referenced PDFs".

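The program tries to extract the references from a PDF, find links to the corresponding PDFs online, and download them. When you are just starting out in a research area, you may lack the background to read a given paper and need to start with the papers it cites, and that is where this tool comes in handy...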

I used to fetch these by hand; it never took much time, but this is still far more convenient :p

Author: Gea-Suan Lin. Posted on October 27, 2015. Categories: Computer, Murmuring, Network, Software. Tags: extract, metadata, paper, pdf, url.

Amazon VPC traffic records (Flow Logs)

Amazon VPC announced a new feature that records metadata about the traffic flowing through a VPC: "VPC Flow Logs – Log and View Network Traffic Flows".

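Judging from the official screenshot, the data is fairly bare-bones (there are fields that look like Unix timestamps, and the values 6 and 17 that show up are probably TCP and UDP?), but at least there is now a record to look at.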

Is this a kind of IPFIX?
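
The guess about 6 and 17 fits the IANA protocol numbers, where 6 is TCP and 17 is UDP. Below is a minimal parsing sketch, assuming the space-separated default (version 2) record format from the AWS documentation; the `SAMPLE` line is made up for illustration and the helper name `parse_flow_log` is hypothetical.

```python
from datetime import datetime, timezone

# Field order of the default flow log record format (an assumption of this sketch).
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

# A few IANA protocol numbers; unknown ones are kept as their number.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def parse_flow_log(line: str) -> dict:
    """Split one flow log record and decode the protocol and timestamps."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    number = int(record["protocol"])
    record["protocol_name"] = PROTOCOLS.get(number, str(number))
    # start and end are Unix timestamps in seconds.
    for key in ("start", "end"):
        record[key] = datetime.fromtimestamp(int(record[key]), tz=timezone.utc)
    return record


# Made-up sample record, not taken from the post.
SAMPLE = ("2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")

if __name__ == "__main__":
    rec = parse_flow_log(SAMPLE)
    print(rec["protocol_name"], rec["srcaddr"], "->", rec["dstaddr"], rec["action"])
```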

Author: Gea-Suan Lin. Posted on June 11, 2015. Categories: AWS, Cloud, Computer, Murmuring, Network. Tags: amazon, aws, cloud, ipfix, metadata, netflow, tcp, tcpdump, traffic, udp, vpc.

How Treasure Data uses AWS

I came across "Treasure Data and AWS - Developers.io 2015" on Zite; it covers the problems Treasure Data ran into and how they solved them.

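With articles like this, beyond the "how we do it", any mention of "what we chose not to do" is also worth reading, that is, an explanation of what they avoided.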

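For example, their choice to store metadata in PostgreSQL looks odd at first.

They explain that the quality of Amazon S3's LIST operation is a problem.

They also explain that they need GiST, so they do not use MySQL, which by itself also explains why Amazon DynamoDB does not fit their needs.

They also hit the same problem as Netflix: because S3 is eventually consistent, data is not guaranteed to be readable immediately after it is written.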

Without the GiST requirement, putting the metadata in DynamoDB looks like it could solve the problem...

Author: Gea-Suan Lin. Posted on March 30, 2015. Categories: AWS, Cloud, Computer, Database, Murmuring, MySQL, Network, PostgreSQL, Software. Tags: amazon, aws, consistent, data, database, db, eventually, gist, list, metadata, mysql, netflix, postgresql, rdbms, s3, treasure, zite.

Improving Wikipedia's metadata

Wikipedia plans to improve the metadata of the files it hosts: "Cleaning up file metadata, for humans and robots".

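After a bit of digging, this appears to follow the convention in "Commons:Machine-readable data", which uses the id attribute on <td> elements? But then wouldn't a document be limited to one of each? There should be cases where one is not enough...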

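I still do not fully understand the details, but this will have a big impact on the whole internet, because Wikipedia is currently a data source that is high in both quality and volume, and it often serves as a value-added data source for other services (such as Facebook Pages and Google search results).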

Author: Gea-Suan Lin. Posted on November 8, 2014. Categories: CMS, Computer, Murmuring, Network, Wiki, WWW. Tags: human, metadata, robot, wikimedia, wikipedia.

DigitalOcean now offers a metadata service too

In "Easily Automate The Provisioning Of Your DigitalOcean Droplets!", DigitalOcean mentions its metadata server: "An Introduction to Droplet Metadata".

The full details are in "Droplet Metadata Overview", but a look at the code is enough to see roughly what kind of service it is:

export HOSTNAME=$(curl -s http://169.254.169.254/metadata/v1/hostname)

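This is very helpful for fetching information (or passing in parameters) during automation, and it saves a lot of effort... (previously you had to set up your own cluster that cross-checked the public IP against the API and returned the result)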
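
For automation written in Python rather than shell, the same lookup could be sketched as follows. Only the `hostname` endpoint from the curl line above is taken from the source; the helper names and the timeout value are assumptions, and the address is only reachable from inside a Droplet.

```python
from urllib.request import urlopen

# Base URL taken from the curl example above.
METADATA_BASE = "http://169.254.169.254/metadata/v1"


def metadata_url(key: str) -> str:
    """Build the URL for one metadata key, e.g. "hostname"."""
    return f"{METADATA_BASE}/{key.strip('/')}"


def fetch_metadata(key: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value as text (works only from inside a Droplet)."""
    with urlopen(metadata_url(key), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    # Equivalent of: export HOSTNAME=$(curl -s .../metadata/v1/hostname)
    print(fetch_metadata("hostname"))
```

A short timeout keeps the script from hanging when it is run somewhere the link-local address is not served.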

Author: Gea-Suan Lin. Posted on October 21, 2014. Categories: Computer, Murmuring, Network, Programming. Tags: api, digitalocean, droplet, metadata, service, vps.
