在 Hacker News 上看到「Xlite: Query Excel and Open Document spreadsheets as SQLite virtual tables (github.com/x2bool)」這個專案,就如同說明,是一個支援讀取 Excel 檔案的 SQLite extension,原網站在 x2bool/xlite 這邊。
依照說明支援舊的 .xls
與新的 .xlsx
的格式,但不知道公式運算支援到什麼程度...
先 load extension:
sqlite3 # will open SQLite CLI > .load libxlite
接著是建立 virtual table:
CREATE VIRTUAL TABLE class_data USING xlite( FILENAME './path/to/example.xlsx', WORKSHEET 'Class Data', RANGE 'A2:F' );
接下來就可以搞事了:
SELECT COUNT(*), D FROM class_data GROUP BY D ORDER BY COUNT(*);
看起來是用 Rust + C 寫的,然後作者有提醒這是寫興趣的專案:
This project is experimental, use at your own risk. The project is developed in my free time as a way to learn Rust and database systems.
反倒是 Hacker News 討論串裡面提到了 multiprocessio/dsq 這個專案,看起來發展的比較久,支援度也比較完整了:
Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more
不過就不是綁 SQLite 了 (雖然還是有關),從範例可以看到他是獨立的程式:
$ dsq testdata.json "SELECT * FROM {} WHERE x > 10"
Under the hood dsq uses DataStation as a library and under that hood DataStation uses SQLite to power these kinds of SQL queries on arbitrary (structured) data.
如果是真的要用的話,這套看起來應該會好一些...