DuckDB

DuckDB is a database that can read and query Parquet files very quickly. Begin by creating a connection to DuckDB, then install and load the httpfs extension so that DuckDB can read and write remote files:

Now you can write and execute your SQL query on the Parquet file:

To query multiple files, for example when the dataset is sharded:

DuckDB-Wasm, a WebAssembly build of DuckDB, runs DuckDB directly in the browser. This is useful, for instance, for building a web app that queries Parquet files from the browser.