Datasets Server automatically converts public datasets smaller than 5GB on the Hub to Parquet files and publishes them. Parquet is a column-oriented format, so a query can read only the columns it needs, which makes it well suited to large datasets. You can work with the published Parquet files using several tools and libraries:

  • ClickHouse, a column-oriented database management system for online analytical processing
  • DuckDB, a high-performance SQL database for analytical queries
  • Pandas, a data analysis tool for working with data structures
  • Polars, a Rust-based DataFrame library
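
Before using any of these tools, you need the URLs of the published Parquet files. The sketch below shows one way to look them up with only the Python standard library. It rests on assumptions not stated above: that the service is reachable at `https://datasets-server.huggingface.co`, that it exposes a `/parquet` endpoint taking a `dataset` query parameter, and that the JSON response holds a `parquet_files` list whose entries carry `split` and `url` fields. The helper names are hypothetical.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public base URL of Datasets Server; adjust if it differs.
API_BASE = "https://datasets-server.huggingface.co"


def parquet_endpoint(dataset: str) -> str:
    """Build the URL that lists the Parquet files published for a dataset."""
    return f"{API_BASE}/parquet?{urlencode({'dataset': dataset})}"


def parquet_urls(payload: dict, split: Optional[str] = None) -> List[str]:
    """Extract file URLs from a response payload, optionally for one split.

    Assumes the payload has a "parquet_files" list of dicts with
    "split" and "url" keys.
    """
    return [
        entry["url"]
        for entry in payload.get("parquet_files", [])
        if split is None or entry.get("split") == split
    ]


def fetch_parquet_urls(dataset: str, split: Optional[str] = None) -> List[str]:
    """Query the endpoint over the network and return the Parquet file URLs."""
    with urlopen(parquet_endpoint(dataset)) as response:
        return parquet_urls(json.load(response), split)
```

Calling `fetch_parquet_urls("<namespace>/<dataset>", split="train")` with a real dataset name should return a list of URLs, assuming the endpoint behaves as described. Each URL can then be handed to the tool of your choice, for example `pandas.read_parquet(url)`.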