Quickstart

In this quickstart, you’ll learn how to use the Datasets Server’s REST API to:

  • Check whether a dataset on the Hub is functional.
  • Return the configuration and splits of a dataset.
  • Preview the first 100 rows of a dataset.
  • Download slices of rows of a dataset.
  • Access the dataset as parquet files.

Each feature is served through an endpoint summarized in the table below:

| Endpoint | Method | Description | Query parameters |
|---|---|---|---|
| /valid | GET | Get the list of datasets hosted on the Hub and supported by Datasets Server. | none |
| /is-valid | GET | Check whether a specific dataset is valid. | dataset: name of the dataset |
| /splits | GET | Get the list of configurations and splits of a dataset. | dataset: name of the dataset |
| /first-rows | GET | Get the first rows of a dataset split. | dataset: name of the dataset; config: name of the config; split: name of the split |
| /rows | GET | Get a slice of rows of a dataset split. | dataset: name of the dataset; config: name of the config; split: name of the split; offset: offset of the slice; length: length of the slice (maximum 100) |
| /parquet | GET | Get the list of parquet files of a dataset. | dataset: name of the dataset |

There is no installation or setup required to use Datasets Server.

Sign up for a Hugging Face account if you don't already have one! You can use Datasets Server without an account, but gated datasets like CommonVoice and ImageNet are only accessible if you provide a user token, which you can find in your user settings.

Feel free to try out the API in Postman, ReDoc or RapidAPI. This quickstart will show you how to query the endpoints programmatically.

The base URL of the REST API is:

https://datasets-server.huggingface.co
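Every request in this quickstart is the base URL plus an endpoint path and a query string. The sketch below builds such URLs with the standard library so that parameter values are always URL-encoded. The helper name `build_url` is an illustrative assumption and is not part of Datasets Server or any client library.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"


def build_url(endpoint: str, **params) -> str:
    """Return the full URL for an endpoint, with query parameters URL-encoded.

    `build_url` is a hypothetical helper written for this quickstart.
    """
    path = "/" + endpoint.lstrip("/")
    # Drop parameters left as None so optional arguments can simply be omitted.
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


# Example: the URL for a 10-row slice of the rotten_tomatoes train split.
rows_url = build_url(
    "/rows",
    dataset="rotten_tomatoes",
    config="default",
    split="train",
    offset=150,
    length=10,
)
print(rows_url)
```

The resulting string can be passed to `requests.get` in place of the hard-coded `API_URL` values used in the rest of this page.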

Gated datasets

For gated datasets, you’ll need to provide your user token in the headers of your request. Otherwise, you’ll get an error message asking you to retry with authentication.

Python
import requests
API_TOKEN = "<your-user-token>"  # placeholder: replace with the token from your user settings
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.huggingface.co/is-valid?dataset=mozilla-foundation/common_voice_10_0"
def query():
    response = requests.get(API_URL, headers=headers)
    return response.json()
data = query()
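To avoid hard-coding a token in source files, the `Authorization` header can be built from an environment variable instead. The following is a minimal sketch: the helper name `auth_headers` and the variable name `HF_TOKEN` are assumptions made for illustration, so adapt them to however you store your token.

```python
import os
from typing import Dict, Optional


def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding a Bearer token only when one is available.

    `auth_headers` is a hypothetical helper, and HF_TOKEN is an assumed
    environment variable name chosen for this sketch.
    """
    token = token or os.environ.get("HF_TOKEN")
    # Without a token, return no headers: public datasets need no authentication.
    return {"Authorization": f"Bearer {token}"} if token else {}


headers = auth_headers()
```

The returned dictionary can be passed as `headers=headers` to `requests.get`. When no token is found the dictionary is empty, so the same call works for public datasets.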

Check dataset validity

The /valid endpoint returns a JSON list of datasets stored on the Hub that load without any errors:

Python
import requests
API_URL = "https://datasets-server.huggingface.co/valid"
def query():
    response = requests.get(API_URL)
    return response.json()
data = query()

To check whether a specific dataset is valid, for example, Rotten Tomatoes, use the /is-valid endpoint instead:

Python
import requests
API_URL = "https://datasets-server.huggingface.co/is-valid?dataset=rotten_tomatoes"
def query():
    response = requests.get(API_URL)
    return response.json()
data = query()

List configurations and splits

The /splits endpoint returns a JSON list of the configurations and splits of a dataset:

Python
import requests
API_URL = "https://datasets-server.huggingface.co/splits?dataset=rotten_tomatoes"
def query():
    response = requests.get(API_URL)
    return response.json()
data = query()

Preview a dataset

The /first-rows endpoint returns a JSON list of the first 100 rows of a dataset split, along with the data type of each feature (column). Specify the dataset name, the configuration name (which you can look up with the /splits endpoint), and the name of the split you’d like to preview:

Python
import requests
API_URL = "https://datasets-server.huggingface.co/first-rows?dataset=rotten_tomatoes&config=default&split=train"
def query():
    response = requests.get(API_URL)
    return response.json()
data = query()

Download slices of a dataset

The /rows endpoint returns a JSON list of rows from a dataset split, starting at any given position (offset), along with the data type of each feature (column). Specify the dataset name, the configuration name (which you can look up with the /splits endpoint), the split name, and the offset and length of the slice you’d like to download:

Python
import requests
API_URL = "https://datasets-server.huggingface.co/rows?dataset=rotten_tomatoes&config=default&split=train&offset=150&length=10"
def query():
    response = requests.get(API_URL)
    return response.json()
data = query()

A single request returns at most 100 rows, so larger ranges must be downloaded as several slices.
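Because of the 100-row limit, reading a longer range means issuing several /rows requests with successive offsets. The sketch below only computes the `offset` and `length` values for those requests and sends nothing over the network. The function name `slice_params` is an illustrative assumption.

```python
from typing import Iterator, Tuple

MAX_LENGTH = 100  # maximum slice length accepted by /rows, per the endpoint table


def slice_params(start: int, total: int, max_length: int = MAX_LENGTH) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover `total` rows beginning at `start`.

    `slice_params` is a hypothetical helper written for this quickstart.
    """
    if start < 0 or total < 0 or max_length <= 0:
        raise ValueError("start and total must be >= 0, and max_length must be > 0")
    end = start + total
    offset = start
    while offset < end:
        # The final slice may be shorter than max_length.
        length = min(max_length, end - offset)
        yield offset, length
        offset += length


# Example: 250 rows starting at row 150 need three requests.
slices = list(slice_params(150, 250))
print(slices)  # [(150, 100), (250, 100), (350, 50)]
```

Each pair can then be substituted into the `offset` and `length` query parameters of a /rows request such as the one above.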

Access Parquet files

Datasets Server converts every public dataset on the Hub to the Parquet format. The /parquet endpoint returns a JSON list of the Parquet URLs for a dataset:

Python
import requests
API_URL = "https://datasets-server.huggingface.co/parquet?dataset=rotten_tomatoes"
def query():
    response = requests.get(API_URL)
    return response.json()
data = query()