Hub Client Library documentation

The huggingface_hub library provides functions to download files from repositories stored on the Hub. You can use these functions on their own or integrate them into your own library, making it more convenient for your users to interact with the Hub. This guide shows how to download a single file, download an entire repository, and restrict a download to files that match given patterns.

hf_hub_download() downloads the remote file, stores it on disk in a version-aware way, and returns its local file path.

Use the repo_id and filename parameters to specify which file to download:

>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
'/root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/snapshots/894a9adde21d9a3e3843e6d5aeaaf01875c7fade/config.json'
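The returned path encodes both the repository and the revision it was resolved to, which is what makes the cache version-aware. The sketch below splits the example path shown above using only the standard library. `describe_cached_path` is a hypothetical helper, not part of huggingface_hub, and it assumes the `<repo folder>/snapshots/<commit hash>/<file>` layout visible in that path.

```python
from pathlib import PurePosixPath


def describe_cached_path(path: str) -> dict:
    """Split a cached file path into repo folder, commit hash, and file name.

    Hypothetical helper for illustration; assumes the layout
    <cache>/<repo folder>/snapshots/<commit hash>/<file path>.
    """
    parts = PurePosixPath(path).parts
    # Locate the "snapshots" directory; the other pieces sit around it.
    idx = parts.index("snapshots")
    return {
        "repo_folder": parts[idx - 1],
        "commit_hash": parts[idx + 1],
        "filename": "/".join(parts[idx + 2:]),
    }


example = (
    "/root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/"
    "snapshots/894a9adde21d9a3e3843e6d5aeaaf01875c7fade/config.json"
)
info = describe_cached_path(example)
print(info["repo_folder"])
print(info["commit_hash"])
print(info["filename"])
```

Treat the path as an opaque value in real code; the layout is inspected here only to show what "version-aware" means in practice.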

Specify a particular file version by providing the file revision, which can be the branch name, a tag, or a commit hash. When using the commit hash, it must be the full-length hash instead of a 7-character commit hash:

>>> hf_hub_download(
...    repo_id="lysandre/arxiv-nlp",
...    filename="config.json",
...    revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
... )
'/root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/snapshots/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json'
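Because abbreviated hashes are not accepted, a revision string can be checked before it is passed on. This is a minimal sketch, assuming commit hashes are 40-character hexadecimal SHA-1 strings. `is_full_commit_hash` is a hypothetical helper, not part of huggingface_hub. Branch names and tags do not match the check and would simply be passed through unchanged.

```python
import re

# A full Git SHA-1 commit hash is exactly 40 hexadecimal characters.
_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_full_commit_hash(revision: str) -> bool:
    """Return True if `revision` looks like a full-length commit hash."""
    return _FULL_SHA1.fullmatch(revision) is not None


print(is_full_commit_hash("877b84a8f93f2d619faa2a6e514a32beef88ab0a"))  # True
print(is_full_commit_hash("877b84a"))  # False: 7-character short hash
print(is_full_commit_hash("main"))  # False: branch name
```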

To specify a file revision with the branch name:

>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")

To specify a file revision with a tag identifier, pass the tag name. For example, if you want v1.0 of the config.json file:

>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")

To download from a dataset or a Space, specify the repo_type. By default, the file is assumed to belong to a model repo.

>>> hf_hub_download(repo_id="google/fleurs", filename="fleurs.py", repo_type="dataset")

If you want to construct the URL used to download a file from a repo, use hf_hub_url(), which returns that URL. It is used internally by hf_hub_download().
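To make the shape of such a URL concrete, the sketch below rebuilds it with the standard library. It assumes the default endpoint `https://huggingface.co` and a `/{repo_id}/resolve/{revision}/{filename}` layout, with a `datasets/` or `spaces/` prefix for non-model repos. `sketch_hub_url` is a hypothetical stand-in for illustration only; real code should call hf_hub_url() instead.

```python
from urllib.parse import quote

ENDPOINT = "https://huggingface.co"  # assumed default endpoint
# Assumed URL prefixes per repo type; model repos have none.
PREFIXES = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def sketch_hub_url(repo_id, filename, repo_type="model", revision="main"):
    """Approximate the download URL of a file (illustration only)."""
    if repo_type not in PREFIXES:
        raise ValueError(f"Unknown repo_type: {repo_type!r}")
    return (
        f"{ENDPOINT}/{PREFIXES[repo_type]}{repo_id}"
        f"/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


model_url = sketch_hub_url("lysandre/arxiv-nlp", "config.json")
dataset_url = sketch_hub_url("google/fleurs", "fleurs.py", repo_type="dataset")
print(model_url)
print(dataset_url)
```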

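To download an entire repository, use snapshot_download(), which returns the path of the local snapshot folder:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp")
'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade'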

snapshot_download() downloads the latest revision by default. If you want a specific repository revision, use the revision parameter:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main")

However, you don’t always want to download the contents of an entire repository with snapshot_download(). Even if you don’t know the exact file names, you can restrict the download to certain file types. Use the allow_patterns and ignore_patterns arguments to specify which files to download. Each accepts either a single pattern or a list of patterns.

Patterns are standard wildcards (globbing patterns). Pattern matching is based on fnmatch.

For example, you can use allow_patterns to only download JSON configuration files:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", allow_patterns="*.json")

On the other hand, ignore_patterns can exclude certain files from being downloaded. The following example ignores the .msgpack and .h5 file extensions:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", ignore_patterns=["*.msgpack", "*.h5"])
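Since pattern matching is based on fnmatch, the effect of both arguments can be previewed locally without downloading anything. The sketch below rests on an assumption about how the two filters combine: a file is kept if it matches at least one allow pattern (when any are given) and no ignore pattern. `select_files` and the file listing are hypothetical and not part of huggingface_hub.

```python
from fnmatch import fnmatch


def select_files(filenames, allow_patterns=None, ignore_patterns=None):
    """Return the file names that pass the allow and ignore patterns."""
    # Accept a single pattern as well as a list of patterns.
    if isinstance(allow_patterns, str):
        allow_patterns = [allow_patterns]
    if isinstance(ignore_patterns, str):
        ignore_patterns = [ignore_patterns]

    selected = []
    for name in filenames:
        # Skip files that match none of the allow patterns.
        if allow_patterns is not None and not any(
            fnmatch(name, pattern) for pattern in allow_patterns
        ):
            continue
        # Skip files that match any ignore pattern.
        if ignore_patterns is not None and any(
            fnmatch(name, pattern) for pattern in ignore_patterns
        ):
            continue
        selected.append(name)
    return selected


# Hypothetical repository listing, for illustration only.
files = ["config.json", "vocab.json", "flax_model.msgpack", "tf_model.h5", "README.md"]

only_json = select_files(files, allow_patterns="*.json")
no_weights = select_files(files, ignore_patterns=["*.msgpack", "*.h5"])
print(only_json)
print(no_weights)
```

Previewing the selection this way is useful for checking a pattern before starting a large download.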