Download files from the Hub
The huggingface_hub library provides functions to download files from the repositories
stored on the Hub. You can use these functions independently or integrate them into your
own library, making it more convenient for your users to interact with the Hub. This
guide will show you how to:
- Download and store a file from the Hub.
- Download all the files in a repository.
Download and store a file from the Hub
The hf_hub_download() function is the main function for downloading files from the Hub.
It downloads the remote file, stores it on disk (in a version-aware way), and returns its local file path.
Use the repo_id and filename parameters to specify which file to download:
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
'/root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/snapshots/894a9adde21d9a3e3843e6d5aeaaf01875c7fade/config.json'
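To see what "version-aware" storage means in practice, the returned path can be taken apart. The sketch below works only on the example path shown above and assumes the layout visible in it (a repo folder, a snapshots directory, then one folder per commit hash). The variable names are illustrative, and the layout may differ between library versions.

```python
from pathlib import PurePosixPath

# Example return value of hf_hub_download(), copied from the output above.
local_path = PurePosixPath(
    "/root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/"
    "snapshots/894a9adde21d9a3e3843e6d5aeaaf01875c7fade/config.json"
)

parts = local_path.parts
snapshots_index = parts.index("snapshots")

# Assumed layout: <cache>/<repo folder>/snapshots/<commit hash>/<file path>
repo_folder = parts[snapshots_index - 1]
commit_hash = parts[snapshots_index + 1]
relative_file = PurePosixPath(*parts[snapshots_index + 2:])

print(repo_folder)    # models--lysandre--arxiv-nlp
print(commit_hash)    # 894a9adde21d9a3e3843e6d5aeaaf01875c7fade
print(relative_file)  # config.json
```

Because the commit hash is part of the path, two revisions of the same file can sit side by side on disk without overwriting each other.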


Specify a particular file version by providing the file revision, which can be the branch name, a tag, or a commit hash. When using the commit hash, it must be the full-length hash instead of a 7-character commit hash:
>>> hf_hub_download(
... repo_id="lysandre/arxiv-nlp",
... filename="config.json",
... revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
... )
'/root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/snapshots/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json'
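Since an abbreviated hash is not accepted, it can help to check a revision string before passing it on. The helper below is a hypothetical convenience, not part of huggingface_hub; it only tests whether a string looks like a 40-character hexadecimal commit hash and does not check that the commit exists.

```python
import re

# A full Git SHA-1 commit hash is 40 hexadecimal characters.
FULL_COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def is_full_commit_hash(revision: str) -> bool:
    """Return True if `revision` looks like a full-length commit hash."""
    return FULL_COMMIT_HASH.fullmatch(revision) is not None


print(is_full_commit_hash("877b84a8f93f2d619faa2a6e514a32beef88ab0a"))  # True
print(is_full_commit_hash("877b84a"))  # False
```

Branch names and tags such as main or v1.0 also fail this check, so it is only useful when a commit hash is specifically expected.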
To specify a file revision with the branch name:
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
To specify a file revision with a tag identifier, for example to get v1.0 of the config.json file:
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
To download from a dataset or a space, specify the repo_type. By default, the file is
considered to be part of a model repo.
>>> hf_hub_download(repo_id="google/fleurs", filename="fleurs.py", repo_type="dataset")
Construct a download URL
To construct the URL used to download a file from a repo, use hf_hub_url(), which returns the URL. hf_hub_download() uses it internally.
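As a rough illustration of what such a URL looks like, the sketch below builds one by hand with the standard library. The function name sketch_hub_url, the endpoint constant, and the URL layout are assumptions made for this example; in real code, call hf_hub_url() itself, which may handle details this sketch ignores.

```python
from urllib.parse import quote

# Assumed endpoint and URL layout, for illustration only.
ENDPOINT = "https://huggingface.co"
REPO_TYPE_PREFIXES = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def sketch_hub_url(repo_id, filename, repo_type="model", revision="main"):
    """Build a download URL by hand (hypothetical stand-in for hf_hub_url())."""
    prefix = REPO_TYPE_PREFIXES[repo_type]
    # Escape the revision fully, since branch names may contain slashes.
    safe_revision = quote(revision, safe="")
    return f"{ENDPOINT}/{prefix}{repo_id}/resolve/{safe_revision}/{quote(filename)}"


print(sketch_hub_url("lysandre/arxiv-nlp", "config.json"))
# https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json
print(sketch_hub_url("google/fleurs", "fleurs.py", repo_type="dataset"))
# https://huggingface.co/datasets/google/fleurs/resolve/main/fleurs.py
```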
Download an entire repository
snapshot_download() downloads an entire repository at a given revision. Like hf_hub_download(), all downloaded files are cached on your local disk.
Download a whole repository as shown in the following:
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp")
'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade'
snapshot_download() downloads the latest revision by default. If you want a specific
repository revision, use the revision parameter:
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main")
If you already know the names of the files you need, it is usually better to download them with hf_hub_download(). snapshot_download() is helpful when you do not know in advance which files to download.
However, you don’t always want to download the contents of an entire repository with
snapshot_download(). Even if you don’t know the file name, you can download specific
files if you know the file type. Use the allow_patterns and ignore_patterns arguments
to specify which files to download. These parameters accept either a single pattern
or a list of patterns.
Patterns are standard wildcards (globbing patterns). The pattern matching is based on fnmatch.
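To make the matching rules concrete, here is a minimal sketch of fnmatch-based filtering using only the standard library. The select_files helper and the file listing are hypothetical and exist only for this illustration; this is not the library's own implementation, which may differ in details.

```python
from fnmatch import fnmatch


def as_list(patterns):
    """Accept None, a single pattern, or a list of patterns."""
    if patterns is None:
        return None
    if isinstance(patterns, str):
        return [patterns]
    return list(patterns)


def select_files(filenames, allow_patterns=None, ignore_patterns=None):
    """Keep names matching any allow pattern and no ignore pattern."""
    allow = as_list(allow_patterns)
    ignore = as_list(ignore_patterns)
    selected = []
    for name in filenames:
        if allow is not None and not any(fnmatch(name, p) for p in allow):
            continue
        if ignore is not None and any(fnmatch(name, p) for p in ignore):
            continue
        selected.append(name)
    return selected


# Hypothetical repository listing, for illustration only.
repo_files = [
    "config.json",
    "tokenizer/vocab.json",
    "flax_model.msgpack",
    "tf_model.h5",
    "pytorch_model.bin",
]

print(select_files(repo_files, allow_patterns="*.json"))
# ['config.json', 'tokenizer/vocab.json']
print(select_files(repo_files, ignore_patterns=["*.msgpack", "*.h5"]))
# ['config.json', 'tokenizer/vocab.json', 'pytorch_model.bin']
```

Note that with fnmatch the * wildcard also matches path separators, so "*.json" selects JSON files in subfolders as well.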
For example, you can use allow_patterns to only download JSON configuration files:
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", allow_patterns="*.json")
On the other hand, ignore_patterns can exclude certain files from being downloaded. The
following example ignores the .msgpack and .h5 file extensions:
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", ignore_patterns=["*.msgpack", "*.h5"])
Passing a pattern can be especially useful when repositories contain files that are never expected to be downloaded by snapshot_download().