Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Download files from the Hub

The huggingface_hub library provides functions to download files from the repositories stored on the Hub. You can use these functions independently or integrate them into your own library, making it more convenient for your users to interact with the Hub. This guide will show you how to:

  • Specify a file to download from the Hub.
  • Download and cache a file on your disk.
  • Download all the files in a repository.

Choose a file to download

Use the filename parameter in the hf_hub_url() function to retrieve the URL of a specific file to download:

>>> from huggingface_hub import hf_hub_url
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
'https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json'

/docs/assets/hub/repo.png

Specify a particular file version by providing the file revision, which can be the branch name, a tag, or a commit hash. When using the commit hash, it must be the full-length hash instead of a 7-character commit hash:

>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", 
...            filename="config.json", 
...            revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
... )
'https://huggingface.co/lysandre/arxiv-nlp/resolve/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json'

To specify a file revision with the branch name:

>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")

To specify a file revision with a tag identifier. For example, if you want v1.0 of the config.json file:

>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")

Download and store a file

cached_download() is used to download and cache a file on your local disk. Once a file is stored in your cache, you don’t have to redownload it the next time you use it. cached_download() is a hands-free solution for staying up to date with new file versions. When a downloaded file is updated in the remote repository, cached_download() will automatically download and store it.

Begin by retrieving the file URL with hf_hub_url(), and then pass the specified URL to cached_download() to download the file:

>>> from huggingface_hub import hf_hub_url, cached_download
>>> config_file_url = hf_hub_url("lysandre/arxiv-nlp", filename="config.json")
>>> cached_download(config_file_url)
'/home/lysandre/.cache/huggingface/hub/bc0e8cc2f8271b322304e8bb84b3b7580701d53a335ab2d75da19c249e2eeebb.066dae6fdb1e2b8cce60c35cc0f78ed1451d9b341c78de19f3ad469d10a8cbb1'

hf_hub_url() and cached_download() work hand-in-hand to download a file. This is such a standard workflow that hf_hub_download() is a wrapper that calls both of these functions.

>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")

Download an entire repository

snapshot_download() downloads an entire repository at a given revision. Like cached_download(), all downloaded files are cached on your local disk. However, even if only a single file is updated, the entire repository will be redownloaded.

Download a whole repository as shown in the following:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp")
'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade'

snapshot_download() downloads the latest revision by default. If you want a specific repository revision, use the revision parameter:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main")

In general, it is usually better to download files with hf_hub_download() - if you already know the file name - to avoid redownloading an entire repository. snapshot_download() is helpful when you are unaware of which files to download.

However, you don’t always want to download the contents of an entire repository with snapshot_download(). Even if you don’t know the file name, you can download specific files if you know the file type with allow_regex and ignore_regex. Use the allow_regex and ignore_regex arguments to specify which files to download. These parameters accept either a single regex or a list of regexes.

The regex matching is based on fnmatch, which provides support for Unix shell-style wildcards.

For example, you can use allow_regex to only download JSON configuration files:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", allow_regex="*.json")

On the other hand, ignore_regex can exclude certain files from being downloaded. The following example ignores the .msgpack and .h5 file extensions:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", ignore_regex=["*.msgpack", "*.h5"])

Passing a regex can be especially useful when repositories contain files that are never expected to be downloaded by snapshot_download().

Note that passing allow_regex or ignore_regex does not prevent snapshot_download() from redownloading the entire model repository if an ignored file is changed.