The huggingface_hub library provides functions to download files from the repositories stored on the Hub. You can use these functions independently or integrate them into your own library, making it more convenient for your users to interact with the Hub. This guide shows how to build the URL of a file, download and cache a single file, and download an entire repository or a filtered subset of its files.

Use the filename parameter in the hf_hub_url() function to retrieve the URL of a specific file to download:

>>> from huggingface_hub import hf_hub_url
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
'https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json'

Specify a particular file version by providing the file revision, which can be a branch name, a tag, or a commit hash. When using a commit hash, it must be the full-length hash, not the abbreviated 7-character form:

>>> hf_hub_url(repo_id="lysandre/arxiv-nlp",
...            filename="config.json",
...            revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
... )
'https://huggingface.co/lysandre/arxiv-nlp/resolve/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json'
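
Because an abbreviated hash is not accepted, it can help to check a string before passing it as a commit hash. The following is a minimal sketch, assuming a full-length commit hash is 40 lowercase hexadecimal characters, as in the example above. `is_full_commit_hash` is a hypothetical helper and not part of huggingface_hub:

```python
import re

# Assumption: a full-length commit hash is 40 lowercase hex characters.
_FULL_HASH = re.compile(r"[0-9a-f]{40}")


def is_full_commit_hash(revision: str) -> bool:
    """Return True if `revision` looks like a full-length commit hash."""
    return _FULL_HASH.fullmatch(revision) is not None


print(is_full_commit_hash("877b84a8f93f2d619faa2a6e514a32beef88ab0a"))  # True
print(is_full_commit_hash("877b84a"))  # False
```

A branch or tag name such as main or v1.0 also fails this check, so it only tells you whether a string can be used as a commit hash, not whether it is a valid revision.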

To specify a file revision with the branch name:

>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")

To specify a file revision with a tag identifier, for example v1.0 of the config.json file:

>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
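
The URLs returned in the examples above all follow one pattern: the repository id, then resolve, then the revision, then the file name. The sketch below reproduces only that pattern. `build_resolve_url` is a hypothetical helper written for illustration, and the real hf_hub_url() may handle cases this sketch ignores:

```python
def build_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Hypothetical helper mirroring the URL shape shown in this guide."""
    # Assumption: the default revision is "main", as in the first example's output.
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


print(build_resolve_url("lysandre/arxiv-nlp", "config.json"))
print(build_resolve_url("lysandre/arxiv-nlp", "config.json", revision="v1.0"))
```

Seeing the pattern spelled out makes it clear that changing the revision only changes one path segment of the URL.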

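Pass the URL to cached_download() to download the file and store it in the local cache. The call returns the path of the cached file:

>>> from huggingface_hub import hf_hub_url, cached_download
>>> config_file_url = hf_hub_url("lysandre/arxiv-nlp", filename="config.json")
>>> cached_download(config_file_url)
'/home/lysandre/.cache/huggingface/hub/bc0e8cc2f8271b322304e8bb84b3b7580701d53a335ab2d75da19c249e2eeebb.066dae6fdb1e2b8cce60c35cc0f78ed1451d9b341c78de19f3ad469d10a8cbb1'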

To build the URL and download the file in a single call, use hf_hub_download():

>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")

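To download an entire repository, use snapshot_download(). It returns the path of the local snapshot folder:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp")
'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade'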

snapshot_download() downloads the latest revision by default. If you want a specific repository revision, use the revision parameter:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main")

However, you don’t always want to download the contents of an entire repository. Even if you don’t know the exact file names, you can restrict snapshot_download() to certain file types with the allow_regex and ignore_regex arguments. Each accepts either a single pattern or a list of patterns.

Despite the parameter names, matching is based on fnmatch, so patterns use Unix shell-style wildcards (for example *.json) rather than regular-expression syntax.
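
To see what fnmatch-style matching means in practice, the sketch below applies patterns to a list of file names using Python's standard fnmatch module. The file names are hypothetical and chosen only for illustration:

```python
from fnmatch import fnmatch

# Hypothetical file names, for illustration only.
files = ["config.json", "vocab.json", "pytorch_model.bin", "tf_model.h5"]

# "*" matches any run of characters, so "*.json" keeps names ending in ".json".
json_files = [name for name in files if fnmatch(name, "*.json")]
print(json_files)  # ['config.json', 'vocab.json']

# fnmatch reads a regular expression as a wildcard pattern: here the leading
# "." and the backslash are literal characters, so nothing matches.
print(fnmatch("config.json", r".*\.json"))  # False
```

The second check shows why a pattern written in regular-expression syntax can silently match no files.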

For example, you can use allow_regex to only download JSON configuration files:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", allow_regex="*.json")

On the other hand, ignore_regex can exclude certain files from being downloaded. The following example ignores the .msgpack and .h5 file extensions:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", ignore_regex=["*.msgpack", "*.h5"])
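
The sketch below shows one way allow and ignore patterns could be combined when filtering a file list. It assumes a file is kept when it matches at least one allow pattern (if any are given) and matches no ignore pattern. `filter_files` is a hypothetical helper, not the library's implementation, and the file names are made up:

```python
from fnmatch import fnmatch
from typing import List, Optional, Union

Patterns = Optional[Union[str, List[str]]]


def filter_files(files: List[str], allow: Patterns = None, ignore: Patterns = None) -> List[str]:
    """Hypothetical filter: keep files matching `allow` and not matching `ignore`."""
    # Accept either a single pattern or a list of patterns.
    if isinstance(allow, str):
        allow = [allow]
    if isinstance(ignore, str):
        ignore = [ignore]

    kept = []
    for name in files:
        # Skip files that match none of the allow patterns, when any are given.
        if allow is not None and not any(fnmatch(name, p) for p in allow):
            continue
        # Skip files that match any ignore pattern.
        if ignore is not None and any(fnmatch(name, p) for p in ignore):
            continue
        kept.append(name)
    return kept


# Hypothetical repository contents.
files = ["config.json", "flax_model.msgpack", "pytorch_model.bin", "tf_model.h5"]
print(filter_files(files, ignore=["*.msgpack", "*.h5"]))
# ['config.json', 'pytorch_model.bin']
```

Normalizing a single string into a one-element list keeps the matching loop identical for both accepted argument forms.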

Note that passing allow_regex or ignore_regex does not prevent snapshot_download() from redownloading the entire model repository if an ignored file is changed.