Hub Python Library documentation

Downloading files

This page documents version v0.5.1. A newer version, v0.23.1, is available.

huggingface_hub.hf_hub_download

(
    repo_id: str,
    filename: str,
    subfolder: Optional[str] = None,
    repo_type: Optional[str] = None,
    revision: Optional[str] = None,
    library_name: Optional[str] = None,
    library_version: Optional[str] = None,
    cache_dir: Union[str, Path, None] = None,
    user_agent: Union[Dict, str, None] = None,
    force_download: Optional[bool] = False,
    force_filename: Optional[str] = None,
    proxies: Optional[Dict] = None,
    etag_timeout: Optional[float] = 10,
    resume_download: Optional[bool] = False,
    use_auth_token: Union[bool, str, None] = None,
    local_files_only: Optional[bool] = False
)

Parameters

  • repo_id (str) — A user or an organization name and a repo name separated by a /.
  • filename (str) — The name of the file in the repo.
  • subfolder (str, optional) — An optional value corresponding to a folder inside the repo.
  • repo_type (str, optional) — Set to "dataset" or "space" if downloading from a dataset or space, None or "model" if downloading from a model. Default is None.
  • revision (str, optional) — An optional Git revision id which can be a branch name, a tag, or a commit hash.
  • library_name (str, optional) — The name of the library to which the object corresponds.
  • library_version (str, optional) — The version of the library.
  • cache_dir (str, Path, optional) — Path to the folder where cached files are stored.
  • user_agent (dict, str, optional) — The user-agent info in the form of a dictionary or a string.
  • force_download (bool, optional, defaults to False) — Whether the file should be downloaded even if it already exists in the local cache.
  • force_filename (str, optional) — Use this name instead of a generated file name.
  • proxies (dict, optional) — Dictionary mapping protocol to the URL of the proxy passed to requests.request.
  • etag_timeout (float, optional, defaults to 10) — When fetching the ETag, how many seconds to wait for the server to send data before giving up. Passed to requests.request.
  • resume_download (bool, optional, defaults to False) — If True, resume a previously interrupted download.
  • use_auth_token (str, bool, optional) — A token to be used for the download.
    • If True, the token is read from the HuggingFace config folder.
    • If a string, it’s used as the authentication token.
  • local_files_only (bool, optional, defaults to False) — If True, avoid downloading the file and return the path to the local cached file if it exists.

Download a given file if it’s not already present in the local cache.
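
To make the caching behavior concrete, here is a minimal sketch of the decision a function like hf_hub_download has to make. get_file and fake_fetch are hypothetical helpers, not part of huggingface_hub; fake_fetch stands in for the HTTP download. The sketch models only force_download and local_files_only: it deliberately leaves out the ETag check the library uses to tell whether a cached copy is still current, and the exception raised for a missing file is an assumption.

```python
import os
import tempfile
from typing import Callable, List


def get_file(
    cache_path: str,
    fetch: Callable[[str], None],
    force_download: bool = False,
    local_files_only: bool = False,
) -> str:
    """Hypothetical helper: return a cached path, downloading only when needed."""
    cached = os.path.exists(cache_path)
    if local_files_only:
        # Never touch the network: succeed only if a cached copy already exists.
        if not cached:
            raise FileNotFoundError(f"{cache_path} is not in the local cache")
        return cache_path
    if cached and not force_download:
        # Cache hit: reuse the existing copy (no ETag revalidation in this sketch).
        return cache_path
    # Cache miss or forced refresh: (re)download into the cache.
    fetch(cache_path)
    return cache_path


calls: List[str] = []


def fake_fetch(path: str) -> None:
    # Stand-in for the real download: record the call and write a small file.
    calls.append(path)
    with open(path, "w") as f:
        f.write("{}")


with tempfile.TemporaryDirectory() as cache_dir:
    path = os.path.join(cache_dir, "config.json")
    get_file(path, fake_fetch)                       # miss: downloads
    get_file(path, fake_fetch)                       # hit: reuses the cached copy
    get_file(path, fake_fetch, force_download=True)  # forced: downloads again
    print(len(calls))  # 2
```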

Raises the following errors:

huggingface_hub.snapshot_download

(
    repo_id: str,
    revision: Optional[str] = None,
    cache_dir: Union[str, Path, None] = None,
    library_name: Optional[str] = None,
    library_version: Optional[str] = None,
    user_agent: Union[Dict, str, None] = None,
    proxies: Optional[Dict] = None,
    etag_timeout: Optional[float] = 10,
    resume_download: Optional[bool] = False,
    use_auth_token: Union[bool, str, None] = None,
    local_files_only: Optional[bool] = False,
    allow_regex: Union[List[str], str, None] = None,
    ignore_regex: Union[List[str], str, None] = None
)

Parameters

  • repo_id (str) — A user or an organization name and a repo name separated by a /.
  • revision (str, optional) — An optional Git revision id which can be a branch name, a tag, or a commit hash.
  • cache_dir (str, Path, optional) — Path to the folder where cached files are stored.
  • library_name (str, optional) — The name of the library to which the object corresponds.
  • library_version (str, optional) — The version of the library.
  • user_agent (str, dict, optional) — The user-agent info in the form of a dictionary or a string.
  • proxies (dict, optional) — Dictionary mapping protocol to the URL of the proxy passed to requests.request.
  • etag_timeout (float, optional, defaults to 10) — When fetching the ETag, how many seconds to wait for the server to send data before giving up. Passed to requests.request.
  • resume_download (bool, optional, defaults to False) — If True, resume a previously interrupted download.
  • use_auth_token (str, bool, optional) — A token to be used for the download.
    • If True, the token is read from the HuggingFace config folder.
    • If a string, it’s used as the authentication token.
  • local_files_only (bool, optional, defaults to False) — If True, avoid downloading the file and return the path to the local cached file if it exists.
  • allow_regex (list of str, str, optional) — If provided, only files matching this regex are downloaded.
  • ignore_regex (list of str, str, optional) — If provided, files matching this regex are not downloaded.

Download all files of a repo.

Downloads a whole snapshot of a repo’s files at the specified revision. This is useful when you want all files from a repo, because you don’t know which ones you will need a priori. All files are nested inside a folder in order to keep their actual filename relative to that folder.
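
The file selection and folder layout described above can be sketched as follows. select_files and snapshot_paths are hypothetical helpers, not part of huggingface_hub. The sketch assumes each pattern is applied with re.match (anchored at the start of the repo-relative path), which may differ from how the library applies allow_regex and ignore_regex; the snapshot folder name is arbitrary.

```python
import os
import re
from typing import List, Optional, Union

Patterns = Optional[Union[str, List[str]]]


def _as_list(patterns: Patterns) -> List[str]:
    # Accept None, a single regex, or a list of regexes.
    if patterns is None:
        return []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def select_files(
    repo_files: List[str], allow_regex: Patterns = None, ignore_regex: Patterns = None
) -> List[str]:
    """Hypothetical filter: keep allowed files, then drop ignored ones."""
    allow = _as_list(allow_regex)
    ignore = _as_list(ignore_regex)
    selected = []
    for name in repo_files:
        # Assumption: re.match, i.e. each pattern is anchored at the path start.
        if allow and not any(re.match(p, name) for p in allow):
            continue
        if any(re.match(p, name) for p in ignore):
            continue
        selected.append(name)
    return selected


def snapshot_paths(snapshot_dir: str, repo_files: List[str]) -> List[str]:
    # Each file keeps its repo-relative path ("a/b.txt") under the snapshot folder.
    return [os.path.join(snapshot_dir, *name.split("/")) for name in repo_files]


files = ["config.json", "pytorch_model.bin", "tf_model.h5", "tokenizer/vocab.json"]
chosen = select_files(files, allow_regex=[r".*\.json$", r".*\.bin$"], ignore_regex=r"tokenizer/")
print(chosen)  # ['config.json', 'pytorch_model.bin']
print(snapshot_paths("my-snapshot", chosen))
```

Keeping the repo-relative paths intact is what lets code that expects, for example, a tokenizer/ subfolder work unchanged against the snapshot folder.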

An alternative would be to simply clone the repo, but this would require that the user always has git and git-lfs installed and properly configured.

Raises the following errors:

huggingface_hub.cached_download

(
    url: str,
    library_name: Optional[str] = None,
    library_version: Optional[str] = None,
    cache_dir: Union[str, Path, None] = None,
    user_agent: Union[Dict, str, None] = None,
    force_download: Optional[bool] = False,
    force_filename: Optional[str] = None,
    proxies: Optional[Dict] = None,
    etag_timeout: Optional[float] = 10,
    resume_download: Optional[bool] = False,
    use_auth_token: Union[bool, str, None] = None,
    local_files_only: Optional[bool] = False
)

Parameters

  • url (str) — The URL of the file to be downloaded.
  • library_name (str, optional) — The name of the library to which the object corresponds.
  • library_version (str, optional) — The version of the library.
  • cache_dir (str, Path, optional) — Path to the folder where cached files are stored.
  • user_agent (dict, str, optional) — The user-agent info in the form of a dictionary or a string.
  • force_download (bool, optional, defaults to False) — Whether the file should be downloaded even if it already exists in the local cache.
  • force_filename (str, optional) — Use this name instead of a generated file name.
  • proxies (dict, optional) — Dictionary mapping protocol to the URL of the proxy passed to requests.request.
  • etag_timeout (float, optional, defaults to 10) — When fetching the ETag, how many seconds to wait for the server to send data before giving up. Passed to requests.request.
  • resume_download (bool, optional, defaults to False) — If True, resume a previously interrupted download.
  • use_auth_token (bool, str, optional) — A token to be used for the download.
    • If True, the token is read from the HuggingFace config folder.
    • If a string, it’s used as the authentication token.
  • local_files_only (bool, optional, defaults to False) — If True, avoid downloading the file and return the path to the local cached file if it exists.
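
Two of the parameters above, use_auth_token and resume_download, typically end up as HTTP request headers. The sketch below shows one way to build them. build_headers is a hypothetical helper, stored_token stands in for the token saved in the Hugging Face config folder, and the Bearer authorization scheme and the Range header used for resuming are assumptions about the request format rather than a description of the library's internals. The token string in the usage line is a placeholder.

```python
import os
import tempfile
from typing import Dict, Optional, Union


def build_headers(
    use_auth_token: Union[bool, str, None] = None,
    stored_token: Optional[str] = None,
    partial_file: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper mapping auth/resume settings to request headers.

    stored_token stands in for the token saved in the Hugging Face config folder.
    """
    headers: Dict[str, str] = {}
    token: Optional[str] = None
    if isinstance(use_auth_token, str):
        token = use_auth_token  # an explicit token string is used as-is
    elif use_auth_token:
        if stored_token is None:
            raise EnvironmentError("use_auth_token=True but no stored token was found")
        token = stored_token  # True means "use the saved token"
    if token:
        headers["authorization"] = f"Bearer {token}"
    # Resume: request only the bytes missing from a partially downloaded file.
    if partial_file is not None and os.path.exists(partial_file):
        resume_size = os.path.getsize(partial_file)
        if resume_size > 0:
            headers["Range"] = f"bytes={resume_size}-"
    return headers


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)  # pretend 1 KiB arrived before the interruption
print(build_headers("hf_placeholder_token", partial_file=tmp.name))
# {'authorization': 'Bearer hf_placeholder_token', 'Range': 'bytes=1024-'}
os.remove(tmp.name)
```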

Download from a given URL and cache it if it’s not already present in the local cache.

Given a URL, this function looks for the corresponding file in the local cache. If it’s not there, the file is downloaded. Either way, the path to the cached file is returned.
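
The lookup needs a cache file name that changes whenever the remote file changes. A minimal sketch, assuming a hypothetical cache_filename helper that hashes the URL and the ETag separately; the library's actual naming scheme is not documented on this page, and force_filename would replace a generated name entirely. The ETag values below are placeholders.

```python
from hashlib import sha256
from typing import Optional


def cache_filename(url: str, etag: Optional[str] = None) -> str:
    """Hypothetical cache key: URL hash, plus ETag hash when the ETag is known."""
    name = sha256(url.encode("utf-8")).hexdigest()
    if etag:
        # A new version of the file has a new ETag, hence a new cache entry.
        name += "." + sha256(etag.encode("utf-8")).hexdigest()
    return name


url = "https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin"
v1 = cache_filename(url, etag="etag-of-version-1")
v2 = cache_filename(url, etag="etag-of-version-2")
print(v1 != v2)                              # True: different versions, different entries
print(v1.split(".")[0] == v2.split(".")[0])  # True: same URL-hash prefix
```

Hashing the URL keeps names filesystem-safe regardless of the characters in the URL, and appending the ETag hash means an updated file never overwrites the cached copy of an older version.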

Raises the following errors:

huggingface_hub.hf_hub_url

(
    repo_id: str,
    filename: str,
    subfolder: Optional[str] = None,
    repo_type: Optional[str] = None,
    revision: Optional[str] = None
)

Parameters

  • repo_id (str) — A namespace (user or organization) name and a repo name separated by a /.
  • filename (str) — The name of the file in the repo.
  • subfolder (str, optional) — An optional value corresponding to a folder inside the repo.
  • repo_type (str, optional) — Set to "dataset" or "space" if the file is in a dataset or space, None or "model" if it is in a model. Default is None.
  • revision (str, optional) — An optional Git revision id which can be a branch name, a tag, or a commit hash.

Construct the URL of a file from the given information.

The resolved address can be either a URL hosted on huggingface.co or, for large files (more than a few MB), a link to CloudFront, a Content Delivery Network (CDN).

Example:

>>> from huggingface_hub import hf_hub_url

>>> hf_hub_url(
...     repo_id="julien-c/EsperBERTo-small", filename="pytorch_model.bin"
... )
'https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin'
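
The URL in the example above follows a simple template. The sketch below reproduces it; build_file_url is a hypothetical stand-in for hf_hub_url, and the "datasets/" and "spaces/" prefixes, the "main" default revision, and the percent-encoding of the revision are assumptions. It only builds the huggingface.co address and does not model how large files end up being served from CloudFront.

```python
from typing import Optional
from urllib.parse import quote

ENDPOINT = "https://huggingface.co"
# Assumed URL prefixes for non-model repos.
REPO_TYPE_PREFIXES = {"dataset": "datasets/", "space": "spaces/"}


def build_file_url(
    repo_id: str,
    filename: str,
    subfolder: Optional[str] = None,
    repo_type: Optional[str] = None,
    revision: Optional[str] = None,
) -> str:
    """Hypothetical sketch of the resolve-URL template."""
    if subfolder:
        filename = f"{subfolder}/{filename}"
    if repo_type in REPO_TYPE_PREFIXES:
        repo_id = REPO_TYPE_PREFIXES[repo_type] + repo_id
    if revision is None:
        revision = "main"  # assumed default branch
    # Quote the revision so names containing "/" stay a single path segment.
    return f"{ENDPOINT}/{repo_id}/resolve/{quote(revision, safe='')}/{filename}"


print(build_file_url("julien-c/EsperBERTo-small", "pytorch_model.bin"))
# https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin
```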

Notes:

CloudFront is replicated around the globe, so downloads are much faster for end users (and our bandwidth costs are lower).

CloudFront caches files aggressively by default (the default TTL is 24 hours). This is not an issue here: because we implement a git-based versioning system on huggingface.co, files are stored on S3/CloudFront in a content-addressable way (i.e., the file name is its hash). With content-addressable filenames, the cache can never be stale.
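
Why content addressing rules out stale caches can be shown with a toy in-memory store. ContentAddressableStore is a hypothetical class, not how S3 or CloudFront is actually configured: each object's key is the SHA-256 of its bytes, so new contents always get a new key and an existing key never points at different bytes.

```python
import hashlib
from typing import Dict


class ContentAddressableStore:
    """Toy store where each object's key is the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data  # same bytes -> same key, so rewrites are no-ops
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentAddressableStore()
k1 = store.put(b"weights v1")
k2 = store.put(b"weights v2")
# Updating the file produces a new key instead of overwriting the old object,
# so anything cached under k1 still holds exactly the bytes named by k1.
print(k1 != k2, store.get(k1) == b"weights v1")  # True True
```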

For client-side caching in this library, we rely on each object’s entity tag (ETag), an identifier of a specific version of a resource [1]. An object’s ETag is its git-sha1 if it is stored in git, or its sha256 if it is stored in git-lfs.
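
The two kinds of identifiers mentioned above can be computed locally. The helper names are hypothetical: git_blob_sha1 reproduces the object id git assigns to a file’s contents (a SHA-1 over a "blob <size>" header and a NUL byte, followed by the bytes), and lfs_sha256 is the plain SHA-256 of the contents that git-lfs records as the object id. The sketch does not query the server or parse the ETag response header.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Object id git assigns to a file's contents (what `git hash-object` prints)."""
    header = f"blob {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


def lfs_sha256(data: bytes) -> str:
    """SHA-256 of the raw contents, the object id git-lfs records for a file."""
    return hashlib.sha256(data).hexdigest()


print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(lfs_sha256(b""))     # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

Because both identifiers depend only on the file contents, a changed file always yields a changed ETag, which is what makes them usable as cache keys.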

References: