Hub Python Library documentation

Managing local and online repositories

You are viewing v0.24.5 version. A newer version v0.25.1 is available.

The Repository class is a helper class that wraps git and git-lfs commands. It provides tooling adapted for managing repositories which can be very large.

It is the recommended tool whenever git operations are involved, or when collaboration on the repository itself is a focus.

The Repository class

class huggingface_hub.Repository

( local_dir: Union clone_from: Optional = None repo_type: Optional = None token: Union = True git_user: Optional = None git_email: Optional = None revision: Optional = None skip_lfs_files: bool = False client: Optional = None )

Helper class to wrap the git and git-lfs commands.

The aim is to facilitate interacting with model or dataset repos hosted on huggingface.co, though little here (if anything) is actually specific to huggingface.co.

Repository is deprecated in favor of the http-based alternatives implemented in HfApi. Given its large adoption in legacy code, the complete removal of Repository will only happen in release v1.0. For more details, please read https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http.

__init__

( local_dir: Union clone_from: Optional = None repo_type: Optional = None token: Union = True git_user: Optional = None git_email: Optional = None revision: Optional = None skip_lfs_files: bool = False client: Optional = None )

Parameters

  • local_dir (str or Path) — path (e.g. 'my_trained_model/') to the local directory, where the Repository will be initialized.
  • clone_from (str, optional) — Either a repository url or repo_id. Example:
    • "https://huggingface.co/philschmid/playground-tests"
    • "philschmid/playground-tests"
  • repo_type (str, optional) — To set when cloning a repo from a repo_id. Default is model.
  • token (bool or str, optional) — A valid authentication token (see https://huggingface.co/settings/token). If None or True and machine is logged in (through huggingface-cli login or login()), token will be retrieved from the cache. If False, token is not sent in the request header.
  • git_user (str, optional) — will override the git config user.name for committing and pushing files to the hub.
  • git_email (str, optional) — will override the git config user.email for committing and pushing files to the hub.
  • revision (str, optional) — Revision to checkout after initializing the repository. If the revision doesn’t exist, a branch will be created with that revision name from the default branch’s current HEAD.
  • skip_lfs_files (bool, optional, defaults to False) — whether to skip git-LFS files or not.
  • client (HfApi, optional) — Instance of HfApi to use when calling the HF Hub API. A new instance will be created if this is left to None.

Raises

EnvironmentError

  • EnvironmentError — If the remote repository set in clone_from does not exist.

Instantiate a local clone of a git repo.

If clone_from is set, the repo will be cloned from an existing remote repository. If the remote repo does not exist, an EnvironmentError exception will be raised. Please create the remote repo first using create_repo().

Repository uses the local git credentials by default. If explicitly set, the token or the git_user/git_email pair will be used instead.
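The two accepted forms of clone_from (a full URL or a repo_id) can be illustrated with a small sketch. The helper name resolve_clone_url, the endpoint constant and the per-type URL prefixes are assumptions made for illustration; this is not the library's implementation.

```python
from typing import Optional

HUB_ENDPOINT = "https://huggingface.co"  # assumed default endpoint

# Assumed URL prefixes per repo type; model repos have no prefix.
_PREFIXES = {None: "", "model": "", "dataset": "datasets/", "space": "spaces/"}


def resolve_clone_url(clone_from: str, repo_type: Optional[str] = None) -> str:
    """Hypothetical helper: turn a repo_id or a URL into a clone URL."""
    if clone_from.startswith(("http://", "https://")):
        return clone_from  # already a full URL, use it as-is
    if repo_type not in _PREFIXES:
        raise ValueError(f"Unknown repo_type: {repo_type!r}")
    return f"{HUB_ENDPOINT}/{_PREFIXES[repo_type]}{clone_from}"
```

With this sketch, both example values listed for clone_from resolve to the same URL.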

current_branch

( ) → str

Returns

str

Current checked out branch.

Returns the current checked out branch.

add_tag

( tag_name: str message: Optional = None remote: Optional = None )

Parameters

  • tag_name (str) — The name of the tag to be added.
  • message (str, optional) — The message that accompanies the tag. The tag will turn into an annotated tag if a message is passed.
  • remote (str, optional) — The remote on which to add the tag.

Add a tag at the current head and push it.

If remote is None, the tag is only added locally.

If no message is provided, the tag will be lightweight. If a message is provided, the tag will be annotated.
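The difference between lightweight and annotated tags can be sketched with the plain git commands involved. The helper tag_commands is hypothetical and only builds argument lists; how the library actually invokes git may differ.

```python
from typing import List, Optional


def tag_commands(
    tag_name: str, message: Optional[str] = None, remote: Optional[str] = None
) -> List[List[str]]:
    """Hypothetical helper: list the git commands for adding a tag."""
    if message is None:
        commands = [["git", "tag", tag_name]]  # lightweight tag
    else:
        commands = [["git", "tag", "-a", tag_name, "-m", message]]  # annotated tag
    if remote is not None:
        commands.append(["git", "push", remote, tag_name])  # publish the tag
    return commands
```

Without a remote, only the local tagging command is produced, which matches the local-only behavior described above.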

auto_track_binary_files

( pattern: str = '.' ) → List[str]

Parameters

  • pattern (str, optional, defaults to ".") — The pattern with which to track files that are binary.

Returns

List[str]

List of filenames that are now tracked due to being binary files

Automatically track binary files with git-lfs.
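As a rough illustration of what "binary" can mean here, the sketch below uses a common heuristic: a file is treated as binary if a NUL byte appears in its first few kilobytes. The function name looks_binary and the sample size are assumptions; the library's own detection may work differently.

```python
def looks_binary(path: str, sample_size: int = 8192) -> bool:
    """Heuristic check: a NUL byte in the first bytes suggests binary content."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return b"\x00" in sample
```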

auto_track_large_files

( pattern: str = '.' ) → List[str]

Parameters

  • pattern (str, optional, defaults to ".") — The pattern with which to track files that are above 10MB.

Returns

List[str]

List of filenames that are now tracked due to their size.

Automatically track large files (files larger than 10MB) with git-lfs.
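The size rule can be sketched with the standard library alone. The function find_large_files and the exact byte threshold (10 * 1024 * 1024) are assumptions for illustration; the method itself also registers the matching files with git-lfs, which this sketch does not do.

```python
import os
from typing import List

LARGE_FILE_THRESHOLD = 10 * 1024 * 1024  # assumed reading of "10MB"


def find_large_files(folder: str, threshold: int = LARGE_FILE_THRESHOLD) -> List[str]:
    """Return paths (relative to folder) of files larger than threshold."""
    large = []
    for root, dirs, files in os.walk(folder):
        dirs[:] = [d for d in dirs if d != ".git"]  # do not descend into git internals
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > threshold:
                large.append(os.path.relpath(path, folder))
    return sorted(large)
```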

check_git_versions

( )

Raises

EnvironmentError

Checks that git and git-lfs can be run.

clone_from

( repo_url: str token: Union = None )

Parameters

  • repo_url (str) — The URL from which to clone the repository
  • token (Union[str, bool], optional) — Whether to use the authentication token. It can be:
    • a string which is the token itself
    • False, which would not use the authentication token
    • True, which would fetch the authentication token from the local folder and use it (you should be logged in for this to work).
    • None, which would retrieve the value of self.huggingface_token.

Clone from a remote. If the folder already exists, will try to clone the repository within it.

If this folder is a git repository with linked history, will try to update the repository.

Raises the following errors:

  • ValueError if an organization token (starts with “api_org”) is passed. You must use your own personal access token (see https://hf.co/settings/tokens).

  • EnvironmentError if you are trying to clone the repository in a non-empty folder, or if the git operations raise errors.
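The organization-token rule can be expressed as a short check. The helper check_token is hypothetical and only mirrors the rule stated above; it is not the library's own validation code.

```python
def check_token(token: str) -> str:
    """Hypothetical helper: reject organization tokens, return personal ones unchanged."""
    if token.startswith("api_org"):
        raise ValueError(
            "You must use your own personal access token "
            "(see https://hf.co/settings/tokens)."
        )
    return token
```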

commit

( commit_message: str branch: Optional = None track_large_files: bool = True blocking: bool = True auto_lfs_prune: bool = False )

Parameters

  • commit_message (str) — Message to use for the commit.
  • branch (str, optional) — The branch on which the commit will appear. This branch will be checked-out before any operation.
  • track_large_files (bool, optional, defaults to True) — Whether to automatically track large files or not. Will do so by default.
  • blocking (bool, optional, defaults to True) — Whether the function should return only when the git push has finished.
  • auto_lfs_prune (bool, optional, defaults to False) — Whether to automatically prune files once they have been pushed to the remote.

Context manager utility to handle committing to a repository. This automatically tracks large files (>10MB) with git-lfs. Set the track_large_files argument to False if you wish to disable that behavior.

Examples:

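>>> import json
>>> from huggingface_hub import Repository

>>> # Write a JSON file inside the cloned repo; it is committed when the block exits.
>>> with Repository(
...     "text-files",
...     clone_from="<user>/text-files",
...     token=True,
... ).commit("My first file :)"):
...     with open("file.txt", "w+") as f:
...         f.write(json.dumps({"hey": 8}))

>>> import torch

>>> # Save model weights inside the cloned repo; they are committed when the block exits.
>>> model = torch.nn.Transformer()
>>> with Repository(
...     "torch-model",
...     clone_from="<user>/torch-model",
...     token=True,
... ).commit("My cool model :)"):
...     torch.save(model.state_dict(), "model.pt")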
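The examples write to relative paths such as "file.txt", which suggests that the body of the with block runs inside the local repository folder. The toy context manager below sketches that pattern under this assumption. The name commit_like and the finalize callback are hypothetical, and unlike a real implementation this sketch skips finalization when the body raises.

```python
import contextlib
import os
from typing import Callable, Iterator


@contextlib.contextmanager
def commit_like(local_dir: str, finalize: Callable[[str], None]) -> Iterator[None]:
    """Toy stand-in for a commit context manager: run the body in local_dir, then finalize."""
    previous_dir = os.getcwd()
    os.chdir(local_dir)  # relative paths in the body now resolve inside the repo
    try:
        yield
        finalize(local_dir)  # a real implementation would add, commit and push here
    finally:
        os.chdir(previous_dir)  # always restore the caller's working directory
```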

delete_tag

( tag_name: str remote: Optional = None ) → bool

Parameters

  • tag_name (str) — The tag name to delete.
  • remote (str, optional) — The remote on which to delete the tag.

Returns

bool

True if deleted, False if the tag didn’t exist. If remote is not passed, the tag is only deleted locally.

Delete a tag, both local and remote, if it exists.

git_add

( pattern: str = '.' auto_lfs_track: bool = False )

Parameters

  • pattern (str, optional, defaults to ".") — The pattern with which to add files to staging.
  • auto_lfs_track (bool, optional, defaults to False) — Whether to automatically track large and binary files with git-lfs. Any file over 10MB in size, or in binary format, will be automatically tracked.

git add

Setting the auto_lfs_track parameter to True will automatically track files that are larger than 10MB with git-lfs.

git_checkout

( revision: str create_branch_ok: bool = False )

Parameters

  • revision (str) — The revision to checkout.
  • create_branch_ok (bool, optional, defaults to False) — Whether to allow creating a branch named after the given revision, starting from the currently checked-out reference, if that revision does not already exist.

git checkout a given revision

Specifying create_branch_ok to True will create the branch to the given revision if that revision doesn’t exist.

git_commit

( commit_message: str = 'commit files to HF hub' )

Parameters

  • commit_message (str, optional, defaults to “commit files to HF hub”) — The message attributed to the commit.

git commit

git_config_username_and_email

( git_user: Optional = None git_email: Optional = None )

Parameters

  • git_user (str, optional) — The username to register through git.
  • git_email (str, optional) — The email to register through git.

Sets git username and email (only in the current repo).

git_credential_helper_store

( )

Sets the git credential helper to store

git_head_commit_url

( ) → str

Returns

str

The URL to the current checked-out commit.

Get URL to last commit on HEAD. We assume it’s been pushed, and the url scheme is the same one as for GitHub or HuggingFace.
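The URL scheme mentioned above (remote URL followed by /commit/ and the commit SHA) can be sketched as follows. The helper head_commit_url is hypothetical, and stripping a trailing ".git" is an extra assumption added here, not something the description states.

```python
def head_commit_url(remote_url: str, sha: str) -> str:
    """Hypothetical helper: build a GitHub/Hugging Face style commit URL."""
    base = remote_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]  # assumption: drop a trailing .git suffix
    return f"{base}/commit/{sha}"
```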

git_head_hash

( ) → str

Returns

str

The current checked out commit SHA.

Get commit sha on top of HEAD.

git_pull

( rebase: bool = False lfs: bool = False )

Parameters

  • rebase (bool, optional, defaults to False) — Whether to rebase the current branch on top of the upstream branch after fetching.
  • lfs (bool, optional, defaults to False) — Whether to fetch the LFS files too. This option only changes the behavior when a repository was cloned without fetching the LFS files; calling repo.git_pull(lfs=True) will then fetch the LFS file from the remote repository.

git pull

git_push

( upstream: Optional = None blocking: bool = True auto_lfs_prune: bool = False )

Parameters

  • upstream (str, optional) — Upstream to which this should push. If not specified, will push to the lastly defined upstream or to the default one (origin main).
  • blocking (bool, optional, defaults to True) — Whether the function should return only when the push has finished. Setting this to False will return a CommandInProgress object which has an is_done property. This property will be set to True when the push is finished.
  • auto_lfs_prune (bool, optional, defaults to False) — Whether to automatically prune files once they have been pushed to the remote.

git push

If used with blocking=True (the default), returns the URL to the commit on the remote repo. If used with blocking=False, returns a tuple containing the URL to the commit and the command object to follow for information about the process.

git_remote_url

( ) → str

Returns

str

The URL of the origin remote.

Get URL to origin remote.

is_repo_clean

( ) → bool

Returns

bool

True if the git status is clean, False otherwise.

Return whether the git status is clean.

lfs_enable_largefiles

( )

HF-specific. This enables upload support of files >5GB.

lfs_prune

( recent = False )

Parameters

  • recent (bool, optional, defaults to False) — Whether to prune files even if they were referenced by recent commits. See the following link for more information.

git lfs prune

lfs_track

( patterns: Union filename: bool = False )

Parameters

  • patterns (Union[str, List[str]]) — The pattern, or list of patterns, to track with git-lfs.
  • filename (bool, optional, defaults to False) — Whether to use the patterns as literal filenames.

Tell git-lfs to track files according to a pattern.

Setting the filename argument to True will treat the arguments as literal filenames, not as patterns. Any special glob characters in the filename will be escaped when writing to the .gitattributes file.
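The effect of the filename argument can be sketched through the git-lfs command line, where the --filename flag of git lfs track requests literal-filename handling. The helper lfs_track_commands is hypothetical and only builds argument lists; it does not run git.

```python
from typing import List, Union


def lfs_track_commands(
    patterns: Union[str, List[str]], filename: bool = False
) -> List[List[str]]:
    """Hypothetical helper: one `git lfs track` command per pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]  # accept a single pattern as well as a list
    flag = ["--filename"] if filename else []
    return [["git", "lfs", "track", *flag, pattern] for pattern in patterns]
```

A name such as "weights[1].bin" contains glob characters, so it would need filename=True to be tracked as a literal file rather than as a pattern.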

lfs_untrack

( patterns: Union )

Parameters

  • patterns (Union[str, List[str]]) — The pattern, or list of patterns, to untrack with git-lfs.

Tell git-lfs to untrack those files.

list_deleted_files

( ) → List[str]

Returns

List[str]

A list of files that have been deleted in the working directory or index.

Returns a list of the files that are deleted in the working directory or index.

push_to_hub

( commit_message: str = 'commit files to HF hub' blocking: bool = True clean_ok: bool = True auto_lfs_prune: bool = False )

Parameters

  • commit_message (str) — Message to use for the commit.
  • blocking (bool, optional, defaults to True) — Whether the function should return only when the git push has finished.
  • clean_ok (bool, optional, defaults to True) — If True, this function will return None if the repo is untouched. Default behavior is to fail because the git command fails.
  • auto_lfs_prune (bool, optional, defaults to False) — Whether to automatically prune files once they have been pushed to the remote.

Helper to add, commit, and push files to remote repository on the HuggingFace Hub. Will automatically track large files (>10MB).
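The add, commit, push sequence can be sketched using only the methods documented on this page. The function push_to_hub_like is hypothetical; the order of calls and the handling of clean_ok are assumptions based on the parameter descriptions, and repo is expected to be a Repository-like object.

```python
def push_to_hub_like(
    repo,
    commit_message: str = "commit files to HF hub",
    blocking: bool = True,
    clean_ok: bool = True,
):
    """Sketch of an add/commit/push helper built from the documented methods."""
    if clean_ok and repo.is_repo_clean():
        return None  # nothing to commit, so skip the git commands entirely
    repo.git_add(auto_lfs_track=True)  # stage everything, tracking large files
    repo.git_commit(commit_message)
    return repo.git_push(blocking=blocking)
```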

tag_exists

( tag_name: str remote: Optional = None ) → bool

Parameters

  • tag_name (str) — The name of the tag to check.
  • remote (str, optional) — Whether to check if the tag exists on a remote. This parameter should be the identifier of the remote.

Returns

bool

Whether the tag exists.

Check if a tag exists or not.

wait_for_commands

( )

Blocking method: blocks all subsequent execution until all commands have been processed.

Helper methods

huggingface_hub.repository.is_git_repo

( folder: Union ) → bool

Parameters

  • folder (str) — The folder in which to run the command.

Returns

bool

True if the folder is part of a git repository, False otherwise.

Check if the folder is the root or part of a git repository

huggingface_hub.repository.is_local_clone

( folder: Union remote_url: str ) → bool

Parameters

  • folder (str or Path) — The folder in which to run the command.
  • remote_url (str) — The url of a git repository.

Returns

bool

True if the repository is a local clone of the remote repository specified, False otherwise.

Check if the folder is a local clone of the remote_url

huggingface_hub.repository.is_tracked_with_lfs

( filename: Union ) → bool

Parameters

  • filename (str or Path) — The filename to check.

Returns

bool

True if the file passed is tracked with git-lfs, False otherwise.

Check if the file passed is tracked with git-lfs.

huggingface_hub.repository.is_git_ignored

( filename: Union ) → bool

Parameters

  • filename (str or Path) — The filename to check.

Returns

bool

True if the file passed is ignored by git, False otherwise.

Check if file is git-ignored. Supports nested .gitignore files.

huggingface_hub.repository.files_to_be_staged

( pattern: str = '.' folder: Union = None ) → List[str]

Parameters

  • pattern (str or Path) — The pattern of filenames to check. Put . to get all files.
  • folder (str or Path) — The folder in which to run the command.

Returns

List[str]

List of files that are to be staged.

Returns a list of filenames that are to be staged.

huggingface_hub.repository.is_tracked_upstream

( folder: Union ) → bool

Parameters

  • folder (str or Path) — The folder in which to run the command.

Returns

bool

True if the current checked-out branch is tracked upstream, False otherwise.

Check if the current checked-out branch is tracked upstream.

huggingface_hub.repository.commits_to_push

( folder: Union upstream: Optional = None ) → int

Parameters

  • folder (str or Path) — The folder in which to run the command.
  • upstream (str, optional) — The name of the upstream repository with which the comparison should be made.

Returns

int

Number of commits that would be pushed upstream were a git push to proceed.

Check the number of commits that would be pushed upstream.

Following asynchronous commands

The Repository utility offers several methods which can be launched asynchronously:

  • git_push
  • git_pull
  • push_to_hub
  • The commit context manager

See below for utilities to manage such asynchronous methods.

class huggingface_hub.Repository

commands_failed

( )

Returns the asynchronous commands that failed.

commands_in_progress

( )

Returns the asynchronous commands that are currently in progress.

wait_for_commands

( )

Blocking method: blocks all subsequent execution until all commands have been processed.
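A non-blocking push followed by a final wait might look like the sketch below. It assumes, following the description of the blocking parameter of git_push, that git_push(blocking=False) returns the commit URL together with a command object exposing is_done. The function push_in_background and its do_other_work callback are hypothetical, and repo is expected to be a Repository-like object.

```python
from typing import Callable


def push_in_background(repo, do_other_work: Callable[[], None]):
    """Sketch: start a push, keep working, then wait for all pending commands."""
    url, command = repo.git_push(blocking=False)  # returns immediately
    do_other_work()  # anything that does not depend on the push having finished
    repo.wait_for_commands()  # block until every asynchronous command is processed
    return url, command.is_done
```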

class huggingface_hub.repository.CommandInProgress

( title: str is_done_method: Callable status_method: Callable process: Popen post_method: Optional = None )

Utility to follow commands launched asynchronously.
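The idea of following a command launched asynchronously can be sketched with subprocess.Popen from the standard library. The class BackgroundCommand below is a toy stand-in with hypothetical names; it is not the library's CommandInProgress and does not share its constructor.

```python
import subprocess
from typing import List


class BackgroundCommand:
    """Toy stand-in for a command launched asynchronously."""

    def __init__(self, title: str, args: List[str]):
        self.title = title
        # Output is discarded so the child can never block on a full pipe.
        self._process = subprocess.Popen(
            args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )

    @property
    def is_done(self) -> bool:
        """True once the process has exited; poll() does not block."""
        return self._process.poll() is not None

    @property
    def failed(self) -> bool:
        """True if the process has exited with a non-zero return code."""
        return self.is_done and self._process.returncode != 0

    def wait(self) -> int:
        """Block until the process exits and return its exit code."""
        return self._process.wait()
```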
