Hub Python Library documentation

Upload files to the Hub

You are viewing v0.14.1 version. A newer version v0.26.2 is available.
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Upload files to the Hub

Sharing your files and work is an important aspect of the Hub. The huggingface_hub offers several options for uploading your files to the Hub. You can use these functions independently or integrate them into your library, making it more convenient for your users to interact with the Hub. This guide will show you how to push files:

  • without using Git.
  • that are very large with Git LFS.
  • with the commit context manager.
  • with the push_to_hub() function.

Whenever you want to upload files to the Hub, you need to log in to your Hugging Face account:

  • Log in to your Hugging Face account with the following command:

    huggingface-cli login
    # or using an environment variable
    huggingface-cli login --token $HUGGINGFACE_TOKEN
  • Alternatively, you can programmatically login using login() in a notebook or a script:

    >>> from huggingface_hub import login
    >>> login()

    If ran in a Jupyter or Colaboratory notebook, login() will launch a widget from which you can enter your Hugging Face access token. Otherwise, a message will be prompted in the terminal.

    It is also possible to login programmatically without the widget by directly passing the token to login(). If you do so, be careful when sharing your notebook. It is best practice to load the token from a secure vault instead of saving it in plain in your Colaboratory notebook.

Push files without Git

If you don’t have Git installed on your system, use create_commit() to push your files to the Hub. create_commit() uses the HTTP protocol to upload files to the Hub.

However, create_commit() is a low-level API for working at a commit level. The upload_file() and upload_folder() functions are higher-level APIs that use create_commit() under the hood and are generally more convenient. We recommend trying these functions first if you don’t need to work at a lower level.

Upload a file

Once you’ve created a repository with create_repo(), you can upload a file to your repository using upload_file().

Specify the path of the file to upload, where you want to upload the file to in the repository, and the name of the repository you want to add the file to. Depending on your repository type, you can optionally set the repository type as a dataset, model, or space.

>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> api.upload_file(
...     path_or_fileobj="/path/to/local/folder/README.md",
...     path_in_repo="README.md",
...     repo_id="username/test-dataset",
...     repo_type="dataset",
... )

Upload a folder

Use the upload_folder() function to upload a local folder to an existing repository. Specify the path of the local folder to upload, where you want to upload the folder to in the repository, and the name of the repository you want to add the folder to. Depending on your repository type, you can optionally set the repository type as a dataset, model, or space.

>>> from huggingface_hub import HfApi
>>> api = HfApi()

# Upload all the content from the local folder to your remote Space.
# By default, files are uploaded at the root of the repo
>>> api.upload_folder(
...     folder_path="/path/to/local/space",
...     repo_id="username/my-cool-space",
...     repo_type="space",
... )

Use the allow_patterns and ignore_patterns arguments to specify which files to upload. These parameters accept either a single pattern or a list of patterns. Patterns are Standard Wildcards (globbing patterns) as documented here. If both allow_patterns and ignore_patterns are provided, both constraints apply. By default, all files from the folder are uploaded.

Any .git/ folder present in any subdirectory will be ignored. However, please be aware that the .gitignore file is not taken into account. This means you must use allow_patterns and ignore_patterns to specify which files to upload instead.

>>> api.upload_folder(
...     folder_path="/path/to/local/folder",
...     path_in_repo="my-dataset/train", # Upload to a specific folder
...     repo_id="username/test-dataset",
...     repo_type="dataset",
...     ignore_patterns="**/logs/*.txt", # Ignore all text logs
... )

You can also use the delete_patterns argument to specify files you want to delete from the repo in the same commit. This can prove useful if you want to clean a remote folder before pushing files in it and you don’t know which files already exists.

The example below uploads the local ./logs folder to the remote /experiment/logs/ folder. Only txt files are uploaded but before that, all previous logs on the repo on deleted. All of this in a single commit.

>>> api.upload_folder(
...     folder_path="/path/to/local/folder/logs",
...     repo_id="username/trained-model",
...     path_in_repo="experiment/logs/",
...     allow_patterns="*.txt", # Upload all local text files
...     delete_patterns="*.txt", # Delete all remote text files before
... )

Upload a folder by chunks

upload_folder() makes it easy to upload an entire folder to the Hub. However, for large folders (thousands of files or hundreds of GB), it can still be challenging. If you have a folder with a lot of files, you might want to upload it in several commits. If you experience an error or a connection issue during the upload, you would not have to resume the process from the beginning.

To upload a folder in multiple commits, just pass multi_commits=True as argument. Under the hood, huggingface_hub will list the files to upload/delete and split them in several commits. The “strategy” (i.e. how to split the commits) is based on the number and size of the files to upload. A PR is open on the Hub to push all the commits. Once the PR is ready, the commits are squashed into a single commit. If the process is interrupted before completing, you can rerun your script to resume the upload. The created PR will be automatically detected and the upload will resume from where it stopped. It is recommended to pass multi_commits_verbose=True to get a better understanding of the upload and its progress.

The example below will upload the checkpoints folder to a dataset in multiple commits. A PR will be created on the Hub and merged automatically once the upload is complete. If you prefer the PR to stay open and review it manually, you can pass create_pr=True.

>>> upload_folder(
...     folder_path="local/checkpoints",
...     repo_id="username/my-dataset",
...     repo_type="dataset",
...     multi_commits=True,
...     multi_commits_verbose=True,
... )

If you want a better control on the upload strategy (i.e. the commits that are created), you can have a look at the low-level plan_multi_commits() and create_commits_on_pr() methods.

multi_commits is still an experimental feature. Its API and behavior is subject to change in the future without prior notice.

create_commit

If you want to work at a commit-level, use the create_commit() function directly. There are two types of operations supported by create_commit():

  • CommitOperationAdd uploads a file to the Hub. If the file already exists, the file contents are overwritten. This operation accepts two arguments:

    • path_in_repo: the repository path to upload a file to.
    • path_or_fileobj: either a path to a file on your filesystem or a file-like object. This is the content of the file to upload to the Hub.
  • CommitOperationDelete removes a file or a folder from a repository. This operation accepts path_in_repo as an argument.

For example, if you want to upload two files and delete a file in a Hub repository:

  1. Use the appropriate CommitOperation to add or delete a file and to delete a folder:
>>> from huggingface_hub import HfApi, CommitOperationAdd, CommitOperationDelete
>>> api = HfApi()
>>> operations = [
...     CommitOperationAdd(path_in_repo="LICENSE.md", path_or_fileobj="~/repo/LICENSE.md"),
...     CommitOperationAdd(path_in_repo="weights.h5", path_or_fileobj="~/repo/weights-final.h5"),
...     CommitOperationDelete(path_in_repo="old-weights.h5"),
...     CommitOperationDelete(path_in_repo="logs/"),
... ]
  1. Pass your operations to create_commit():
>>> api.create_commit(
...     repo_id="lysandre/test-model",
...     operations=operations,
...     commit_message="Upload my model weights and license",
... )

In addition to upload_file() and upload_folder(), the following functions also use create_commit() under the hood:

For more detailed information, take a look at the HfApi reference.

Push files with Git LFS

Git LFS automatically handles files larger than 10MB. But for very large files (>5GB), you need to install a custom transfer agent for Git LFS:

huggingface-cli lfs-enable-largefiles

You should install this for each repository that has a very large file. Once installed, you’ll be able to push files larger than 5GB.

commit context manager

The commit context manager handles four of the most common Git commands: pull, add, commit, and push. git-lfs automatically tracks any file larger than 10MB. In the following example, the commit context manager:

  1. Pulls from the text-files repository.
  2. Adds a change made to file.txt.
  3. Commits the change.
  4. Pushes the change to the text-files repository.
>>> from huggingface_hub import Repository
>>> with Repository(local_dir="text-files", clone_from="<user>/text-files").commit(commit_message="My first file :)"):
...     with open("file.txt", "w+") as f:
...         f.write(json.dumps({"hey": 8}))

Here is another example of how to use the commit context manager to save and upload a file to a repository:

>>> import torch
>>> model = torch.nn.Transformer()
>>> with Repository("torch-model", clone_from="<user>/torch-model", token=True).commit(commit_message="My cool model :)"):
...     torch.save(model.state_dict(), "model.pt")

Set blocking=False if you would like to push your commits asynchronously. Non-blocking behavior is helpful when you want to continue running your script while your commits are being pushed.

>>> with repo.commit(commit_message="My cool model :)", blocking=False)

You can check the status of your push with the command_queue method:

>>> last_command = repo.command_queue[-1]
>>> last_command.status

Refer to the table below for the possible statuses:

Status Description
-1 The push is ongoing.
0 The push has completed successfully.
Non-zero An error has occurred.

When blocking=False, commands are tracked, and your script will only exit when all pushes are completed, even if other errors occur in your script. Some additional useful commands for checking the status of a push include:

# Inspect an error.
>>> last_command.stderr

# Check whether a push is completed or ongoing.
>>> last_command.is_done

# Check whether a push command has errored.
>>> last_command.failed

push_to_hub

The Repository class has a push_to_hub() function to add files, make a commit, and push them to a repository. Unlike the commit context manager, you’ll need to pull from a repository first before calling push_to_hub().

For example, if you’ve already cloned a repository from the Hub, then you can initialize the repo from the local directory:

>>> from huggingface_hub import Repository
>>> repo = Repository(local_dir="path/to/local/repo")

Update your local clone with git_pull() and then push your file to the Hub:

>>> repo.git_pull()
>>> repo.push_to_hub(commit_message="Commit my-awesome-file to the Hub")

However, if you aren’t ready to push a file yet, you can use git_add() and git_commit() to only add and commit your file:

>>> repo.git_add("path/to/file")
>>> repo.git_commit(commit_message="add my first model config file :)")

When you’re ready, push the file to your repository with git_push():

>>> repo.git_push()