These next sections highlight features and additional information that you may find useful to make the most out of the Git repositories on the Hugging Face Hub.
Hugging Face supports accessing repos with Python via the huggingface_hub library. The operations that we’ve explored, such as downloading repositories and uploading files, are available through the library, as well as other useful functions!
If you prefer to use git directly, please read the sections below.
To collaborate effectively on Git repos and to work on features without releasing premature code, you can use branches. Branches allow you to separate your “work in progress” code from your “production-ready” code, with the additional benefit of letting multiple people work on a project without frequently conflicting with each other’s contributions. You can isolate each experiment in its own branch, and even adopt team-wide practices for managing branches.
To learn about Git branching, you can try out the Learn Git Branching interactive tutorial.
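As a minimal sketch of the branch workflow described above, the following commands build a throwaway local repository and start an experiment on its own branch. The repository name `demo-repo`, the branch name `experiment`, and the demo identity are hypothetical; nothing here contacts the Hub.

```shell
set -eu

# Throwaway local repository standing in for a clone of a Hub repo (hypothetical).
workdir="$(mktemp -d)"
cd "$workdir"
git init -q demo-repo
cd demo-repo
# Name the initial branch "main" regardless of the local Git defaults.
git symbolic-ref HEAD refs/heads/main

# Identity used only for these demo commits.
git config user.name "Demo User"
git config user.email "demo@example.com"

# "Production-ready" state on main.
echo "stable" > README.md
git add README.md
git commit -q -m "Initial commit"

# Start an experiment on its own branch; main stays untouched.
git checkout -q -b experiment
echo "work in progress" >> README.md
git commit -q -am "Try an experimental change"

current_branch="$(git rev-parse --abbrev-ref HEAD)"
main_commits="$(git rev-list --count main)"
experiment_commits="$(git rev-list --count experiment)"
echo "on $current_branch: main has $main_commits commit(s), experiment has $experiment_commits"
```

In a real clone of a Hub repo, the branch would then typically be published with `git push -u origin experiment`, assuming `origin` points at that repo.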
Git allows you to tag commits so that you can easily note milestones in your project. As such, you can use tags to mark commits in your Hub repos! To learn about using tags, you can visit this DevConnected post.
Beyond making it easy to identify important commits in your repo’s history, using Git tags also allows you to do A/B testing, clone a repository at a specific tag, and more! The huggingface_hub library also supports working with tags, such as downloading files from a specific tagged commit.
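To make tagging and cloning at a tag concrete, here is a small sketch that uses only local repositories. The names `source-repo`, `at-v1`, and the tag `v1.0` are hypothetical, and the local path passed to `git clone` stands in for a Hub repository URL.

```shell
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

# Local repository standing in for a Hub repo (hypothetical).
git init -q source-repo
cd source-repo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1 weights placeholder" > model.txt
git add model.txt
git commit -q -m "First release"

# Mark this commit as a milestone with an annotated tag.
git tag -a v1.0 -m "First milestone"

# Development continues after the tag.
echo "v2 weights placeholder" > model.txt
git commit -q -am "Later work"

# Clone the repository at the tagged commit rather than at the tip of main.
cd "$workdir"
git -c advice.detachedHead=false clone -q --branch v1.0 "$workdir/source-repo" at-v1

tagged_content="$(cat at-v1/model.txt)"
tagged_name="$(git -C at-v1 describe --tags)"
echo "checked out $tagged_name with content: $tagged_content"
```

The clone ends up in a detached HEAD state at the tag, so the file holds the content from the tagged commit, not the later one.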
If you’d like to copy a repository, there are two options, depending on whether you want to preserve the Git history.
In many scenarios, if you want your own copy of a particular codebase you might not be concerned about the previous Git history. In this case, you can quickly duplicate a repo with the handy Repo Duplicator! You’ll have to create a User Access Token, which you can read more about in the security documentation.
A duplicate of a repository with the commit history preserved is called a fork. You may choose to fork one of your own repos, but it is also common to fork other people’s projects if you would like to tinker with them.
Note that you will need to install Git LFS and the huggingface_hub CLI to follow this process. When you want to fork or rebase a repository with LFS files, you cannot use the usual Git approach that you might be familiar with, because you need to be careful not to break the LFS pointers. Forking can take time depending on your bandwidth, because you will have to fetch and re-upload all the LFS files in your fork.
For example, say you have an upstream repository, upstream, and you have just created your own repository on the Hub, called myfork in this example.
- Create a destination repository (e.g. myfork) on https://huggingface.co
- Clone your fork repository:
git lfs clone https://huggingface.co/me/myfork.git
- Fetch non-LFS files:
cd myfork
git lfs install --skip-smudge --local # affects only this clone
git remote add upstream https://huggingface.co/friend/upstream.git
git fetch upstream
- Fetch large files. This can take some time depending on your download bandwidth:
git lfs fetch --all upstream # this can take time depending on your download bandwidth
4.a. If you want to completely override the fork history (which should only have an initial commit), run:
git reset --hard upstream/main
4.b. If you want to rebase instead of overriding, run the following command and resolve any conflicts:
git rebase upstream/main
- Prepare your LFS files to push:
git lfs install --force --local # this reinstalls the LFS hooks
huggingface-cli lfs-enable-largefiles . # needed if some files are bigger than 5GB
- And finally push:
git push --force origin main # this can take time depending on your upload bandwidth
Now you have your own fork or rebased repo on the Hub!
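The difference between steps 4.a and 4.b can be seen in a small local sketch. It uses plain Git with no LFS files and no Hub access, so it illustrates only the history handling, not the LFS steps. The helper `make_repo` and the repository names `fork-reset` and `fork-rebase` are hypothetical and exist only for this demo.

```shell
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical helper for this demo: create a repo with a single commit on main.
# Arguments: directory, commit message (also used as file content), file name.
make_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  git -C "$1" config user.name "Demo User"
  git -C "$1" config user.email "demo@example.com"
  echo "$2" > "$1/$3"
  git -C "$1" add "$3"
  git -C "$1" commit -q -m "$2"
}

make_repo upstream "upstream content" model.txt
make_repo fork-reset "fork initial commit" README.md
make_repo fork-rebase "fork initial commit" README.md

# Both forks add the upstream repo as a remote and fetch its history.
for fork in fork-reset fork-rebase; do
  git -C "$fork" remote add upstream "$workdir/upstream"
  git -C "$fork" fetch -q upstream
done

# 4.a: discard the fork's own history and adopt upstream's.
git -C fork-reset reset -q --hard upstream/main

# 4.b: replay the fork's own commit on top of upstream's history.
git -C fork-rebase rebase -q upstream/main

reset_commits="$(git -C fork-reset rev-list --count HEAD)"
rebase_commits="$(git -C fork-rebase rev-list --count HEAD)"
echo "after reset: $reset_commits commit(s); after rebase: $rebase_commits commit(s)"
```

After the reset, the fork contains only the upstream commit and its initial README.md is gone. After the rebase, the fork keeps its own commit on top of the upstream one, so both files are present.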