Model Repos docs
Model cards are markdown files that accompany the models and provide useful information about them. They are extremely important for discoverability, reproducibility, and sharing! A model card is the
README.md file in any repo.
The model card should describe:
- the model
- its intended uses & potential limitations, including biases and ethical considerations as detailed in Mitchell, 2018
- the training params and experimental info (you can embed or link to an experiment tracking platform for reference)
- the datasets you trained on and your evaluation results
Model cards have a YAML section that specifies metadata. These are the fields:
```yaml
---
language:
- "List of ISO 639-1 code for your language"
- lang1
- lang2
thumbnail: "url to a thumbnail used in social sharing"
tags:
- tag1
- tag2
license: "any valid license identifier"
datasets:
- dataset1
- dataset2
metrics:
- metric1
- metric2
---
```
You can find the detailed specification here.
Some useful information on them:
- All the tags can be used to filter the list of models on https://huggingface.co/models.
- License identifiers are the keywords listed in the right column of this table.
- Dataset, metric, and language identifiers are those listed on the Datasets, Metrics and Languages pages.
Here is an example:
```yaml
---
language:
- ru
- en
tags:
- translation
license: apache-2.0
datasets:
- wmt19
metrics:
- bleu
- sacrebleu
---
```
Each model page lists all the model's tags in the page header, below the model name.
These are primarily computed from the model card metadata, though some are also added automatically, as described in "How is a model's type of inference API and widget determined?".
You can specify the widget input in the model card metadata section:
```yaml
widget:
- text: "Jens Peter Hansen kommer fra Danmark"
```
It is also possible to specify non-text example inputs in the model card metadata. For example, you can let users choose from two sample audio files for automatic speech recognition tasks:
```yaml
widget:
- label: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- label: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
```
We provide example inputs for some languages and most widget types in the DefaultWidget.ts file. If some examples are missing, we welcome PRs from the community to add them!
Generally, the Inference API for a model uses the default pipeline settings associated with each task. But if you'd like to change the pipeline's default settings and specify additional inference parameters, you can configure the parameters directly through the model card metadata. Refer here for some of the most commonly used parameters associated with each task.
For example, if you want to specify an aggregation strategy for a NER task in the widget:
```yaml
inference:
  parameters:
    aggregation_strategy: "none"
```
Or if you'd like to change the temperature for a summarization task in the widget:
```yaml
inference:
  parameters:
    temperature: 0.7
```
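These fields can live side by side in a single metadata section. As a sketch combining the fields shown above (the sample text is arbitrary, chosen only for illustration), a card could define both a widget example and inference parameters:

```yaml
widget:
- text: "Hugging Face is a technology company based in New York and Paris."
inference:
  parameters:
    temperature: 0.7
```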
Yes! 🔥 You can specify the framework in the model card metadata section:
```yaml
tags:
- flair
```
Find out more about our supported libraries here!
You can specify the dataset in the metadata:
```yaml
datasets:
- wmt19
```
You can use the huggingface_hub library to create, delete, update and retrieve information from repos. You can also use it to download files from repos and integrate it into your own library! For example, you can easily load a Scikit-learn model with a few lines of code.
```python
from huggingface_hub import hf_hub_url, cached_download
import joblib

REPO_ID = "YOUR_REPO_ID"
FILENAME = "sklearn_model.joblib"

model = joblib.load(cached_download(
    hf_hub_url(REPO_ID, FILENAME)
))
```
Yes, we use the KaTeX math typesetting library to render math formulas server-side, before parsing the markdown.
You have to use the following delimiters:
- `$$ ... $$` for display mode
- `\\( ... \\)` for inline mode (no space between the slashes and the parenthesis).
Then you'll be able to write:
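For instance, a display-mode formula in a model card could look like this (an arbitrary formula, purely for illustration):

```latex
$$
\operatorname{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
$$
```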
When you want to fork or rebase a repository with LFS files (all files over 20MB are stored as such), you cannot use the usual Git approach, since you need to be careful not to break the LFS pointers. Forking can take time depending on your bandwidth, because you will have to fetch and re-upload all the LFS files in your fork.
For example, say you have an upstream repository, upstream, and you just created your own repository on the Hub, which is myfork in this example.
1. Create a destination repository (e.g. myfork) on https://huggingface.co
2. Clone your fork repository:
```bash
git lfs clone https://huggingface.co/me/myfork.git
```
3. Fetch non-LFS files:
```bash
cd myfork
git lfs install --skip-smudge --local # affects only this clone
git remote add upstream https://huggingface.co/friend/upstream.git
git fetch upstream
```
4. Fetch large files. This can take some time depending on your download bandwidth:
```bash
git lfs fetch --all upstream
```
5.a. If you want to completely override the fork history (which should only have an initial commit), run:
```bash
git reset --hard upstream/main
```
5.b. If you want to rebase instead of overriding, run the following command and resolve any conflicts:
```bash
git rebase upstream/main
```
6. Prepare your LFS files to push:
```bash
git lfs install --force --local # this reinstalls the LFS hooks
huggingface-cli lfs-enable-largefiles . # needed if some files are bigger than 5GB
```
7. And finally push:
```bash
git push --force origin main # this can take time depending on your upload bandwidth
```
Now you have your own fork or rebased repo in the Hub!
| Fullname | License identifier (to use in model card) |
| --- | --- |
| Academic Free License v3.0 | afl-3.0 |
| Apache license 2.0 | apache-2.0 |
| Artistic license 2.0 | artistic-2.0 |
| Boost Software License 1.0 | bsl-1.0 |
| BSD 2-clause "Simplified" license | bsd-2-clause |
| BSD 3-clause "New" or "Revised" license | bsd-3-clause |
| BSD 3-clause Clear license | bsd-3-clause-clear |
| Creative Commons license family | cc |
| Creative Commons Zero v1.0 Universal | cc0-1.0 |
| Creative Commons Attribution 4.0 | cc-by-4.0 |
| Creative Commons Attribution Share Alike 4.0 | cc-by-sa-4.0 |
| Creative Commons Attribution Non Commercial 4.0 | cc-by-nc-4.0 |
| Creative Commons Attribution Non Commercial Share Alike 4.0 | cc-by-nc-sa-4.0 |
| Do What The F*ck You Want To Public License | wtfpl |
| Educational Community License v2.0 | ecl-2.0 |
| Eclipse Public License 1.0 | epl-1.0 |
| Eclipse Public License 2.0 | epl-2.0 |
| European Union Public License 1.1 | eupl-1.1 |
| GNU Affero General Public License v3.0 | agpl-3.0 |
| GNU General Public License family | gpl |
| GNU General Public License v2.0 | gpl-2.0 |
| GNU General Public License v3.0 | gpl-3.0 |
| GNU Lesser General Public License family | lgpl |
| GNU Lesser General Public License v2.1 | lgpl-2.1 |
| GNU Lesser General Public License v3.0 | lgpl-3.0 |
| LaTeX Project Public License v1.3c | lppl-1.3c |
| Microsoft Public License | ms-pl |
| Mozilla Public License 2.0 | mpl-2.0 |
| Open Software License 3.0 | osl-3.0 |
| SIL Open Font License 1.1 | ofl-1.1 |
| University of Illinois/NCSA Open Source License | ncsa |
| Open Data Commons Public Domain Dedication and License | pddl |
| Lesser General Public License For Linguistic Resources | lgpl-lr |