Model Repos docs

What are model cards and why are they useful?

Model cards are Markdown files that accompany the models and provide very useful information. They are extremely important for discoverability, reproducibility, and sharing! They are the README.md file in any model repo.

When sharing a model, what should I add to my model card?

The model card should describe:

  • the model
  • its intended uses & potential limitations, including biases and ethical considerations as detailed in Mitchell et al., 2018
  • the training parameters and experimental info (you can embed or link to an experiment tracking platform for reference)
  • the datasets you trained on and your evaluation results

Model card metadata

Model cards have a YAML section that specifies metadata. These are the fields:

---
language: 
  - "List of ISO 639-1 codes for your language"
  - lang1
  - lang2
thumbnail: "url to a thumbnail used in social sharing"
tags:
- tag1
- tag2
license: "any valid license identifier"
datasets:
- dataset1
- dataset2
metrics:
- metric1
- metric2
---

You can find the detailed specification here.

Here is an example:

---
language:
- ru
- en
tags:
- translation
license: apache-2.0
datasets:
- wmt19
metrics:
- bleu
- sacrebleu
---
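If you need to read this metadata yourself, the block between the `---` delimiters can be extracted with a few lines of standard-library Python. This is a minimal sketch (the `extract_metadata_block` helper is hypothetical, not part of any library), assuming the metadata sits between the first pair of `---` lines as in the example above:

```python
def extract_metadata_block(readme_text: str) -> str:
    """Return the raw YAML front matter of a model card, or "" if absent."""
    lines = readme_text.splitlines()
    # The metadata block must start on the very first line with "---".
    if not lines or lines[0].strip() != "---":
        return ""
    # Find the closing "---" and return everything in between.
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # no closing delimiter found

card = """---
language:
- ru
- en
license: apache-2.0
---
# My model
"""
print(extract_metadata_block(card))
```

To actually interpret the fields, you would feed the returned block to a YAML parser such as PyYAML.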

How are model tags determined?

Each model page lists all the model's tags in the page header, below the model name.

Those are primarily computed from the model card metadata, except that we also add some of them automatically, as described in How is a model's type of inference API and widget determined?.

How can I control my model's widget example input?

You can specify the widget input in the model card metadata section:

widget:
- text: "Jens Peter Hansen kommer fra Danmark"

It is also possible to specify non-text example inputs in the model card metadata. For example, you can let users choose from two sample audio files for an automatic speech recognition task:

widget:
- label: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- label: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac

We provide example inputs for some languages and most widget types in the DefaultWidget.ts file. If some examples are missing, we welcome PRs from the community to add them!

How can I control my model's widget Inference API parameters?

Generally, the Inference API for a model uses the default pipeline settings associated with each task. But if you'd like to change the pipeline's default settings and specify additional inference parameters, you can configure the parameters directly through the model card metadata. Refer here for some of the most commonly used parameters associated with each task.

For example, if you want to specify an aggregation strategy for a NER task in the widget:

inference:
  parameters:
    aggregation_strategy: "none"

Or if you'd like to change the temperature for a summarization task in the widget:

inference:
  parameters:
    temperature: 0.7
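Widget inputs and inference parameters live in the same YAML metadata section, so they can be combined in one block. A hedged sketch (the example sentence is made up; the `temperature` parameter is taken from the example above):

```yaml
---
tags:
- summarization
widget:
- text: "Paste the article you want summarized here."
inference:
  parameters:
    temperature: 0.7
---
```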

Can I specify which framework supports my model?

Yes! 🔥 You can specify the framework in the model card metadata section:

tags:
- flair

Find more about our supported libraries here!

How can I link a model to a dataset?

You can specify the dataset in the metadata:

datasets:
- wmt19

Can I access models programmatically?

You can use the huggingface_hub library to create, delete, update and retrieve information from repos. You can also use it to download files from repos and integrate it into your own library! For example, you can easily load a scikit-learn model in a few lines of code.

from huggingface_hub import hf_hub_url, cached_download
import joblib

REPO_ID = "YOUR_REPO_ID"
FILENAME = "sklearn_model.joblib"

# Build the download URL for the file, fetch it (with local caching),
# then deserialize the scikit-learn model with joblib.
model = joblib.load(cached_download(
    hf_hub_url(REPO_ID, FILENAME)
))
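For reference, hf_hub_url builds a direct download URL using the Hub's resolve endpoint. A minimal standard-library sketch of that URL pattern (illustrative only; prefer the library helper, which also handles URL-encoding and more):

```python
def hub_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct download URL for a file in a Hub repo.

    Mirrors the pattern https://huggingface.co/<repo>/resolve/<revision>/<file>.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

print(hub_file_url("YOUR_REPO_ID", "sklearn_model.joblib"))
```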

Can I write LaTeX in my model card?

Yes, we use the KaTeX math typesetting library to render math formulas server-side, before parsing the markdown.

You have to use the following delimiters:

  • $$ ... $$ for display mode
  • \\( ... \\) for inline mode (no space between the slashes and the parenthesis).

Then you'll be able to write:

$$\LaTeX$$

$$\mathrm{MSE} = \left(\frac{1}{n}\right)\sum_{i=1}^{n}(y_{i} - x_{i})^{2}$$

$$E = mc^2$$

How can I fork or rebase a repository with LFS pointers?

When you want to fork or rebase a repository with LFS files (all files over 20MB are stored as such), you cannot use the usual Git approach, since you need to be careful not to break the LFS pointers. Forking can take time depending on your bandwidth, because you will have to fetch and re-upload all of the LFS files in your fork.

For example, say you have an upstream repository, upstream, and you just created your own repository on the Hub which is myfork in this example.

  1. Create a destination repository (e.g. myfork) on https://huggingface.co

  2. Clone your fork repository:

git lfs clone https://huggingface.co/me/myfork.git

  3. Fetch the non-LFS files:

cd myfork
git lfs install --skip-smudge --local # affects only this clone
git remote add upstream https://huggingface.co/friend/upstream.git
git fetch upstream

  4. Fetch the large files (this can take some time depending on your download bandwidth):

git lfs fetch --all upstream

4.a. If you want to completely override the fork history (which should only have an initial commit), run:

git reset --hard upstream/main

4.b. If you want to rebase instead of overriding, run the following command and resolve any conflicts:

git rebase upstream/main

  5. Prepare your LFS files to push:

git lfs install --force --local # this reinstalls the LFS hooks
huggingface-cli lfs-enable-largefiles . # needed if some files are bigger than 5GB

  6. And finally, push:

git push --force origin main # this can take time depending on your upload bandwidth

Now you have your own fork or rebased repo in the Hub!

List of license identifiers

| Fullname | License identifier (to use in model card) |
| --- | --- |
| Academic Free License v3.0 | afl-3.0 |
| Apache license 2.0 | apache-2.0 |
| Artistic license 2.0 | artistic-2.0 |
| Boost Software License 1.0 | bsl-1.0 |
| BSD 2-clause "Simplified" license | bsd-2-clause |
| BSD 3-clause "New" or "Revised" license | bsd-3-clause |
| BSD 3-clause Clear license | bsd-3-clause-clear |
| Creative Commons license family | cc |
| Creative Commons Zero v1.0 Universal | cc0-1.0 |
| Creative Commons Attribution 4.0 | cc-by-4.0 |
| Creative Commons Attribution Share Alike 4.0 | cc-by-sa-4.0 |
| Creative Commons Attribution Non Commercial 4.0 | cc-by-nc-4.0 |
| Creative Commons Attribution Non Commercial Share Alike 4.0 | cc-by-nc-sa-4.0 |
| Do What The F*ck You Want To Public License | wtfpl |
| Educational Community License v2.0 | ecl-2.0 |
| Eclipse Public License 1.0 | epl-1.0 |
| Eclipse Public License 2.0 | epl-2.0 |
| European Union Public License 1.1 | eupl-1.1 |
| GNU Affero General Public License v3.0 | agpl-3.0 |
| GNU General Public License family | gpl |
| GNU General Public License v2.0 | gpl-2.0 |
| GNU General Public License v3.0 | gpl-3.0 |
| GNU Lesser General Public License family | lgpl |
| GNU Lesser General Public License v2.1 | lgpl-2.1 |
| GNU Lesser General Public License v3.0 | lgpl-3.0 |
| ISC | isc |
| LaTeX Project Public License v1.3c | lppl-1.3c |
| Microsoft Public License | ms-pl |
| MIT | mit |
| Mozilla Public License 2.0 | mpl-2.0 |
| Open Software License 3.0 | osl-3.0 |
| PostgreSQL License | postgresql |
| SIL Open Font License 1.1 | ofl-1.1 |
| University of Illinois/NCSA Open Source License | ncsa |
| The Unlicense | unlicense |
| zLib License | zlib |
| Open Data Commons Public Domain Dedication and License | pddl |
| Lesser General Public License For Linguistic Resources | lgpllr |