We are helping the community work together towards the goal of advancing NLP 🔥.
No single company, not even the Tech Titans, will be able to “solve NLP” on its own; the only way we'll achieve this is by sharing knowledge and resources. On this model hub we are building the largest collection of models, datasets and metrics to democratize and advance AI and NLP for everyone 🚀.
In your README.md model card you should:
If needed you can find a template here.
In addition to textual (markdown) content, to unlock helpful features you can add any or all of the following items to a YAML metadata block at the top of your model card:
language: "ISO 639-1 code for your language, or `multilingual`"
thumbnail: "url to a thumbnail used in social sharing"
tags:
- array
- of
- tags
license: "any valid license identifier"
datasets:
- array of dataset identifiers
metrics:
- array of metric identifiers
License identifiers are those standardized by GitHub here.
All the tags can then be used to filter the list of models on https://huggingface.co/models.
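As a minimal sketch of how such a metadata block could be assembled programmatically, the Python snippet below serializes a metadata dictionary and prepends it to the markdown body. The helper name `build_model_card`, the sample values, and the `---` delimiter lines around the block are assumptions made for illustration; the snippet uses only the standard library and handles just strings and lists of strings.

```python
import json
from typing import Dict, List, Union

MetadataValue = Union[str, List[str]]


def build_model_card(metadata: Dict[str, MetadataValue], body: str) -> str:
    """Return README.md text: a YAML metadata block followed by the markdown body.

    Hypothetical helper for illustration; it supports only string values and
    lists of strings, which covers the fields listed above.
    """
    lines = ["---"]  # assumed delimiter opening the metadata block
    for key, value in metadata.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            # One "- item" line per list entry.
            lines.extend(f"- {json.dumps(item)}" for item in value)
        else:
            # json.dumps yields a double-quoted string, which YAML also accepts.
            lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")  # assumed delimiter closing the metadata block
    return "\n".join(lines) + "\n\n" + body


# Sample values chosen for the example only.
card = build_model_card(
    {
        "language": "en",
        "tags": ["example-tag"],
        "license": "apache-2.0",
        "datasets": ["squad"],
        "metrics": ["f1"],
    },
    "# My model\n\nDescribe training data, intended use, and limitations here.\n",
)
print(card)
```

Writing the returned string to `README.md` in the model repository would then give a card whose metadata sits at the very top of the file, ahead of the textual content.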
At the top of each model page (see e.g. distilbert-base-uncased) you'll see the model's tags. They help with discovery and determine which features are enabled on each model page.
The "architectures" field of the model's config.json file, which should be filled in automatically if you save your model using .save_pretrained(), conditions the type of pipeline used in the inference API and the type of widget shown on the model page.
If the model's config.json file contains a task_specific_params subfield, its sub-keys will be added as pipeline: tags. All parameters defined under this sub-key will overwrite the default parameters in config.json when running the corresponding pipeline. See
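As a rough illustration of the override behaviour described above, the sketch below layers task-specific parameters over the top-level defaults of a config dictionary. The function `resolve_pipeline_params`, the sample config values, and the exact `pipeline:<name>` tag format are hypothetical; this is not the code that the library or the inference API actually runs.

```python
from typing import Any, Dict


def resolve_pipeline_params(config: Dict[str, Any], task: str) -> Dict[str, Any]:
    """Return config defaults with the task's specific parameters layered on top."""
    # Start from the top-level defaults, leaving out the override table itself.
    params = {k: v for k, v in config.items() if k != "task_specific_params"}
    # Parameters under the task's sub-key win over the defaults.
    overrides = config.get("task_specific_params", {}).get(task, {})
    params.update(overrides)
    return params


# Sample config.json content (values invented for the example).
config = {
    "max_length": 20,
    "num_beams": 1,
    "task_specific_params": {
        "summarization": {"max_length": 142, "num_beams": 4},
    },
}

# Sub-keys of task_specific_params, rendered in an assumed "pipeline:<name>" style.
pipeline_tags = [f"pipeline:{name}" for name in config["task_specific_params"]]

summarization_params = resolve_pipeline_params(config, "summarization")
translation_params = resolve_pipeline_params(config, "translation")
print(pipeline_tags)
print(summarization_params)
print(translation_params)
```

A task without an entry under `task_specific_params` (here, `translation`) simply falls back to the top-level defaults.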
To determine which pipeline and widget to display (text-classification, token-classification, translation, etc.), we use a simple mapping from model tags to one particular pipeline_tag (we currently only expose one pipeline and widget on each model page, even for models that would support several).
We try to use the most specific pipeline for each model; see the pseudo-code in this gist.
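The gist is not reproduced here. As a hedged sketch of the "most specific pipeline first" idea, the snippet below walks an ordered rule table and returns the first match. The rule table, its ordering, the tag names, and the function `pick_pipeline_tag` are all hypothetical and only illustrate the shape of such a mapping.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical (model tag, pipeline_tag) rules, ordered from most to least specific.
RULES: List[Tuple[str, str]] = [
    ("pipeline:translation", "translation"),
    ("pipeline:summarization", "summarization"),
    ("question-answering", "question-answering"),
    ("token-classification", "token-classification"),
    ("text-classification", "text-classification"),
    ("masked-lm", "fill-mask"),
    ("lm-head", "text-generation"),
]


def pick_pipeline_tag(model_tags: Iterable[str]) -> Optional[str]:
    """Return the single pipeline_tag for a model, or None if no rule matches."""
    tag_set = set(model_tags)
    # Rules are checked in order, so a more specific rule wins over a generic one.
    for tag, pipeline_tag in RULES:
        if tag in tag_set:
            return pipeline_tag
    return None


specific = pick_pipeline_tag(["t5", "lm-head", "pipeline:translation"])
generic = pick_pipeline_tag(["gpt2", "lm-head"])
unknown = pick_pipeline_tag(["pytorch"])
print(specific, generic, unknown)
```

Because only one pipeline is exposed per model page, the function returns a single value even when several rules would match.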
Here they are, with links to examples:
text-classification, for instance
token-classification, for instance
question-answering, for instance
translation, for instance
summarization, for instance
text-generation, for instance
fill-mask, for instance
Example inputs are the random inputs that pre-populate your widget on page launch (unless you specify an input by URL parameters).
We try to provide example inputs for some languages and widget types, but it's better if you provide your own examples. You can add them to your model card: see this commit for the format you need to use.
To turn off the inference API for your model, specify inference: false in your model card's metadata.
If you are interested in accelerated inference, higher volumes of requests, and/or an SLA, please contact us at api-enterprise at huggingface.co.
The API is built on top of our Pipelines feature.
On top of Pipelines and depending on the model type, we build a number of production optimizations like:
Yes, we use the KaTeX math typesetting library to render math formulas server-side, before parsing the markdown. You have to use the following delimiters:
$$ ... $$ for display mode
\\( ... \\) for inline mode (no space between the slashes and the parenthesis).
Then you'll be able to write:
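For instance, a display-mode formula in a model card could look like the following. The formula is chosen for this illustration and is not necessarily the example from the original page:

```latex
$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
```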